Blocking crawlers in Nginx
Today, while waiting for my iPhone SE to update to iOS 18, I tailed the access log of Plurrrr. I noticed that I received many requests from crawlers announcing themselves as Bytespider and SemrushBot. I decided to return an HTTP status code of 403 Forbidden to those bots.
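If you want to see how often such bots show up in your own log, a quick grep gives a count; note that the log path below is the Debian/Ubuntu default and is an assumption on my part:
# Count access log lines whose user agent mentions either bot
# (log path is the Debian/Ubuntu default; adjust for your setup)
grep -icE 'SemrushBot|Bytespider' /var/log/nginx/access.log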
To achieve this, I added the following to a server section:
if ($http_user_agent ~* "SemrushBot|Bytespider") {
    return 403;
}
The if statement matches the user agent string case-insensitively against either SemrushBot or Bytespider. If there is a match, a 403 Forbidden is returned to the bot.
Note that you can add as many bots as you like by adding another pipe symbol followed by the name of the bot as stated in its user agent string.
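For example, to block a third crawler the pattern simply grows by one alternative; AhrefsBot is only an illustrative name here, substitute whatever shows up in your own logs:
# Same rule, extended with one more bot name (AhrefsBot is just an example)
if ($http_user_agent ~* "SemrushBot|Bytespider|AhrefsBot") {
    return 403;
}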
After I had added the above code, I verified the configuration files using nginx -t. When no errors were reported, I issued systemctl reload nginx to reload the configuration files into Nginx.
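Put together, the two steps look like this; sudo is an assumption, as is a systemd-based distribution:
# Check the configuration for syntax errors
sudo nginx -t
# Reload the configuration into the running Nginx
sudo systemctl reload nginx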
Hopefully, this will reduce the amount of traffic I am getting from those two bots.
Finally, another option is to slow down those bots by using the limit_rate directive. This limits the rate of response transmission to a client.
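One possible way to apply this per user agent is to combine limit_rate with a map block. This is only a sketch, not something I run myself: the 10k rate is an arbitrary example value, and using a variable with limit_rate requires Nginx 1.17.0 or newer.
# In the http context: map matching user agents to a transfer rate
map $http_user_agent $bot_rate {
    default        0;      # 0 means no limit
    ~*SemrushBot   10k;    # 10 kilobytes per second
    ~*Bytespider   10k;
}

server {
    # ... existing configuration ...

    # Throttle response transmission for the matched bots
    limit_rate $bot_rate;
}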