Plurrrr

Wed 18 Sep 2024

Blocking crawlers in Nginx

Today, while waiting for my iPhone SE to update to iOS 18, I tailed the access log of Plurrrr. I noticed many requests from crawlers announcing themselves as Bytespider and as SemrushBot. I decided to return an HTTP status code of 403 Forbidden to those bots.

To achieve this, I added the following to a server block:

if ($http_user_agent
    ~* "SemrushBot|Bytespider") {
        return 403;
}

The above code has been formatted to fit the width of this blog.

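The if statement tests the User-Agent request header, available in the $http_user_agent variable, against a regular expression that matches either SemrushBot or Bytespider. The ~* operator makes the match case-insensitive. If there is a match, a 403 Forbidden is returned to the bot.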

Note that you can add as many bots as you like by adding another pipe symbol followed by the name of the bot as stated in its user agent string.
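
When the list grows, a map can be easier to maintain than one long regular expression. The following is a minimal sketch of that alternative, not the configuration used above; the variable name $blocked_bot is made up for this example. The map goes in the http section and the if statement in the server section:

map $http_user_agent $blocked_bot {
    # 0 means: not blocked
    default      0;
    # one bot per line,
    # ~* is case insensitive
    ~*SemrushBot 1;
    ~*Bytespider 1;
}

server {
    # ... existing settings ...
    if ($blocked_bot) {
        return 403;
    }
}

An if condition on a variable is false when the value is empty or 0, so only user agents that match one of the listed patterns get the 403.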

After adding the above code, I verified the configuration files using nginx -t. When no error was reported, I issued systemctl reload nginx to reload the configuration files into Nginx.

Hopefully, this will reduce the amount of traffic I am getting from those two bots.

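Finally, another option is to slow down those bots by using the limit_rate directive. This limits the rate of response transmission to a client. The rate is given in bytes per second and applies per request, so a bot that opens several connections at once gets that rate on each of them.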
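
A minimal sketch of the limit_rate approach, assuming Nginx 1.17.0 or later, which allows a variable as the value of limit_rate. The variable name $bot_rate and the rate of 1k are picked for illustration only. As before, the map goes in the http section:

map $http_user_agent $bot_rate {
    # 0 means: no rate limit
    default      0;
    # 1k = 1024 bytes per second
    ~*SemrushBot 1k;
    ~*Bytespider 1k;
}

server {
    # ... existing settings ...
    limit_rate $bot_rate;
}

With this in place, all other clients are served at full speed, while responses to the two bots trickle out at roughly one kilobyte per second.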