Cloud Platform

Blocking unwanted traffic on your website

As a website owner or developer, you must ensure the security and performance of your website. One effective way to do this is to control access to your website by using the .htaccess file.

This page describes common methods to block unwanted traffic on your website. Each method focuses on blocking a different type of entity from accessing your sites. When you use the recommended code snippets, ensure that the regular expressions and Apache directives are properly formatted and adapted to your specific use case.

Blocking specific user agents

If your website encounters a DDoS attack and you want to block a group of IP addresses using the same user agent, then use the following code after replacing UserAgent with the name of the agent that you want to block:

RewriteCond %{HTTP_USER_AGENT} UserAgent
RewriteRule .* - [F,L]

You can also block more than one user agent at a time. The [OR] ('or next condition') flag chains a condition to the one that follows it, and the [NC] ('no case') flag makes the match case-insensitive. Omit [OR] on the final condition. The following example blocks several user agents, with properly escaped regexes:

RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Scrapy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ \(compatible;\ Yahoo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AppleNewsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ \(compatible;\ YandexBot [NC]
RewriteRule .* - [F,L]

To avoid website errors, you must properly escape special characters in your regular expressions (regex). The pattern that a RewriteCond tests against HTTP_USER_AGENT is a regex. In the preceding code, several user agent names contain spaces, periods, and parentheses, all of which must be escaped. Instead of creating a rule manually, you can use websites such as https://www.regex-escape.com/regex-escaping-online.php to quickly construct a properly escaped regex.
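
If you prefer to escape a user agent string locally rather than with an online tool, a short shell function can do it. The following is a minimal sketch: the function name escape_user_agent is hypothetical, and the set of escaped characters (regex metacharacters plus the space) is an assumption that covers the examples above. Review the output before you add it to your .htaccess file.

```shell
#!/bin/sh
# escape_user_agent (hypothetical helper): prefix regex metacharacters
# and spaces with a backslash so that the string can be used as a
# literal pattern in a RewriteCond.
escape_user_agent() {
  printf '%s\n' "$1" | sed 's/[][\.^$*+?(){}| ]/\\&/g'
}

escape_user_agent 'Mozilla/5.0 (compatible; Yahoo'
# prints: Mozilla/5\.0\ \(compatible;\ Yahoo
```

The printed string matches the escaped pattern used for the Yahoo user agent in the preceding rule set.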

Testing the blocking of user agents

  1. Run the following command to test that the site is responding:

    curl -H "host:www.url_you_are_testing.url" http://localhost/

  2. Run the following command to test that the user agent is blocked:

    curl -H "host:www.url_you_are_testing.url" -H "user-agent:Pcore-HTTP/v0.25.0" http://localhost/

    This is an example command to test a user agent named Pcore. You can run a similar command for your intended user agent.
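
To compare the two responses without reading the full output, you can print only the HTTP status code. The following sketch assumes that a blocked request returns 403, which is the status that the [F] flag produces. The function names http_status and describe_status are hypothetical, and a 403 can also come from an unrelated rule, so treat the verdict as a hint rather than proof.

```shell
#!/bin/sh
# http_status HOST [USER_AGENT]: print only the HTTP status code that
# http://localhost/ returns for the given Host header. The function
# names are placeholders for illustration.
http_status() {
  host="$1"
  agent="${2:-}"
  if [ -n "$agent" ]; then
    curl -s -o /dev/null -w '%{http_code}' -H "Host: $host" -A "$agent" http://localhost/
  else
    curl -s -o /dev/null -w '%{http_code}' -H "Host: $host" http://localhost/
  fi
}

# describe_status CODE: translate a status code into a short verdict.
describe_status() {
  case "$1" in
    403) echo "blocked" ;;
    2??|3??) echo "allowed" ;;
    *) echo "unexpected status: $1" ;;
  esac
}
```

For example, describe_status "$(http_status www.url_you_are_testing.url)" checks the site without a custom user agent, and describe_status "$(http_status www.url_you_are_testing.url "Pcore-HTTP/v0.25.0")" checks the request that the rule is meant to block.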

Blocking traffic from robot crawlers

A robot crawler can cause problems for a site by making a large number of requests. You can use the following code in any of these situations:

  • When robots do not adhere to the robots.txt file
  • When you want to immediately block the traffic without waiting for robot crawlers to fetch the robots.txt file

RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} "<exact_name_for_the_bot>"
RewriteRule ^(.*)$ - [F,L]
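
Because the two conditions are combined, the rule should deny a request only when the Referer header is empty and the user agent matches. The following sketch, under stated assumptions, checks both cases with curl: TARGET_URL, the function names, and the example values in the usage note are hypothetical, and the check assumes that no other rule on the site returns 403 for these requests.

```shell
#!/bin/sh
# TARGET_URL is an assumption; point it at the site that you are testing.
TARGET_URL="${TARGET_URL:-http://localhost/}"

# status_for AGENT [REFERER]: print the HTTP status code for a request
# sent with the given user agent and, optionally, a Referer header.
status_for() {
  agent="$1"
  referer="${2:-}"
  if [ -n "$referer" ]; then
    curl -s -o /dev/null -w '%{http_code}' -A "$agent" -e "$referer" "$TARGET_URL"
  else
    curl -s -o /dev/null -w '%{http_code}' -A "$agent" "$TARGET_URL"
  fi
}

# rule_verdict NO_REFERER_STATUS WITH_REFERER_STATUS: the rule behaves
# as described when only the request without a Referer is denied.
rule_verdict() {
  if [ "$1" = "403" ] && [ "$2" != "403" ]; then
    echo "rule works as expected"
  else
    echo "rule needs review"
  fi
}
```

For example, rule_verdict "$(status_for "ExampleBot")" "$(status_for "ExampleBot" "https://example.com/")" compares a request without a Referer header to one with a Referer header, where ExampleBot stands in for the exact bot name used in the rule.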