As a website owner or developer, you must ensure the security and performance of your website. One effective way to achieve this is to control access to your website by using the .htaccess file.
The following are the common methods to block unwanted traffic on your website:
Each of these methods focuses on blocking a different type of entity from accessing your site. When you use the recommended code snippets in these methods, ensure that the regular expressions and Apache directives are properly formatted and adapted to your specific use case.
Blocking specific user agents
If your website encounters a DDoS attack and you want to block a group of IP addresses that share the same user agent, use the following code after replacing UserAgent with the name of the user agent that you want to block:
RewriteCond %{HTTP_USER_AGENT} UserAgent
RewriteRule .* - [F,L]
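These directives assume that mod_rewrite is available and that RewriteEngine On appears earlier in your .htaccess file. A minimal self-contained sketch, using the hypothetical agent name ExampleBot, might look like the following:

<IfModule mod_rewrite.c>
# Enable the rewrite engine if it is not already enabled elsewhere in the file
RewriteEngine On
# Match the user agent (ExampleBot is a placeholder) regardless of case
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
# Return 403 Forbidden and stop processing further rules
RewriteRule .* - [F,L]
</IfModule>

The [F] flag returns a 403 Forbidden response, and [L] stops the processing of later rewrite rules for the request.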
You can also block more than one user agent at a time by using the [OR] ('or next condition') flag; the [NC] ('no case') flag makes the match case insensitive. The following examples show several user agents with properly escaped regular expressions:
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Scrapy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ \(compatible;\ Yahoo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AppleNewsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ \(compatible;\ YandexBot [NC]
RewriteRule .* - [F,L]
To avoid website errors, you must properly escape special characters in your regular expressions (regex). The HTTP_USER_AGENT condition accepts a regex as its argument, and in the preceding code several user agents require escaping because their names contain special characters. Instead of constructing the pattern manually, you can use websites such as https://beautifycode.net/regex-escape to quickly build a properly escaped regex.
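For example, dots, spaces, and parentheses in a user agent string must be escaped before the string can be used as a pattern. The following sketch uses a made-up agent string, Mozilla/5.0 (compatible; ExampleBot/1.0), to illustrate the escaped form:

# Blocks the hypothetical agent string "Mozilla/5.0 (compatible; ExampleBot/1.0)"
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ \(compatible;\ ExampleBot/1\.0\) [NC]
RewriteRule .* - [F,L]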
Testing the blocking of user agents
Run the following command to test that the site is responding:
curl -H "host:www.url_you_are_testing.url http://localhost/
Run the following command to test that the user agent is blocked:
curl -H "host:www.url_you_are_testing.url" -H "user-agent:Pcore-HTTP/v0.25.0" http://localhost/
This example command tests a user agent named Pcore-HTTP. You can run a similar command for the user agent that you want to verify.
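If the block is working, the second command returns a 403 Forbidden response instead of the page content. To check only the response headers, you can add the -I flag, as in the following sketch that reuses the same placeholder host and user agent:

curl -I -H "host:www.url_you_are_testing.url" -H "user-agent:Pcore-HTTP/v0.25.0" http://localhost/

A status line similar to HTTP/1.1 403 Forbidden confirms that the rule blocks the agent.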
Blocking traffic from robot crawlers
A robot crawler can cause problems for a site by making a large number of requests. You can use the following code in either of these situations:
- When robots do not adhere to the robots.txt file
- When you want to immediately block the traffic without waiting for robot crawlers to fetch the robots.txt file
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} "<exact_name_for_the_bot>"
RewriteRule ^(.*)$ - [F,L]
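For example, if a crawler that ignores robots.txt identifies itself as ExampleCrawler (a placeholder name used here for illustration) and sends requests without a referer, the filled-in rules would look like this:

# Match requests that have an empty referer header
RewriteCond %{HTTP_REFERER} ^$
# Match the crawler's exact user agent name (ExampleCrawler is a placeholder)
RewriteCond %{HTTP_USER_AGENT} "ExampleCrawler"
# Return 403 Forbidden and stop processing further rules
RewriteRule ^(.*)$ - [F,L]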
Blocking hotlinks
Website owners often want to protect their site from other websites that steal content or hotlink to images, which consumes bandwidth.
To prevent hotlinking, use the following code after replacing domain.com with your domain name:
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?domain\.com/.*$ [NC]
RewriteRule \.(gif|jpg|swf|flv|png)$ /feed/ [R=302,L]
In this code, R=302 indicates a temporary redirect. If you want a permanent redirect, adjust it to R=301.
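To verify the rule, you can send a request with a foreign referer and confirm that the image request is redirected. The following is a minimal sketch; the referring site and the image path are placeholders:

curl -I -e "http://othersite.example/" http://www.domain.com/images/sample.jpg

A 302 Found status (or 301 Moved Permanently, if you changed the flag) with a Location header pointing to /feed/ indicates that the hotlink protection is working.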