As a website owner or developer, you must control access and manage unwanted traffic to ensure the security and performance of your website.
Acquia recommends that you implement controls through Acquia Edge WAF/CDN or Varnish (VCL) when possible, because these methods stop traffic before it reaches your application servers, which conserves resources. If these methods are unavailable, you can control access at the origin through the .htaccess file.
This page describes common methods to manage unwanted traffic on your website. Each method blocks a different type of entity from accessing your sites. When you use the recommended code snippets, ensure that the regular expressions and Apache directives are properly formatted and adapted to your specific use case.
Identify unwanted traffic sources¶
Before you can block traffic, you must identify the source. If your site experiences slowdowns or a spike in requests (visible in Stack Metrics), connect to the environment through SSH and use the following commands on the access.log to identify the offenders.
Top 20 IP addresses
To identify the specific hosts that generate the most requests:
awk '{print $1}' /var/log/nginx/access.log* | sort | uniq -c | sort -rn | head -20
Top 20 user agents (bots/browsers)
To identify the bots or scrapers involved:
awk -F'"' '{print $6}' /var/log/nginx/access.log* | sort | uniq -c | sort -rn | head -20
Top 20 requested paths (targeted URLs)
To identify the targeted endpoints:
awk -F'"' '{print $2}' /var/log/nginx/access.log* | awk '{print $2}' | sort | uniq -c | sort -rn | head -20
After you identify the offending IP addresses or user agents, use them in the implementation methods that follow.
Block specific IP addresses¶
Use this method when you have identified one or a few specific IP addresses that abuse your site, for example, as the source of a brute-force attack.
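If Acquia Edge or Varnish is not available and you must block at the origin, a minimal .htaccess sketch with mod_rewrite looks like the following, where 203.0.113.10 is a placeholder IP address:

# Return 403 Forbidden to a single abusive IP (203.0.113.10 is a placeholder).
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.10$
RewriteRule .* - [F,L]

Behind a load balancer or CDN, REMOTE_ADDR can hold the balancer's address rather than the client's; in that case, match the client IP in the X-Forwarded-For header instead (RewriteCond %{HTTP:X-Forwarded-For} 203\.0\.113\.10).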
Decide between blocking and rate-limiting¶
Select the appropriate action so that you stop abuse without affecting legitimate users.
- Block: Use when the malicious traffic comes from a specific source, for example, a known set of IP addresses.
- Rate-limit: Use when the unwanted traffic is distributed across many IPs, such as scrapers that hit /search or API endpoints, or when a legitimate service makes too many requests. Rate limiting caps the number of requests per IP in a specific time frame. When the threshold is exceeded, the visitor might be challenged through CAPTCHA or temporarily blocked. Rate limiting typically requires Acquia Edge powered by Cloudflare.
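Before you configure a rate limit, it helps to measure actual per-IP request rates from the access log so you can choose a threshold that spares legitimate users. A minimal sketch, using the same log location as the triage commands above (the timestamp is an example; substitute one minute from your own incident window):

# Count requests per IP during a single minute of the incident window.
grep '17/Nov/2025:18:05' /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10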
Block specific user agents¶
To block a specific user agent, add the following code snippet to the .htaccess file, replacing UserAgent with the agent string that you want to match:
RewriteCond %{HTTP_USER_AGENT} UserAgent
RewriteRule .* - [F,L]
You can also block more than one user agent at a time with the [OR] ('or next condition') flag, and the [NC] ('no case') flag makes the match case-insensitive. The following example blocks several user agents with properly escaped regexes:
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Scrapy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ \(compatible;\ Yahoo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AppleNewsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ \(compatible;\ YandexBot [NC]
RewriteRule .* - [F,L]
To avoid website errors, you must properly escape special characters in your regular expressions (regex). HTTP_USER_AGENT accepts a regex as its argument, and in the preceding code, several user agents require escaping because their names contain spaces, slashes, or parentheses. Instead of constructing the pattern manually, you can use websites such as https://beautifycode.net/regex-escape to quickly generate a properly escaped regex.
Test the blocking of user agents¶
Run the following command to test that the site responds:
curl -H "host:www.url_you_are_testing.url http://localhost/
Run the following command to test that the user agent is blocked:
curl -H "host:www.url_you_are_testing.url" -H "user-agent:Pcore-HTTP/v0.25.0" http://localhost/
This example command tests a user agent named Pcore-HTTP. You can run a similar command for your intended user agent.
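To make the result easier to read, you can print only the HTTP status code. A request that matches a blocking rule should return 403, while an unblocked request returns the normal response code (for example, 200):

# Print only the status code: expect 403 once the user agent is blocked.
curl -o /dev/null -s -w "%{http_code}\n" -H "host:www.url_you_are_testing.url" -H "user-agent:Pcore-HTTP/v0.25.0" http://localhost/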
Block traffic from robot crawlers¶
A robot crawler can cause problems for a site by making a large number of requests. Use the following code in either of these situations:
- When robots do not adhere to the robots.txt file
- When you want to block the traffic immediately instead of waiting for robot crawlers to fetch the robots.txt file
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} "<exact_name_for_the_bot>"
RewriteRule ^(.*)$ - [F,L]
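For example, with a concrete crawler name substituted for the placeholder (MJ12bot is used here purely for illustration):

# Block requests that send no referrer and identify themselves as MJ12bot.
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} "MJ12bot"
RewriteRule ^(.*)$ - [F,L]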
Block hotlinks¶
Other websites sometimes steal content or hotlink to your images, which consumes your bandwidth. To prevent hotlinking, use the following code and replace domain.com with your domain name:
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.com/.*$ [NC]
RewriteRule \.(gif|jpg|swf|flv|png)$ /feed/ [R=302,L]
In this code, R=302 indicates a temporary redirect. If you want a permanent redirect, adjust it to R=301.
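To verify the rules, you can simulate a hotlinked request with curl. In this sketch, the image path and referring site are placeholders; a matching request should return a 302 redirect to /feed/:

# Simulate a request for an image with a foreign referrer; expect HTTP 302.
curl -I -H "Referer: http://othersite.example/" http://domain.com/images/sample.jpg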
Troubleshoot staging and non-production outages¶
Staging environment slowdowns often result from configuration issues or internal testing rather than external attacks.
If your non-production site is unresponsive:
- Check recent activity: Verify whether there were recent code deployments, configuration changes, or feature flag toggles.
- Verify access controls: Confirm that IP ACLs or Basic Auth settings have not accidentally blocked legitimate users or testing tools.
- Review logs for internal tools: Check the access logs for repetitive requests from internal testing or monitoring tools, using the commands in the Identify unwanted traffic sources section.
- Check application errors: Review the error.log and mysql-slow.log for application failures rather than traffic volume.
- Check cron status: Ensure that scheduled jobs are not stuck or causing resource exhaustion.
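While connected over SSH, a quick way to spot a stuck cron job or other runaway process is to list the busiest processes (assumes GNU ps, as found on Linux servers):

# List the 15 busiest processes by CPU usage.
ps aux --sort=-%cpu | head -15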
Before you create a Support ticket¶
If the issue persists or you suspect a large-scale DDoS attack, collect the following data before you create a Support ticket. Providing this information upfront expedites resolution.
- Environment: Production, Staging, or Development, and the environment ID.
- Time window (UTC): Exact start and end time of the incident. For example, 2025-11-17 18:00–18:30 UTC.
- Triage data: Output from the triage CLI commands in the Identify unwanted traffic sources section. For example, the top 5 IPs, user agents, and paths.
- Actions taken: Specific WAF rules, rate limits, or .htaccess changes that you applied.
- Sample request: A recent X-Request-ID header value from a failed or blocked request, if available.
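To capture a request ID, a minimal sketch with curl (the URL is a placeholder for a failing or blocked path on your site):

# Print the X-Request-ID response header, if the platform returns one.
curl -sI https://www.example.com/failing-path | grep -i 'x-request-id'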