As a website owner or developer, you must control access and manage unwanted traffic to ensure the security and performance of your website.
Acquia recommends that you implement controls through Acquia Edge WAF/CDN or Varnish (VCL) when possible, as these methods stop traffic before it reaches your application servers and conserve origin resources. If these methods are unavailable, you can control access at the origin through the .htaccess file.
The common methods to manage unwanted traffic on your website are blocking specific IP addresses, blocking specific user agents, and blocking traffic from robot crawlers.
Each of these methods blocks a different type of client from accessing your sites. In the recommended code snippets, ensure that the regular expressions and Apache directives are properly formatted and adapted for your specific use case.
Identify unwanted traffic sources
Before you can block traffic, you must identify the source. If your site experiences slowdowns or a spike in requests (visible in Stack Metrics), connect to the environment through SSH and use the following commands on the access.log to identify the offenders.
Top 20 IP addresses
To identify the specific hosts that generate the most requests:
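The exact command depends on your log format. A minimal sketch, assuming the combined Apache log format where the client IP is the first field and the User-Agent is the final quoted field, is:
# Count requests per client IP address and list the 20 busiest sources:
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
# Similarly, count requests per User-Agent string (the sixth field when splitting on double quotes):
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn | head -20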
Once identified, use these IPs or User Agents in the following implementation methods.
Block specific IP addresses
Use this method when you have identified one or a few specific IP addresses that abuse your site, for example, through brute-force attacks.
If you use Acquia Edge, implement IP blocks through the WAF. This is the most performant method because it does not consume origin resources. If Acquia Edge is unavailable, you can block at the origin, as shown in the sketch after the following steps.
Create a WAF rule that matches the abusive IPs. For example, in Cloudflare, use the expression (ip.src in {198.51.100.23 203.0.113.0/24}).
Set Action to Block.
Deploy the rule.
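If you must block at the origin instead, a minimal .htaccess sketch in the mod_rewrite style used elsewhere in this page, matching the single example IP exactly and the example 203.0.113.0/24 range by prefix, is:
# Return 403 Forbidden for the abusive IP and the example /24 range:
RewriteCond %{REMOTE_ADDR} ^198\.51\.100\.23$ [OR]
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.
RewriteRule .* - [F,L]
The [F] flag returns a 403 Forbidden response to the matched addresses; all other traffic is unaffected.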
Decide between blocking and rate-limiting
Select the appropriate action to ensure that you stop abuse without impacting legitimate users.
Block: Use when malicious traffic comes from a specific source. For example, block specific IPs.
Rate-Limit: Use when the unwanted traffic is distributed across many IPs, such as scrapers that hit /search or API endpoints, or when a legitimate service makes multiple requests. Rate-limiting caps the number of requests per IP in a specific time frame. When the threshold is exceeded, the visitor might be challenged through CAPTCHA or temporarily blocked.
Do not block reputable crawlers, such as Googlebot or Bingbot, as this negatively impacts your Search Engine Optimization (SEO). Verify the source of the traffic before you block a common User Agent.
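For example, to verify a crawler that claims to be Googlebot, you can run a reverse DNS lookup on the requesting IP and then a forward lookup of the returned hostname. The IP address and hostname below are placeholders for values from your own logs:
# Reverse lookup: genuine Googlebot IPs resolve to hostnames ending in googlebot.com or google.com
host 203.0.113.45
# Forward lookup: the returned hostname must resolve back to the same IP address
host crawl-203-0-113-45.googlebot.com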
To block a specific user agent, add the following code snippet to the .htaccess file, replacing UserAgent with the string that you want to match:
RewriteCond %{HTTP_USER_AGENT} UserAgent
RewriteRule .* - [F,L]
You can also block more than one User Agent at a time by using the [OR] ('or next condition') flag; the [NC] ('no case') flag makes the match case insensitive. The following example blocks several user agents with properly escaped regular expressions:
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Scrapy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ \(compatible;\ Yahoo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AppleNewsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ \(compatible;\ YandexBot [NC]
RewriteRule .* - [F,L]
To avoid website errors, you must properly escape characters in your regular expressions (regex). HTTP_USER_AGENT accepts a regex as its argument, and in the preceding code several user agent strings require escaping because of the complexity of their names. Instead of constructing a rule manually, you can use websites such as https://beautifycode.net/regex-escape to quickly build a properly escaped regex.
Test the blocking of user agents
Run the following command to test that the block works and that the site still responds. This example tests a user agent named Pcore; run a similar command for your intended user agent.
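A minimal example, assuming your site is available at https://www.example.com:
curl -I -H "User-Agent: Pcore" https://www.example.com/
# A 403 Forbidden status confirms that the block works; a 200 OK means the condition did not match.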
Block traffic from robot crawlers
A robot crawler can cause problems for a site by making a large number of requests. You can use the following code in either of these situations:
When robots do not adhere to the robots.txt file
When you want to block the traffic immediately, without waiting for robot crawlers to fetch the robots.txt file
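A minimal sketch, assuming BadBot and OtherBot are placeholder names for the crawlers that you want to block and using a throwaway redirect target (here the crawler's own loopback address), is:
RewriteCond %{HTTP_USER_AGENT} (BadBot|OtherBot) [NC]
RewriteRule .* http://127.0.0.1/ [R=302,L]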
In this code, R=302 indicates a temporary redirect. If you want a permanent redirect, adjust it to R=301.
Troubleshoot staging and non-production outages
Staging environment slowdowns often result from configuration issues or internal testing rather than external attacks.
If your non-production site is unresponsive:
Check recent activity: Verify if there were recent code deployments, configuration changes, or feature flag toggles.
Verify access controls: Confirm IP ACLs or Basic Auth settings have not accidentally blocked legitimate users or testing tools.
Review logs for internal tools: Check the access logs for repetitive requests from internal testing or monitoring tools. Use the commands mentioned in the Identify unwanted traffic sources section.
Check application errors: Review the error.log and mysql-slow.log for application failures rather than traffic volume, as shown in the example after this list.
Check cron status: Ensure that scheduled jobs are not stuck and are not causing resource exhaustion.
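A minimal sketch of these checks, assuming the log file names referenced above and a hypothetical internal monitoring tool named UptimeChecker:
# Review the most recent application errors:
tail -n 100 error.log
# Check for slow database queries during the incident window:
tail -n 100 mysql-slow.log
# Count requests from the internal monitoring tool:
grep -c 'UptimeChecker' access.log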
Before you create a Support ticket
If the issue persists or you suspect a large-scale DDoS attack, collect the following data before you create a Support ticket. Providing this information upfront expedites resolution.
Environment: Production, Staging, or Development, and the environment ID.
Time Window (UTC): Exact start and end time of the incident. For example, 2025-11-17 18:00–18:30 UTC.
Triage Data: Output from the triage commands in the Identify unwanted traffic sources section. For example, the top 5 IPs, User Agents, and paths.
Actions Taken: Specific WAF rules, rate limits, or .htaccess changes that you applied.
Sample Request: A recent X-Request-ID header value from a failed or blocked request, if available (see the example after this list).
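If your stack exposes the X-Request-ID header in responses, you can capture a sample value with a command such as the following (the URL is a placeholder for a failing or blocked path on your site):
curl -sI https://www.example.com/blocked-path | grep -i x-request-id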