As a website owner or developer, you must control access and manage unwanted traffic to ensure the security and performance of your website.
Acquia recommends that you implement controls through Acquia Edge WAF/CDN or Varnish (VCL) when possible, as these methods stop traffic before it reaches your application servers and conserve origin resources. If these methods are unavailable, you can control access at the origin through the .htaccess file.
The common methods to manage unwanted traffic on your website are blocking specific IP addresses, blocking specific user agents, and blocking traffic from robot crawlers.
Each of these methods blocks a different type of client from accessing your sites. In the recommended code snippets, ensure that the regular expressions and Apache directives are properly formatted and adapted for your specific use case.
Identify unwanted traffic sources
Before you can block traffic, you must identify the source. If your site experiences slowdowns or a spike in requests (visible in Stack Metrics), connect to the environment through SSH and use the following commands on the access.log to identify the offenders.
Top 20 IP addresses
To identify the specific hosts that generate the most requests:
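The exact command depends on your log format. A minimal sketch, assuming the combined Apache log format where the client IP is the first field and the User-Agent is the final quoted field, is:
# Count requests per client IP address and list the 20 busiest sources:
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
# Similarly, count requests per User-Agent string (the sixth field when splitting on double quotes):
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn | head -20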
Once identified, use these IPs or User Agents in the following implementation methods.
Block specific IP addresses
Use this method when you have identified one or a few specific IP addresses that abuse your site, for example, through brute-force attacks.
If you use Acquia Edge, implement IP blocks through the WAF. This is the most performant method because it does not consume origin resources. If Acquia Edge is unavailable, you can block at the origin, as shown in the sketch after the following steps.
Create a WAF rule that matches the abusive IPs. For example, in Cloudflare, use the expression (ip.src in {198.51.100.23 203.0.113.0/24}).
Set Action to Block.
Deploy the rule.
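If you must block at the origin instead, a minimal .htaccess sketch in the mod_rewrite style used elsewhere in this page, matching the single example IP exactly and the example 203.0.113.0/24 range by prefix, is:
# Return 403 Forbidden for the abusive IP and the example /24 range:
RewriteCond %{REMOTE_ADDR} ^198\.51\.100\.23$ [OR]
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.
RewriteRule .* - [F,L]
The [F] flag returns a 403 Forbidden response to the matched addresses; all other traffic is unaffected.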
Decide between blocking and rate-limiting
Select the appropriate action to ensure that you stop abuse without impacting legitimate users.
Block: Use when malicious traffic comes from a specific source. For example, block specific IPs.
Rate-Limit: Use when the unwanted traffic is distributed across many IPs, such as scrapers that hit /search or API endpoints, or when a legitimate service makes multiple requests. Rate-limiting caps the number of requests per IP in a specific time frame. When the threshold is exceeded, the visitor might be challenged through CAPTCHA or temporarily blocked.
Do not block reputable crawlers, such as Googlebot or Bingbot, as this negatively impacts your Search Engine Optimization (SEO). Verify the source of the traffic before you block a common User Agent.
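For example, to verify a crawler that claims to be Googlebot, you can run a reverse DNS lookup on the requesting IP and then a forward lookup of the returned hostname. The IP address and hostname below are placeholders for values from your own logs:
# Reverse lookup: genuine Googlebot IPs resolve to hostnames ending in googlebot.com or google.com
host 203.0.113.45
# Forward lookup: the returned hostname must resolve back to the same IP address
host crawl-203-0-113-45.googlebot.com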
To block a specific user agent, add the following code snippet to the .htaccess file, replacing UserAgent with the string that you want to match:
RewriteCond %{HTTP_USER_AGENT} UserAgent
RewriteRule .* - [F,L]
You can also block more than one User Agent at a time by using the [OR] ('or next condition') flag; the [NC] ('no case') flag makes the match case insensitive. The following example blocks several user agents with properly escaped regular expressions:
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Scrapy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ \(compatible;\ Yahoo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AppleNewsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ \(compatible;\ YandexBot [NC]
RewriteRule .* - [F,L]
To avoid website errors, you must properly escape characters in your regular expressions (regex). HTTP_USER_AGENT accepts a regex as its argument, and in the preceding code several user agent strings require escaping because of the complexity of their names. Instead of constructing a rule manually, you can use websites such as https://beautifycode.net/regex-escape to quickly build a properly escaped regex.
Test the blocking of user agents
Run the following command to test that the block works and that the site still responds. This example tests a user agent named Pcore; run a similar command for your intended user agent.
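A minimal example, assuming your site is available at https://www.example.com:
curl -I -H "User-Agent: Pcore" https://www.example.com/
# A 403 Forbidden status confirms that the block works; a 200 OK means the condition did not match.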
Block traffic from robot crawlers
A robot crawler can cause problems for a site by making a large number of requests. You can use the following code in either of these situations:
When robots do not adhere to the robots.txt file
When you want to block the traffic immediately, without waiting for robot crawlers to fetch the robots.txt file
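A minimal sketch, assuming BadBot and OtherBot are placeholder names for the crawlers that you want to block and using a throwaway redirect target (here the crawler's own loopback address), is:
RewriteCond %{HTTP_USER_AGENT} (BadBot|OtherBot) [NC]
RewriteRule .* http://127.0.0.1/ [R=302,L]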
In this code, R=302 indicates a temporary redirect. If you want a permanent redirect, adjust it to R=301.
Troubleshoot staging and non-production outages
Staging environment slowdowns often result from configuration issues or internal testing rather than external attacks.
If your non-production site is unresponsive:
Check recent activity: Verify if there were recent code deployments, configuration changes, or feature flag toggles.
Verify access controls: Confirm IP ACLs or Basic Auth settings have not accidentally blocked legitimate users or testing tools.
Review logs for internal tools: Check the access logs for repetitive requests from internal testing or monitoring tools. Use the commands mentioned in the Identify unwanted traffic sources section.
Check application errors: Review the error.log and mysql-slow.log for application failures rather than traffic volume, as shown in the example after this list.
Check cron status: Ensure that scheduled jobs are not stuck and are not causing resource exhaustion.
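A minimal sketch of these checks, assuming the log file names referenced above and a hypothetical internal monitoring tool named UptimeChecker:
# Review the most recent application errors:
tail -n 100 error.log
# Check for slow database queries during the incident window:
tail -n 100 mysql-slow.log
# Count requests from the internal monitoring tool:
grep -c 'UptimeChecker' access.log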
Before you create a Support ticket
If the issue persists or you suspect a large-scale DDoS attack, collect the following data before you create a Support ticket. Providing this information upfront expedites resolution.
Environment: Production, Staging, or Development, and the environment ID.
Time Window (UTC): Exact start and end time of the incident. For example, 2025-11-17 18:00–18:30 UTC.
Triage Data: Output from the triage commands in the Identify unwanted traffic sources section. For example, the top 5 IPs, User Agents, and paths.
Actions Taken: Specific WAF rules, rate limits, or .htaccess changes that you applied.
Sample Request: A recent X-Request-ID header value from a failed or blocked request, if available (see the example after this list).
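If your stack exposes the X-Request-ID header in responses, you can capture a sample value with a command such as the following (the URL is a placeholder for a failing or blocked path on your site):
curl -sI https://www.example.com/blocked-path | grep -i x-request-id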