---
title: "Blocking a User Agent with auto-generated code added to .htaccess"
date: "2025-02-05T23:27:39+00:00"
summary:
image:
type: "article"
url: "/acquia-cloud-platform/help/94516-blocking-user-agent-auto-generated-code-added-htaccess"
id: "7c3a3a67-3d21-4ad2-8b9f-694fdeb19ad5"
---

Steps
-----

(1) `ssh` to one of the Drupal site's web servers

(2) `cd` to the site's logs directory

(3) Run the snippet shown in this transcript:

    myemployeesite@ded-1234:/var/log/sites/myemployeesite.prod/logs/ded-1234$ min_pct=20; snippet=/mnt/tmp/htaccess-block-snippet.txt; cat /dev/null >$snippet; cat access.log|tail -200000  |awk -F\" '{print $6}' | cat >/tmp/tmp-count-$$ && total=`grep -c . /tmp/tmp-count-$$` && sort /tmp/tmp-count-$$ |uniq -c |sort -nr | awk -v total=$total 'NR==1 { snippet="'$snippet'"; num_block=0; print "  Count (pct)  Value" } { num=$1; $1=""; pct=num/total*100; if (pct>1) { printf("%7d (%2d%%) %s\n", num, pct, $0); } if (pct>'$min_pct') { ++num_block; sub(/^[ \t\r\n]+/, "", $0); gsub(/\./, "\\.", $0); gsub(/[^a-zA-Z0-9\./:_ -]/, ".", $0); block[num_block]=$0; } } END { print "# " num_block " User-Agents should be blocked"; if (num_block>0) { print "# Add this into .htaccess rules" >snippet; print "#" >snippet; for (i=1; i<=num_block; i++) { print "RewriteCond %{HTTP_USER_AGENT} \"" block[i] "\" " (i<num_block ? "[NC,OR]" : "[NC]") >snippet} print "RewriteRule .* - [F,L]" >snippet} }' && rm /tmp/tmp-count-$$  && cat $snippet
      Count (pct)  Value
       3577 (29%)  Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)
        706 ( 5%)  check_http/v2.2 (monitoring-plugins 2.2)
    ...
    # 1 User-Agents should be blocked
    # Add this into .htaccess rules
    #
    RewriteCond %{HTTP_USER_AGENT} "Mozilla/5\.0 .compatible. SemrushBot/7.bl. .http://www\.semrush\.com/bot\.html." [NC]
    RewriteRule .* - [F,L]

The snippet by itself is:

    min_pct=20
    snippet=/mnt/tmp/htaccess-block-snippet.txt
    cat /dev/null >$snippet
    tail -200000 access.log | awk -F\" '{print $6}' >/tmp/tmp-count-$$ &&
      total=`grep -c . /tmp/tmp-count-$$` &&
      sort /tmp/tmp-count-$$ | uniq -c | sort -nr |
      awk -v total=$total '
        NR==1 { snippet="'$snippet'"; num_block=0; print "  Count (pct)  Value" }
        {
          num=$1; $1=""; pct=num/total*100
          if (pct>1) { printf("%7d (%2d%%) %s\n", num, pct, $0) }
          if (pct>'$min_pct') {
            ++num_block
            sub(/^[ \t\r\n]+/, "", $0)
            gsub(/\./, "\\.", $0)
            gsub(/[^a-zA-Z0-9\./:_ -]/, ".", $0)
            block[num_block]=$0
          }
        }
        END {
          print "# " num_block " User-Agents should be blocked"
          if (num_block>0) {
            print "# Add this into .htaccess rules" >snippet
            print "#" >snippet
            for (i=1; i<=num_block; i++)
              print "RewriteCond %{HTTP_USER_AGENT} \"" block[i] "\" " (i<num_block ? "[NC,OR]" : "[NC]") >snippet
            print "RewriteRule .* - [F,L]" >snippet
          }
        }' &&
      rm /tmp/tmp-count-$$ &&
      cat $snippet
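Conceptually, the one-liner counts each User-Agent's share of recent traffic and emits a `RewriteCond` for every agent above `min_pct`. The same idea, reduced to a runnable sketch over invented sample data (the file names and agent strings below are illustrative only, and the real script's extra character sanitization is omitted):

```shell
# Hypothetical sample of user-agent strings; the real script extracts
# field 6 of access.log by splitting each line on double quotes.
printf '%s\n' 'BadBot/1.0' 'BadBot/1.0' 'BadBot/1.0' 'NiceBrowser/2.0' \
  > /tmp/ua-sample.txt

total=$(grep -c . /tmp/ua-sample.txt)
min_pct=50   # block any agent responsible for more than this share

# Count occurrences, compute each agent's percentage of traffic, and
# emit a RewriteCond for agents over the threshold; literal dots are
# escaped so the pattern matches them exactly.
sort /tmp/ua-sample.txt | uniq -c | sort -nr |
  awk -v total="$total" -v min_pct="$min_pct" '
    { num=$1; $1=""; sub(/^ +/, "", $0)
      if (num/total*100 > min_pct) {
        gsub(/\./, "\\.", $0)
        print "RewriteCond %{HTTP_USER_AGENT} \"" $0 "\" [NC]"
      } }
    END { print "RewriteRule .* - [F,L]" }' > /tmp/ua-block.txt

cat /tmp/ua-block.txt
```

Here `BadBot/1.0` accounts for 3 of 4 requests (75%), so the sketch emits a single `RewriteCond` for it, while `NiceBrowser/2.0` (25%) stays unblocked.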

(4) Add the recommended code to the `.htaccess` file, test it, and push the change live. In this example, the code to add is:

    RewriteCond %{HTTP_USER_AGENT} "Mozilla/5\.0 .compatible. SemrushBot/7.bl. .http://www\.semrush\.com/bot\.html." [NC]
    RewriteRule .* - [F,L]
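In a stock Drupal `.htaccess`, rewrite rules live inside the `<IfModule mod_rewrite.c>` section, and the generated lines would typically go just after `RewriteEngine on`, ahead of Drupal's own rules. A sketch of the placement (the surrounding lines are illustrative, not a complete file, and the pattern is simplified):

```apache
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Block abusive user agents identified in the access logs.
  RewriteCond %{HTTP_USER_AGENT} "SemrushBot" [NC]
  RewriteRule .* - [F,L]

  # ... Drupal's existing rewrite rules follow here ...
</IfModule>
```

`[F]` makes Apache answer 403 Forbidden immediately, `[L]` stops processing further rules for the request, and `[NC]` makes the match case-insensitive.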

Warning
-------

Results may vary. Test the `.htaccess` blocking code on a dev, local, or other non-production site before pushing it to production.

Testing
-------

    curl -sSLIXGET http://myemployeesitedev.prod.acquia-sites.com/url-that-does-or-doesnt-exist   -H "User-Agent: Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"

For the example above, once this particular robot is blocked you should receive a 403 response regardless of the URL; without the rule in place you would see the usual 200 or 404 instead.
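Before deploying, you can also sanity-check a generated pattern locally: the regular expression should match the User-Agent string you intend to block. A hypothetical check using `grep -E` (the pattern fragment below is illustrative):

```shell
ua='Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)'
pattern='SemrushBot/7.bl'   # fragment of a generated pattern; the . matches the ~

if printf '%s\n' "$ua" | grep -Eq "$pattern"; then
  echo "pattern matches - the rule would block this agent"
else
  echo "pattern does not match - the rule would not fire"
fi
```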