HTTP Traffic Redirection Based on the User-Agent Header

Camilo Herrera
Published in winkhosting · 14 min read · Feb 1, 2023
Photo by Man Chung on Unsplash

Pro Tip: When faced with a high volume of requests to a website, remember Bruce Lee’s wise words: “Be water, my friend.”

Today, my dear reader, we will talk about a 100% real case (because we keep it 100 around here): an attack on one of our websites.

It all started with our client’s report about their site, indicating that it was not possible to access it and that a message like this was displayed:

Bandwidth Limit Exceeded

The server is temporarily unable to service your request due to the site owner reaching his/her bandwidth limit. Please try again later.

This message is displayed by our servers when a website reaches its assigned monthly transfer limit. (Transfer is the traffic that flows between your server and any actor that communicates with it. Generally it refers only to outgoing traffic, that is, from your server to the actors that visit your site or download files, for example.)

From the report, we started checking website traffic and found something like this:

Malicious bot traffic sample

66.249.65.120 - - [23/Jan/2023:07:10:02 -0500] "GET /?0694lobc28030csi187359.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.120 - - [23/Jan/2023:07:11:22 -0500] "GET /?9453wyca61460akg1320522.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.118 - - [23/Jan/2023:07:11:22 -0500] "GET /robots.txt HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
114.119.142.37 - - [23/Jan/2023:07:11:54 -0500] "GET /?8655nqcp42237pkk88280.html HTTP/1.1" 500 7309 "https://mysamplesite.com/?8655nqcp40361pkk211402.html" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
66.249.65.120 - - [23/Jan/2023:07:12:41 -0500] "GET /?3598dhdz10101zsq1935112.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.118 - - [23/Jan/2023:07:13:33 -0500] "GET /?5116oavv41120vcm970162.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.122 - - [23/Jan/2023:07:13:33 -0500] "GET /?7537fatm18056mgo189475.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.118 - - [23/Jan/2023:07:14:01 -0500] "GET /?9928nbbl-23612leq-1793-634.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.122 - - [23/Jan/2023:07:15:20 -0500] "GET /robots.txt HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.120 - - [23/Jan/2023:07:15:20 -0500] "GET /?8425etst44656tek508701.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.118 - - [23/Jan/2023:07:16:40 -0500] "GET /?2288eqbi3960iqq1791964.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
114.119.138.65 - - [23/Jan/2023:07:17:41 -0500] "GET /?8655nqcp42236pkk87279.html HTTP/1.1" 500 7309 "https://mysamplesite.com/?8655nqcp39690pkk1539730.html" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
66.249.65.120 - - [23/Jan/2023:07:17:59 -0500] "GET /?2288eqbi3799iqq1630803.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.120 - - [23/Jan/2023:07:19:19 -0500] "GET /robots.txt HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.118 - - [23/Jan/2023:07:19:19 -0500] "GET /?8270bwnl15589loa1426605.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.120 - - [23/Jan/2023:07:19:33 -0500] "GET /?8270bwnl13152loa988166.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.118 - - [23/Jan/2023:07:19:34 -0500] "GET /?7537fatm5165mgo997171.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.122 - - [23/Jan/2023:07:20:38 -0500] "GET /?8270bwnl14918loa755933.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.118 - - [23/Jan/2023:07:21:58 -0500] "GET /?0898iiiy-31607ysq-1792-637.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.122 - - [23/Jan/2023:07:22:33 -0500] "GET /?8041kjhl-24025lic-207-48.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.118 - - [23/Jan/2023:07:22:33 -0500] "GET /robots.txt HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
114.119.158.196 - - [23/Jan/2023:07:23:00 -0500] "GET /?8655nqcp42202pkk53245.html HTTP/1.1" 500 7309 "https://mysamplesite.com/?8655nqcp39522pkk1371562.html" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
66.249.65.120 - - [23/Jan/2023:07:23:17 -0500] "GET /?9166yewp-6999pmm-1172-5.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.118 - - [23/Jan/2023:07:24:37 -0500] "GET /?9453wyca28952akg796981.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
114.119.138.127 - - [23/Jan/2023:07:24:38 -0500] "GET /robots.txt HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (compatible;PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
66.249.65.120 - - [23/Jan/2023:07:25:33 -0500] "GET /?2288eqbi27556iqq1399584.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.122 - - [23/Jan/2023:07:25:33 -0500] "GET /?8731mhos8493sgc327502.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.120 - - [23/Jan/2023:07:25:56 -0500] "GET /robots.txt HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.118 - - [23/Jan/2023:07:25:56 -0500] "GET /?9453wyca28420akg264449.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.118 - - [23/Jan/2023:07:27:16 -0500] "GET /?8917ijnl21806lco1646828.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.120 - - [23/Jan/2023:07:28:35 -0500] "GET /?8316ktkj7656jcm1489664.html HTTP/1.1" 500 7309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
114.119.159.197 - - [23/Jan/2023:07:28:40 -0500] "GET /?8655nqcp42195pkk46238.html HTTP/1.1" 500 7309 "https://mysamplesite.com/?8655nqcp37974pkk182213.html" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"

Funny, right? Funny and worrying. Now imagine this sample multiplied by 100,000. Now we do have a problem.

At this point we have several options to deal with this situation:

Using a firewall

With a firewall, for example, we can generate a list of source IPs and add certain ranges to the block list (you are using a firewall on your server, right?). This is inefficient and does not attack the root of the problem, since the attacker is using Google’s bots as the source of the requests.

If you take this option you will end up isolating your server from a significant range of IPs, affecting traffic to all your hosted sites.

Pro Tip: Remember that your will to fight runs out long before the IP ranges do.

In conclusion, this would be a very bad idea.

Disable the script that is receiving the requests

It is important that you review the content of the script that is getting so much attention from our virtual friends. It is very likely that it contains an unpleasant surprise.

Even if you check it and find no strange or malicious content, disabling it or blocking access to it does not solve the problem either: the site owner or administrator will not like to see their page suspended or unresponsive to visits.

Again, this would not be a viable option.

Use the robots.txt file

This is the passive-aggressive “please don’t hit me” style of the Internet. I do not recommend it; bots can ignore your pleas and disregard this configuration.
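For completeness, this is what that plea would look like: a minimal robots.txt sketch covering the two bots from our traffic sample. Well-behaved crawlers honor it; the rest will ignore it.

# robots.txt — a polite request that bots are free to ignore
User-agent: Googlebot
Disallow: /

User-agent: PetalBot
Disallow: /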

Use redirection rules on the web server

This option is the one I like the most. It is personal, close and even fun, and it is the one we will explore here.

Using redirection rules (or route rewriting) starts with analyzing the requests, detecting patterns and implementing the corresponding rule to block or redirect the traffic (redirecting is the alternative I like the most). This reduces the amount of information delivered in each response, which in turn reduces transfer consumption, and lets us send the attacker wherever our imagination suggests. The sky is the limit!

The steps to implement these rules will be as follows:

  • Sample requests analysis.
  • Requested URL/file review.
  • Redirection rule design based on the type of web server used.
  • Rule implementation.
  • Server traffic monitoring.
  • Smile, walk, stay hydrated and sleep well.

Let’s dive in!

Photo by Oliver Sjöström on Unsplash

Sample requests analysis

The sample log that you saw in a previous section belongs to Apache httpd and corresponds to the access log of a website in Combined Log Format, which contains the following fields:

  • Origin IP: The source IP of the request.
66.249.65.120
  • Date and Time: This field shows… that’s right, the date and time of the request.
[23/Jan/2023:07:10:02 -0500]
  • Request Type and resource requested: This field contains the type of request (GET, POST, PUT, PATCH, DELETE), the address of the resource or file that is requested and can also include the version of HTTP used.
"GET /?0694lobc28030csi187359.html HTTP/1.1"
  • HTTP response code: The server’s response code, usually 200 if successful, 500 if there is an error, etc. If you don’t remember them, here is a video created by Winkhosting.co (Spanish) to refresh your memory (and if you grew up watching The Simpsons you will surely like it).
500
  • Server response byte size: Very important; the larger the size, the greater the transfer consumption of your service.
7309
  • The “User-Agent”: This field will also be relevant; it reports to the server the type of device, operating system, application, etc. that performs the request. Remember that this field can be manipulated by an attacker (spoofing), but for the purposes of this event we confirmed that the source IPs of these requests really do belong to the bots indicated by this field in the log (see the check right after this list).
"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

From the information in these fields we will analyze the type of rule and what part of the request we can use to filter, block or redirect traffic.
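Before designing any rule, it helps to know who is hitting you the hardest. A quick sketch, assuming the combined log format shown above and a typical Apache log path (adjust it to your server):

# Splitting each line on double quotes, field 6 is the User-Agent;
# count occurrences and show the most frequent ones
awk -F'"' '{print $6}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head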

In this particular case we will use the “User-Agent” header, partly because it is easier and I am lazy, but mostly because it will be enough to stop the behavior temporarily until traffic returns to normal. Keep in mind that if you redirect or block bot traffic, it can affect search results and even the visits of users who reach your page through a search engine.

Requested URL/file review

For this particular case, we checked the index.php that was receiving the requests, but I’m not going to tell you what we found because that may be a topic for another publication.

Photo by Belinda Fewings on Unsplash

Let’s assume we did not find things that still haunt me in my dreams and move on.

Pro tip: It is very, very important that you always check the files or URLs that receive suspicious requests. Always!

Redirection rule design

Very well, we come to the most complex part: we will design the traffic redirection rules for three popular servers: Apache HTTP Server, “the reliable old man”; nginx, “the middle brother”; and the young man who is gaining strength and gets all the girls… Caddy. (You wonder why I don’t talk about Tomcat? Well, Java is not my forte. I don’t wish it any particular evil, but we will not include it.)

The rules will be defined in a general way from these steps:

  1. Detect the “User-Agent” header of the request.
  2. Search the header for keywords to determine whether action should be taken.
  3. Take action based on the “User-Agent”!

Apache HTTP Server

Apache is relatively easy to configure for these cases; you only need the mod_rewrite module enabled and access to the .htaccess file of the attacked website.
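Not sure whether mod_rewrite is active? A quick sketch to check (the a2enmod step assumes a Debian/Ubuntu-style Apache installation):

# List the modules Apache has loaded and look for rewrite_module
apachectl -M | grep -i rewrite

# On Debian/Ubuntu, enable it and reload if it is missing
sudo a2enmod rewrite
sudo systemctl reload apache2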

For us, the bots to redirect will be those found in the traffic sample:

  • Googlebot
  • PetalBot

And our configuration will be as follows:

.htaccess File

<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (googlebot) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (petalbot) [NC]
    RewriteRule .* "http://0.0.0.0/" [R=301,L]
</IfModule>

Add these lines to the beginning of the .htaccess file in the directory where the requested resource or file is located. If there is no .htaccess file you can create it.

Our rule has one line to activate the rewrite engine, and as many “RewriteCond” lines as necessary to detect the names of the bots to be redirected, capturing the “User-Agent” from the “%{HTTP_USER_AGENT}” header variable. (Too many complicated terms in a single paragraph, right? You’ll end up believing I’m a pro.)

The “RewriteCond” statement defines the condition under which a “RewriteRule” will be applied; that is, if one of the conditions is met, the redirect is executed. “NC” indicates that the search for the term is case-insensitive, and “OR” indicates that meeting at least one of the conditions is enough to trigger the redirect.

A final “RewriteRule” line defines the action to be taken, and the action we will take is:

  • To permanently redirect: “R=301”.
  • To stop the execution of subsequent rules: “L”.
  • To match any requested path: “.*”.
  • To send traffic to an address that I really like: “http://0.0.0.0/”. In network terms this means non-routable, unknown, intangible, ethereal, ephemeral, nebulous, indefinite. I hope you get the concept.

Changes to this file take effect immediately, so you can check the visit log again and you’ll notice that the requests changed to something like this:

66.249.65.54 - - [26/Jan/2023:07:12:19 -0500] "GET /?3529sryf20359fes199380.html HTTP/1.1" 301 251 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.58 - - [26/Jan/2023:07:15:12 -0500] "GET /?9166yewp17868pmm1706886.html HTTP/1.1" 301 252 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.56 - - [26/Jan/2023:07:15:58 -0500] "GET /?5408qvgw19728waq1567748.html HTTP/1.1" 301 252 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.58 - - [26/Jan/2023:07:19:14 -0500] "GET /?7758aoex11194xkq1029206.html HTTP/1.1" 301 252 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.56 - - [26/Jan/2023:07:21:17 -0500] "GET /?7537fatm6174mgo7181.html HTTP/1.1" 301 248 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.58 - - [26/Jan/2023:07:24:59 -0500] "GET /?4338szji18409igq248428.html HTTP/1.1" 301 251 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.58 - - [26/Jan/2023:07:26:09 -0500] "GET /?8425etst23249tek1090273.html HTTP/1.1" 301 252 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.56 - - [26/Jan/2023:07:28:15 -0500] "GET /?5408qvgw16062waq189979.html HTTP/1.1" 301 251 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.54 - - [26/Jan/2023:07:28:33 -0500] "GET /?7537fatm21027mgo86749.html HTTP/1.1" 301 250 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.58 - - [26/Jan/2023:07:33:05 -0500] "GET /?3529sryf4892fes724897.html HTTP/1.1" 301 250 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.56 - - [26/Jan/2023:07:46:55 -0500] "GET /?7537fatm16622mgo460639.html HTTP/1.1" 301 251 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.54 - - [26/Jan/2023:07:48:56 -0500] "GET /?1079ssfb2579bos410582.html HTTP/1.1" 301 250 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.56 - - [26/Jan/2023:07:53:51 -0500] "GET /?7537fatm10862mgo697873.html HTTP/1.1" 301 251 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.56 - - [26/Jan/2023:07:54:07 -0500] "GET /?5116oavv14746vcm583761.html HTTP/1.1" 301 251 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

You will notice that the response code is now 301 and the byte size of the response is much smaller. This will effectively reduce transfer usage, and once the attacker notices it, the behavior will stop (it is not profitable to keep wasting traffic and resources that way, at least not for a smart attacker).

Keep these rules active as long as necessary; if the attacker changes the type of bot, adjust them accordingly and take a nap.

Nginx

Nginx is a different beast. To apply the rules we will go to our nginx.conf file (in the path where you have it saved on your server) and add the following lines to the server block associated with the domain or subdomain of the attacked website:

if ($http_user_agent ~* 'googlebot|petalbot') {
    return 301 http://0.0.0.0;
}

Nginx is a bit more elegant in defining the rule. The “~*” operator performs a case-insensitive match of the bot names against $http_user_agent, and if there is a match the server returns a permanent 301 redirect to http://0.0.0.0.

Here is a complete basic Nginx configuration as a reference for where to add the redirection lines; place them in the server block of the affected domain, just under server_name:

events {
    worker_connections 4096; ## Default: 1024
}

http {
    index index.html;

    server {
        server_name mysamplesite.com;

        # User-Agent rule here!
        if ($http_user_agent ~* 'googlebot|petalbot') {
            return 301 http://0.0.0.0;
        }

        location / {
            root /var/www/mysamplesite.com/htdocs;
        }
    }

    server {
        listen 80 default_server;
        server_name _; # This is just an invalid value which will never trigger on a real hostname.
        server_name_in_redirect off;

        location / {
            root /var/www/default/htdocs;
        }
    }
}
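Before reloading, it is a good idea to validate the syntax; a minimal sketch using nginx’s own flags:

# Check the configuration files for syntax errors
nginx -t

# Apply the new configuration gracefully, without dropping connections
nginx -s reload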

And that’s it. Easy, right?

Caddy

Caddy Server (how I like this HTTP server). Let’s look at the implementation in this pink-cheeked guy.

To configure our rules we will use the Caddyfile, which on Linux is usually located at /etc/caddy/Caddyfile; you can also check the path in your Caddy .json configuration file. To check the configuration paths you can execute the command below in a console:

> caddy environ

Let’s edit the Caddyfile. In a default installation it will look something like this:

# The Caddyfile is an easy way to configure your Caddy web server.
#
# Unless the file starts with a global options block, the first
# uncommented line is always the address of your site.
#
# To use your own domain name (with automatic HTTPS), first make
# sure your domain's A/AAAA DNS records are properly pointed to
# this machine's public IP, then replace ":80" below with your
# domain name.

:80 {
    # Set this path to your site's directory.
    root * /usr/share/caddy

    # Enable the static file server.
    file_server

    # Another common task is to set up a reverse proxy:
    # reverse_proxy localhost:8080

    # Or serve a PHP site through php-fpm:
    # php_fastcgi localhost:9000

    log {
        output file /var/log/access.log
    }
}

# Refer to the Caddy docs for more information:
# https://caddyserver.com/docs/caddyfile

Caddy uses request matchers to allow you to search for content in the information sent to the server, and we will use “header_regexp” this time like this:

@redirected header_regexp User-Agent (?i)(googlebot|petalbot)

handle @redirected {
    redir http://0.0.0.0 permanent
}

What we are telling the server is to match the regex “(?i)(googlebot|petalbot)” against the User-Agent header, store the result in the named matcher “redirected”, and then create a handle for that matcher.

The handle executes the permanent “redir” redirect (code 301) if the words are found in the received “User-Agent”.

Now let’s add these lines to our file; it would look like this:

# The Caddyfile is an easy way to configure your Caddy web server.
#
# Unless the file starts with a global options block, the first
# uncommented line is always the address of your site.
#
# To use your own domain name (with automatic HTTPS), first make
# sure your domain's A/AAAA DNS records are properly pointed to
# this machine's public IP, then replace ":80" below with your
# domain name.

:80 {
    # Set this path to your site's directory.
    root * /usr/share/caddy

    # User-Agent redirect rule here!
    @redirected header_regexp User-Agent (?i)(googlebot|petalbot)

    handle @redirected {
        redir http://0.0.0.0 permanent
    }

    # Enable the static file server.
    file_server

    # Another common task is to set up a reverse proxy:
    # reverse_proxy localhost:8080

    # Or serve a PHP site through php-fpm:
    # php_fastcgi localhost:9000

    log {
        output file /var/log/access.log
    }
}

# Refer to the Caddy docs for more information:
# https://caddyserver.com/docs/caddyfile

Add the text just below the line that sets the root path of your website, and don’t forget to restart Caddy.
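A minimal sketch for checking and applying the change, assuming the default /etc/caddy/Caddyfile path:

# Validate the Caddyfile before applying it
caddy validate --config /etc/caddy/Caddyfile

# Apply the configuration without downtime
caddy reload --config /etc/caddy/Caddyfile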

And we are ready, you can already apply this type of rules on your servers.

Testing

To perform some tests you can use our old and reliable curl, like this:

> curl http://localhost:8080 -H "User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Remember to change localhost:8080 to the address of your server and the header to the one you are trying to redirect.

In the case of Nginx and Apache you will receive a response from the server indicating that you were redirected; Caddy will do its job silently and will not tell you anything, so you will have to check the request log.
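If you prefer to see the redirect explicitly on any of the three servers, ask curl for the response headers only; a quick sketch reusing one of the User-Agent strings from our logs:

# -I sends a HEAD request and prints only the response headers;
# look for the 301 status and the Location header
curl -I http://localhost:8080 -H "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"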

And that’s it. I hope this information is useful to you, and do not forget to visit us at Winkhosting.co; we are much more than hosting.
