Frankenbot by Ben Bely

Detecting human users: Is there a way to block enumeration, fuzzing, or web scans?

No, you won’t be able to totally block them, but you would be surprised how stupid some bots are! Nginx + Lua FTW.

TL;DR: I started writing this article to dig into every possible way to block web scanning and enumeration. I realized it’s not that easy: there are many approaches, but none of them seems to work 100% of the time. What I’ve learned is that, in many cases, JavaScript could be the right track to follow. In this article, I’ll show you how I blocked Nikto and wfuzz using JavaScript.

a dangerous botnet in action

If you have ever used dirbuster, wfuzz, Nikto, wpscan, skipfish, etc., you know that each of them makes a lot of noise on the target website, and their behavior can be recognized and blocked in many ways. The problem is that all pattern-matching rules, or most of them, can easily be evaded (and if you have ever tried to block the Shodan crawler, you know exactly what I mean). So, is there a way to block this kind of activity? Can I detect bots, crawlers, and non-human users? The answer, my friend, is: no. I mean, it’s not possible in 100% of cases… but you would be surprised how stupid some bots are!

Let’s just get this over with: the easiest (and totally useless) thing you can do is filter on the User-Agent header value. Something like “if the User-Agent contains nikto, then block”… oh man, you’re just wasting your time. Wait: that doesn’t mean this kind of filter will never be triggered. If you use the OWASP ModSecurity Core Rule Set, you know well that the rules in REQUEST-913-SCANNER-DETECTION.conf are triggered often. But that’s only because there are many script kiddies out there who don’t realize that, in real life, they need to change Nikto’s User-Agent before doing information gathering.
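To make the point concrete, here is a minimal sketch of such a filter in JavaScript (Node.js). The signature list is a hypothetical, much shorter stand-in for real rule sets like the CRS scanner-detection rules; what it shows is that the check only ever sees one header, and the client fully controls that header.

```javascript
// Hypothetical signature list; real rule sets are much longer,
// but they share the same weakness.
const SCANNER_SIGNATURES = ["nikto", "wfuzz", "dirbuster", "gobuster", "wpscan", "sqlmap"];

// Returns true when the User-Agent contains a known scanner name (case-insensitive).
function looksLikeScanner(userAgent) {
  const ua = (userAgent || "").toLowerCase();
  return SCANNER_SIGNATURES.some((sig) => ua.includes(sig));
}

// A Nikto-style default User-Agent is caught...
console.log(looksLikeScanner("Mozilla/5.00 (Nikto/2.1.6) (Evasions:None) (Test:000001)")); // true
// ...but the same tool sending a browser-like User-Agent sails through.
console.log(looksLikeScanner("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0")); // false
```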

reCAPTCHA, IP blacklists and traps

I’ve read many articles and tutorials that try to catch bots using a “trap”, but some of them don’t seem like good ideas, and in many cases they’re useless. For example: “disallow a path in robots.txt, and if something requests it, ban it forever”. OK, this could work for tools that look for resources listed in robots.txt (like Nikto), but not all tools use this technique (for example, dirbuster for enumeration or wfuzz for fuzzing).
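A minimal sketch of that robots.txt trap in JavaScript (Node.js), assuming a hypothetical trap path that appears only as a Disallow: entry and is never linked from any page. It also shows the limit described above: a wordlist-driven enumerator that never reads robots.txt simply never touches the trap.

```javascript
// Hypothetical trap path: listed only as "Disallow: /private-backup/" in robots.txt.
const TRAP_PATH = "/private-backup/";
const bannedIps = new Set();

// Returns the HTTP status to send for a request from `ip` to `path`.
function handleRequest(ip, path) {
  if (bannedIps.has(ip)) return 403; // already caught by the trap
  if (path.startsWith(TRAP_PATH)) {
    bannedIps.add(ip); // "ban it forever"
    return 403;
  }
  return 200; // here 200 only means "not blocked by the trap"
}

// A robots.txt-reading scanner (Nikto-style) walks into the trap...
handleRequest("203.0.113.10", "/private-backup/");
console.log(handleRequest("203.0.113.10", "/index.html")); // 403
// ...but a wordlist-driven enumerator (dirbuster/wfuzz-style) never requests it.
console.log(handleRequest("203.0.113.20", "/admin/")); // 200
```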

One of the best ways to detect whether a user is human is reCAPTCHA (Google can help us with its reCAPTCHA API, which you can find here: https://www.google.com/recaptcha), but I hate the idea of forcing all users to prove they are human every time they visit my website or use my web application (even though Google now offers a “transparent captcha” that is less intrusive, it isn’t always a good choice for user experience). The bad news is that if you google a little, you can find some ways to bypass it… ask Google how to bypass Google.

There are also many services that can help protect your website from botnets and scans. For example, Cloudflare uses two different approaches: the first is a reCAPTCHA (like Google’s); it could annoy users, but it works very well. The second is automatic (I mean “non-interactive”) and doesn’t require any interaction with the user. It’s more comfortable for users but less secure… take a look at https://github.com/Anorov/cloudflare-scrape, a simple Python module, implemented with Requests, that bypasses Cloudflare’s anti-bot page (also known as “I’m Under Attack Mode”, or IUAM).

I need something fully automated: hard for bots to pass, but unintrusive for users. There’s one specific thing that Nikto, wfuzz, dirbuster, etc. can’t do: execute JavaScript.

OK, I know that a non-interactive check based on JavaScript won’t block 100% of automated requests, but I don’t want a perfect world. I just want to block automated scans and fuzzing on my website without using third-party services, so that it integrates better with my logging system and my Web Application Firewall. And this is possible! Let’s see how.

Theoretical explanation

Don’t panic: I’ll use a few lines of Lua directly inside the nginx.conf file. Trust me, it’s easier than it seems. My idea consists of three steps that a user must pass before reaching my website. It should be totally transparent for human users but not for non-human user agents:

  1. Users, with their browsers, make the first request to my website without any session cookie. Nginx detects that there is no session cookie and returns an HTML page with a JavaScript that users need to execute in order to get their session token.
  2. By executing the JavaScript, users get their session cookie from a JavaScript function (something like document.cookie = "iamhuman=sessiontoken";). The token is an encrypted string with a short expiration time of just 10 or 20 seconds.
  3. Users are then redirected (via JavaScript, with something like location.reload();) to the website, but this time they have the session cookie. Nginx checks whether the token inside the cookie is valid, and if so, proxies the user’s request to my “real” website.

As you can see in the schema above, the first step a bot or crawler has to pass is the inclusion of botbuster.js in the response HTML. In many cases, this first step alone is enough to disarm tools like Nikto or dirbuster. To make it harder for bots, crawlers, and tools to request the botbuster.js script, I’ve added an encrypted token that contains the user’s IP address, User-Agent, and a timestamp. If this token is valid, the script is returned; otherwise, the user receives an error message.
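Once Nginx has decrypted the t token, the “does it belong to this client?” part of the check can be sketched in JavaScript like this. The payload format (ts=..., src=..., ua=...) mirrors the set_encrypt_session line shown later in the article; the parsing and the age check are my own illustration, not the article’s Lua. In the real setup, encrypted_session_expires already rejects expired tokens, so the timestamp check here is belt-and-braces.

```javascript
// Parses a decrypted payload of the form "ts=<ISO 8601>, src=<IP>, ua=<User-Agent>".
// The User-Agent comes last and may itself contain ", " (e.g. "KHTML, like Gecko"),
// so the regex anchors on the field labels and gives the UA the rest of the string.
function parsePayload(payload) {
  const m = /^ts=(.+?), src=(.+?), ua=(.*)$/s.exec(payload);
  return m ? { ts: m[1], src: m[2], ua: m[3] } : null;
}

// True if the payload was issued to this client (same IP and User-Agent)
// and is no older than maxAgeSec seconds.
function payloadMatches(payload, client, maxAgeSec, nowMs = Date.now()) {
  const p = parsePayload(payload);
  if (!p) return false;
  const issuedMs = Date.parse(p.ts);
  if (Number.isNaN(issuedMs)) return false;
  const ageSec = (nowMs - issuedMs) / 1000;
  return p.src === client.ip && p.ua === client.ua && ageSec >= 0 && ageSec <= maxAgeSec;
}

const ua = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)";
const payload = `ts=2018-06-01T12:00:00+00:00, src=203.0.113.10, ua=${ua}`;
const now = Date.parse("2018-06-01T12:00:05+00:00");
console.log(payloadMatches(payload, { ip: "203.0.113.10", ua }, 20, now)); // true
console.log(payloadMatches(payload, { ip: "198.51.100.7", ua }, 20, now)); // false (different IP)
```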

In the second step, the user needs to execute the JavaScript code inside botbuster.js to get a valid session cookie that lets them reach my website. This session cookie looks something like iamhuman=tokenblabla12345.... The token is an encrypted string that stays valid for the next 10 seconds. Once it expires, the user is automatically sent back to the first step.
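A minimal sketch of a short-lived encrypted token in JavaScript (Node.js). This swaps in AES-256-GCM from Node’s crypto module; it is not the encrypted-session module’s own AES-256 + MAC format, so these tokens are not interchangeable with what Nginx produces. The expiry travels inside the authenticated ciphertext, so a client can’t extend it without the key. GCM breaks badly if a nonce is ever reused under the same key, so the sketch generates a random 12-byte IV per token. It assumes a recent Node.js (for the "base64url" encoding).

```javascript
const crypto = require("crypto");

// Hypothetical key for the sketch; a real deployment would load a fixed 32-byte secret.
const KEY = crypto.randomBytes(32);

// Encrypts "<expiry>|<plaintext>" and returns base64url(iv | tag | ciphertext).
function seal(plaintext, ttlSec, nowMs = Date.now()) {
  const expires = Math.floor(nowMs / 1000) + ttlSec;
  const iv = crypto.randomBytes(12); // GCM nonce: must never repeat under the same key
  const cipher = crypto.createCipheriv("aes-256-gcm", KEY, iv);
  const body = Buffer.concat([cipher.update(`${expires}|${plaintext}`, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64url");
}

// Returns the plaintext, or null if the token is malformed, tampered with, or expired.
function unseal(token, nowMs = Date.now()) {
  try {
    const raw = Buffer.from(token, "base64url");
    const iv = raw.subarray(0, 12);
    const tag = raw.subarray(12, 28);
    const body = raw.subarray(28);
    const decipher = crypto.createDecipheriv("aes-256-gcm", KEY, iv, { authTagLength: 16 });
    decipher.setAuthTag(tag);
    const text = Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
    const sep = text.indexOf("|");
    if (sep < 0) return null;
    const expires = Number(text.slice(0, sep));
    if (Math.floor(nowMs / 1000) > expires) return null; // expired: back to step 1
    return text.slice(sep + 1);
  } catch {
    return null; // bad encoding, wrong lengths, or failed authentication
  }
}

const token = seal("src=203.0.113.10", 10);
console.log(unseal(token)); // src=203.0.113.10
```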

In the third step, the requested page is reloaded by JavaScript. This time the request should carry the iamhuman session cookie, which Nginx validates. If it’s valid, Nginx does a proxy_pass to 127.0.0.1:8888, where my website lives.
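The three steps boil down to a single routing decision per request. Here is a toy JavaScript model of it; isValidToken is a hypothetical stand-in for Nginx’s decrypt-and-compare logic, and the upstream address is the one from the local server block (127.0.0.1:8888).

```javascript
// Toy model of the three steps. The cookie name comes from the article;
// isValidToken is a hypothetical stand-in for Nginx's decrypt-and-check.
function route(req, isValidToken) {
  const token = req.cookies.iamhuman;
  if (!token || !isValidToken(token, req)) {
    return { action: "challenge" }; // step 1: serve the JavaScript challenge page
  }
  return { action: "proxy", upstream: "127.0.0.1:8888" }; // step 3: proxy_pass
}

// Hypothetical validator for this simulation: accepts only the literal token "ok".
const isValidToken = (token) => token === "ok";

// A browser is challenged once, runs the script (step 2), and then gets through.
const browser = { cookies: {} };
console.log(route(browser, isValidToken).action); // challenge
browser.cookies.iamhuman = "ok"; // what executing botbuster.js would do
console.log(route(browser, isValidToken).action); // proxy

// A scanner that never runs JavaScript is challenged on every request.
const scanner = { cookies: {} };
console.log(route(scanner, isValidToken).action); // challenge
```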

Easy, isn’t it? Obviously, all this could be bypassed by writing a custom script that parses the JavaScript in order to request /botbuster.js and get the session cookie before starting a scan or enumeration. This is easier than it seems, but trust me: Nikto, dirbuster, gobuster, wfuzz, wpscan, and so on will be totally useless against it, because they don’t parse and execute this JavaScript code, and even if a session cookie is set manually in the tool, it expires within seconds.
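To show how little a purpose-built client needs for the first step, this JavaScript sketch pulls the botbuster.js URL out of the challenge page with a regular expression. The token in the sample string is a placeholder, not a real token. A full bypass would still have to fetch that URL from the same IP with the same User-Agent, read the cookie value out of the returned script, and replay it before it expires, which is exactly the kind of custom work Nikto and friends don’t do.

```javascript
// Illustrative only: extracts the botbuster.js URL (with its base32 "t" token)
// from the challenge page's inline script.
function extractBotbusterUrl(html) {
  const m = /'(\/botbuster\.js\?t=[0-9a-v]+=*)'/.exec(html);
  return m ? m[1] : null;
}

// Placeholder token, shaped like the one in the challenge page.
const challengePage = "as.src = '/botbuster.js?t=0a1b2c3d4e5f====';";
console.log(extractBotbusterUrl(challengePage)); // /botbuster.js?t=0a1b2c3d4e5f====
```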

My result:

Test using Nikto with and without human recognition

As you can see, in the top screenshot I ran Nikto against my website without the “human user recognition” system. Nikto does a good job, finding test.html and the phpmyadmin directory! In the bottom screenshot, by contrast, I ran the same test with “human user recognition” enabled. In this case, Nikto doesn’t find any file or directory, because it was unable to get a valid session token. Cool, isn’t it? 😎

Less talk, more code. Let’s do it!

Nginx + Lua FTW!

As usual I’ll use Nginx, more specifically OpenResty, which bundles the standard Nginx core, LuaJIT, many carefully written Lua libraries, lots of high-quality third-party Nginx modules, and most of their external dependencies. The Lua module embeds Lua (via the standard Lua 5.1 interpreter or LuaJIT 2.0/2.1) into Nginx and, by leveraging Nginx’s subrequests, integrates powerful Lua threads (Lua coroutines) into the Nginx event model.

The second Nginx module we need is “encrypted session”, developed by Yichun “agentzh” Zhang (章亦春) and included in OpenResty. This module provides encryption and decryption support for Nginx variables based on AES-256 with MAC, and is usually used together with the ngx_set_misc module and the standard rewrite module’s directives.

First, we need to configure a server block for our website or web application. It must be reachable only from the local system, something like:

server {
    listen 127.0.0.1:8888;
    server_name example.com;

    location / {
        root html;
        index index.html;
    }
}

Now the public server. First, we need to configure the encrypted session. There are three parameters: encrypted_session_key, which sets the key for the cipher (must be 32 bytes long); encrypted_session_iv, which sets the initialization vector used for the cipher (must be no longer than 16 bytes); and encrypted_session_expires, which sets the expiration time (in seconds by default). Here’s my configuration:

server {
    listen 80;
    server_name example.com;
    encrypted_session_key 'v1-clG~!~v7B_Z0yu.:iw*Rj#l-Nc8E^';
    encrypted_session_iv "themiddlerfvbgt5";
    encrypted_session_expires 20;
    ...
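Because the key and IV limits are counted in bytes, it can be worth checking the values before reloading Nginx. Here is a small JavaScript (Node.js) check of the two size rules quoted above, run against the values from my configuration; the helper name is hypothetical.

```javascript
// Checks the size rules for encrypted_session_key (exactly 32 bytes)
// and encrypted_session_iv (at most 16 bytes). Lengths are in bytes,
// so a multi-byte UTF-8 character counts more than once.
function checkEncryptedSessionParams(key, iv) {
  const problems = [];
  const keyLen = Buffer.byteLength(key, "utf8");
  const ivLen = Buffer.byteLength(iv, "utf8");
  if (keyLen !== 32) problems.push(`key is ${keyLen} bytes, expected exactly 32`);
  if (ivLen > 16) problems.push(`iv is ${ivLen} bytes, expected at most 16`);
  return problems;
}

console.log(checkEncryptedSessionParams("v1-clG~!~v7B_Z0yu.:iw*Rj#l-Nc8E^", "themiddlerfvbgt5")); // []
```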

The whole Nginx configuration on port 80 should be something like this:

OK, I need to explain a little bit here 🤓 First, Nginx checks whether the user requests a page with the iamhuman session cookie. If it’s not present, Nginx prints out this response body:

<html>
<head>
<script type='text/javascript'>
(function() { var as = document.createElement('script'); as.type = 'text/javascript'; as.async = true; as.src = '/botbuster.js?t=ri9evnme608ccochuglog55co6p6vgugr00vnk0d5022ctpvl1a0c6rfvmi9jpepdhiocjo8duj9o6e0e5r0c752aeebm1tieda2e5t7penvq88acltou4gfo7djn3kc1mqjr3s44667f9pph2lagau5g8======'; var s = document.getElementsByTagName('script')[0];s.parentNode.insertBefore(as, s); })();
</script>
</head>
<body>
checking if you are a human...
<script>
setTimeout(function() { location.reload(); }, 1000);
</script>
</body>
</html>

Here is the first script, reformatted to make it more readable:

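(function() {
    // Create an async <script> element pointing at botbuster.js with the encrypted token
    var as = document.createElement('script');
    as.type = 'text/javascript';
    as.async = true;
    as.src = '/botbuster.js?t=...longtoken...';
    // Insert it before the first script on the page so the browser fetches and runs it
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(as, s);
})();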

As you can see, this script dynamically creates a new script tag that loads botbuster.js with a dynamic token in the t parameter. This token is generated with the set_encrypt_session directive:

set_encrypt_session $token "ts=$time_iso8601, src=$remote_addr, ua=$http_user_agent";
set_encode_base32 $token;

The first line encrypts a text string that contains a timestamp, the user’s IP address, and the user’s User-Agent. The second line then encodes the encrypted string in base32.
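The sample token in the challenge page uses only 0-9, a-v and = padding, which is the base32hex alphabet from RFC 4648 (to my knowledge, ngx_set_misc’s default base32 alphabet). Here is a minimal JavaScript encoder/decoder for that alphabet, handy for inspecting tokens outside Nginx; it covers only the encoding layer, not the encrypted-session decryption.

```javascript
// base32hex alphabet (RFC 4648, section 7) in lower case.
const ALPHABET = "0123456789abcdefghijklmnopqrstuv";

// Encodes a Buffer as padded base32hex.
function base32hexEncode(buf) {
  let out = "";
  let value = 0;
  let bits = 0;
  for (const byte of buf) {
    value = ((value << 8) | byte) & 0xffff; // keep only the bits still needed
    bits += 8;
    while (bits >= 5) {
      out += ALPHABET[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
  }
  if (bits > 0) out += ALPHABET[(value << (5 - bits)) & 31]; // zero-fill the last group
  while (out.length % 8 !== 0) out += "="; // pad to a multiple of 8 characters
  return out;
}

// Decodes padded or unpadded base32hex back into a Buffer.
function base32hexDecode(str) {
  const bytes = [];
  let value = 0;
  let bits = 0;
  for (const ch of str.replace(/=+$/, "")) {
    const idx = ALPHABET.indexOf(ch);
    if (idx === -1) throw new Error(`invalid base32hex character: ${ch}`);
    value = ((value << 5) | idx) & 0xffff;
    bits += 5;
    if (bits >= 8) {
      bytes.push((value >>> (bits - 8)) & 0xff);
      bits -= 8;
    }
  }
  return Buffer.from(bytes);
}

console.log(base32hexEncode(Buffer.from("foobar"))); // cpnmuoj1e8======
console.log(base32hexDecode("cpnmuoj1e8======").toString()); // foobar
```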

Next, the user’s browser requests botbuster.js with this token, which Nginx decrypts, checking that the IP address and User-Agent match. If they match, botbuster.js uses the document.cookie property to give the user a session cookie containing an encrypted token (built from the same string with the user’s IP address and User-Agent). Once the user has the session cookie, the requested page is reloaded with the location.reload() JavaScript function.
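One way to picture botbuster.js is as a tiny script rendered per request with the session token embedded. This JavaScript sketch assumes a hypothetical server-side helper; the Max-Age, Path and SameSite attributes are my additions, not from the article. The reload itself stays in the challenge page’s setTimeout, so the script only sets the cookie. Since base32hex output (0-9, a-v, =) contains only characters allowed in cookie values, the helper rejects anything else instead of escaping it.

```javascript
// Hypothetical server-side helper that renders the body of botbuster.js for one client.
function renderBotbusterJs(sessionToken, maxAgeSec) {
  // base32hex output only uses 0-9, a-v and "=", all legal in cookie values;
  // anything else is rejected rather than escaped.
  if (!/^[0-9a-v]+=*$/.test(sessionToken)) {
    throw new Error("unexpected characters in session token");
  }
  // Max-Age, Path and SameSite are illustrative additions.
  const cookie = `iamhuman=${sessionToken}; Max-Age=${maxAgeSec}; Path=/; SameSite=Lax`;
  // JSON.stringify produces a valid JavaScript string literal for the cookie.
  return `document.cookie = ${JSON.stringify(cookie)};\n`;
}

process.stdout.write(renderBotbusterJs("0a1b2c3d4e5f====", 10));
// document.cookie = "iamhuman=0a1b2c3d4e5f====; Max-Age=10; Path=/; SameSite=Lax";
```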

Nginx now receives the same first request, but with our session cookie. It decrypts the token in this cookie and checks whether all parameters match. If so, it proxies (proxy_pass) the user’s request to 127.0.0.1:8888, where my website lives.

A hypothetical scenario could be something like this:

Common HTTP requests from browsers able to execute JavaScript

And for all requests made by a user that is unable to execute JavaScript, it should look like this:

automated requests, bot, crawler, etc…

I’ve decreased the session expiration time and increased the JavaScript reload timeout to better show the results in the following example, made with a browser:

Human recognition in action using a browser

Test 2: wfuzz

The first test you can see in this video consists of running wfuzz against my Nginx without any kind of blocking mechanism. As you can see, wfuzz finds the index.html and test.html files. By contrast, when I enable the right location, wfuzz is no longer able to discover any file!

Conclusion

I think this method isn’t suitable for a whole website, but you could use it if you have a form, or a restricted/authenticated area, that you want to protect from enumeration or scan activity.
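In Nginx terms, that would mean enabling the check only in the relevant location blocks. The same scoping decision as a JavaScript sketch, with hypothetical protected paths:

```javascript
// Hypothetical protected areas; every other path skips the JavaScript check.
const PROTECTED_PREFIXES = ["/login", "/admin", "/contact"];

// True if path is one of the prefixes or lies beneath one (segment-aware,
// so "/administrator" does not match "/admin").
function needsHumanCheck(path) {
  return PROTECTED_PREFIXES.some((p) => path === p || path.startsWith(p + "/"));
}

console.log(needsHumanCheck("/admin/users")); // true
console.log(needsHumanCheck("/administrator")); // false
console.log(needsHumanCheck("/blog/hello-world")); // false
```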

Please keep in mind that no method that uses JavaScript to block bots and crawlers is secure, and none of them works in 100% of cases. They can help mitigate the incredible amount of automated attacks and scans, but they can’t protect your website from a custom script that parses your JavaScript in order to bypass it. Remember: don’t trust vendors who say their service blocks 100% of bot activity… they’re simply lying to you 😉

-theMiddle

Links

A beautiful answer on stackoverflow https://stackoverflow.com/a/7154667

CloudFlare scrape on GitHub https://github.com/Anorov/cloudflare-scrape

You can find a lot of stuff here https://duckduckgo.com/?q=detection+bot+javascript

Ask Google how to bypass Google https://www.google.it/search?q=how+bypass+google+recaptcha

Contacts

Andrea (theMiddle) Menin
Twitter: https://twitter.com/Menin_TheMiddle
GitHub: https://github.com/theMiddleBlue
Linkedin: https://www.linkedin.com/in/andreamenin/
