How to trap bots in your own honeypot

Emil Shain
Immue
May 19, 2022

A honeypot is great for diverting malicious bots away from your website. It not only serves as an excellent decoy for these bad actors, but it can also be used to detect, gather intelligence on, and study their identity and hacking methods — so you can be better positioned to stop them from doing damage.

At Immue, we wanted to put a honeypot to the test and see what kinds of insights we could gather and share.

In today’s post we discuss exactly how we set up a honeypot with NodeJS, and present insights about the profiles of the requests, their sources, and the top user agents, along with the actual code we used.

Setting up the honeypot

First, the objective: we wanted to find network bots, which continually scan the internet and are much easier to catch than web bots, since web bots use real browsers and download static files such as images and JavaScript.

So, we created the honeypot and published it to one of our servers. We logged every request made to our own Immue website and flagged each one that didn’t load a static file, a strong indication that it came from a network bot.

The monitoring approach we used was to look at /favicon.ico requests to our website, since real browsers fetch and cache the favicon automatically while network bots do not. Serving a fake image as a cacheable decoy is therefore a great way to dupe the bots and lure them into the honeypot.

Profiling the requests

We ran the honeypot for two weeks and found that the requests could be categorized as follows:

  • 340 unique user agents
  • 1447 unique URL requests
  • 1033 unique hostnames
  • 494 unique company names
  • 356 unique ASNs
  • 66 unique countries

What’s interesting to note about these numbers is that they show how great the risk of attack is for any website, including ours (and yours). Thousands of servers around the world are continually scanning the internet, and they’ve become very good not only at finding vulnerabilities, but also at launching bad bots to exploit them.

Identifying the top paths

With the honeypot we set up, we indeed detected network bots that came to our server looking for vulnerabilities.

The top paths they used to send their requests were:

Top 10 path requests from our honeypot

As we can see, there were 2,833 requests made to the /.env path. The .env file commonly holds an application’s most sensitive data, such as database credentials and API keys.

As a result of this insight, we now know that we need to be very careful never to expose the .env file from our web root, as shown in the sketch below.
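
For example, on a production site (as opposed to the honeypot itself), a small Express middleware along these lines can refuse dotfile requests outright. This is a minimal sketch, not a complete hardening setup:

const express = require('express');

const app = express();

// Reject any path segment that starts with a dot (/.env, /.git, ...).
app.use((req, res, next) => {
  if (req.path.split('/').some((segment) => segment.startsWith('.'))) {
    return res.status(404).send('Not found');
  }
  next();
});

app.listen(3000);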

Detecting the most common source

We also found that most of the network bot requests came from data centers, as we can see in the table below, which lists the top 10 ASN names gathered by our honeypot:

Top 10 ASN names from our honeypot

This was an important discovery, since it lets us protect ourselves more effectively: if we know that most malicious requests come from servers located in data centers, we can be more vigilant and proactive about blocking traffic from those sources.

Identifying the top user agents

Identifying the top user agents also proved very powerful for improving protection:

Top 10 user agents from our honeypot

As we can see from the above table, some of the network crawlers identified themselves as bots in the user agent. Knowing this helps us because we can block them through user-agent parsing, as in the sketch below.
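
For example, here is a minimal sketch of user-agent blocking with Express middleware; the patterns are illustrative, not an exhaustive list:

const express = require('express');

const app = express();

// Illustrative signatures of self-identified bots.
const botPatterns = [/bot/i, /crawler/i, /spider/i, /curl/i];

app.use((req, res, next) => {
  const userAgent = req.headers['user-agent'] || '';
  if (botPatterns.some((pattern) => pattern.test(userAgent))) {
    return res.status(403).send('Forbidden'); // block self-identified bots
  }
  next();
});

app.listen(3000);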

However, it’s important to note that some crawlers present fake browser user agents, while others don’t provide a user agent at all. We can see this in the first line of the table: the bot identified itself with an empty user agent, something human users never send and only bots do.

So, what we can do is block them not through their self-identification, but by filtering out visitors based on their IP address or the data center they’re coming from.

Accordingly, by gathering data from the honeypot we can determine which firewall rules we need to generate to block these bots.
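
For example, here is a minimal sketch that turns the IPs collected in log.txt (whose format appears in the code section below) into iptables DROP rules; the rule template is just one option:

const fs = require('fs');

// log.txt stores one line per request, starting with the visitor's IP.
const lines = fs.readFileSync('log.txt', 'utf8').split('\n').filter(Boolean);
const ips = new Set(lines.map((line) => line.split(' ')[0]));

// Print one firewall rule per unique IP; pipe the output to a shell to apply.
for (const ip of ips) {
  console.log(`iptables -A INPUT -s ${ip} -j DROP`);
}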

The code

As for the code we used to set up the honeypot, first we created the NodeJS server using Express as follows:
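
Here’s a minimal sketch of that first step (the full version is on the Immue GitHub; the port number is an assumption):

const express = require('express');
const fs = require('fs'); // used by the logging helpers added below

const app = express();

// Start the honeypot server; port 3000 is an assumed value.
app.listen(3000, () => {
  console.log('Honeypot listening on port 3000');
});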

Then, we sent a response that includes an image, so we could identify the network bots which were not loading static files:
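
A sketch of that step follows. The page markup and the decoy file name (decoy.png) are illustrative; any small image will do. The Cache-Control header makes real browsers cache the image, as described earlier:

const path = require('path');

// The page every visitor receives. It embeds a decoy image: real browsers
// will fetch (and cache) /favicon.ico, while network bots will not.
const page = '<html><body><img src="/favicon.ico" alt=""></body></html>';

// Serve the decoy image itself, with a long cache lifetime.
app.get('/favicon.ico', (req, res) => {
  res.set('Cache-Control', 'public, max-age=86400');
  res.sendFile(path.join(__dirname, 'decoy.png'));
});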

We wrote a simple function that creates the text for a log file, which we then used to monitor each visitor’s IP, user agent, and requested URL:
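
A sketch of that helper, with the line format modeled on the console output shown later in this post:

// Build one line of log text per request: IP, method, URL, user agent.
function createLogText(req) {
  const userAgent = req.headers['user-agent'] || '';
  return `${req.ip} ${req.method} ${req.originalUrl} ${userAgent}`;
}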

We set up middleware to respond to every request made to our site:
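
A sketch of that middleware. It logs each request with the helper above, persists it with saveRequest (sketched in the next step), and replies with the decoy page. Because the /favicon.ico route was registered first, image fetches never reach this handler:

app.use((req, res) => {
  console.log(createLogText(req)); // print each visit to the console
  saveRequest(req);                // persist it (see the next snippet)
  res.send(page);                  // reply with the page embedding the decoy image
});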

Each request was saved in two files:

1. a file called “log.txt”, which contains the IP address and user agent of the visitor
2. a file called “urls.txt”, which contains the request method and request URI
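
Here’s a sketch of that step, matching the two file names above; it’s called from the middleware in the previous snippet:

// Persist each request to the two log files.
function saveRequest(req) {
  const userAgent = req.headers['user-agent'] || '';
  // log.txt: one line per visitor with IP address and user agent.
  fs.appendFile('log.txt', `${req.ip} ${userAgent}\n`, (err) => {
    if (err) console.error(err);
  });
  // urls.txt: one line per request with method and URI.
  fs.appendFile('urls.txt', `${req.method} ${req.originalUrl}\n`, (err) => {
    if (err) console.error(err);
  });
}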

We also added a final step for removing legitimate web visitors from our honeypot log file:
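
A sketch of that step. The assumption is that a visitor who fetches the decoy image is running a real browser, so the /favicon.ico route (wired up in the complete listing below) calls this function to drop the visitor’s entries from log.txt:

// Remove a legitimate visitor's entries from log.txt.
function whitelist(req) {
  const userAgent = req.headers['user-agent'] || '';
  if (!userAgent) return; // an empty user agent is never a real browser
  fs.readFile('log.txt', 'utf8', (err, data) => {
    if (err) return;
    // Keep only the lines that don't belong to this (legitimate) visitor.
    const kept = data
      .split('\n')
      .filter((line) => !line.includes(userAgent))
      .join('\n');
    fs.writeFile('log.txt', kept, () => console.log(`Whitelisted ${userAgent}`));
  });
}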

The following code sums it all up:
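
Below is our best-guess reconstruction of the complete script, under the same assumptions as the snippets above; the original is on the Immue GitHub:

const express = require('express');
const fs = require('fs');
const path = require('path');

const app = express();

// Page served to every visitor; it embeds the decoy image.
const page = '<html><body><img src="/favicon.ico" alt=""></body></html>';

// One log line per request: IP, method, URL, user agent.
function createLogText(req) {
  const userAgent = req.headers['user-agent'] || '';
  return `${req.ip} ${req.method} ${req.originalUrl} ${userAgent}`;
}

// Persist each request: log.txt (IP + user agent), urls.txt (method + URI).
function saveRequest(req) {
  const userAgent = req.headers['user-agent'] || '';
  fs.appendFile('log.txt', `${req.ip} ${userAgent}\n`, (err) => {
    if (err) console.error(err);
  });
  fs.appendFile('urls.txt', `${req.method} ${req.originalUrl}\n`, (err) => {
    if (err) console.error(err);
  });
}

// Remove a legitimate visitor's entries from log.txt.
function whitelist(req) {
  const userAgent = req.headers['user-agent'] || '';
  if (!userAgent) return; // empty user agents belong to bots, never browsers
  fs.readFile('log.txt', 'utf8', (err, data) => {
    if (err) return;
    const kept = data
      .split('\n')
      .filter((line) => !line.includes(userAgent))
      .join('\n');
    fs.writeFile('log.txt', kept, () => console.log(`Whitelisted ${userAgent}`));
  });
}

// Only real browsers fetch the decoy image, so this route whitelists them.
app.get('/favicon.ico', (req, res) => {
  whitelist(req);
  res.set('Cache-Control', 'public, max-age=86400');
  res.sendFile(path.join(__dirname, 'decoy.png'));
});

// Every other request is logged, saved, and answered with the decoy page.
app.use((req, res) => {
  console.log(createLogText(req));
  saveRequest(req);
  res.send(page);
});

app.listen(3000, () => console.log('Honeypot listening on port 3000'));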

See it for yourself

If you run this code and send a request to the server through a web browser, you should be able to see the following output in the console:

::1 GET / Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36
Whitelisted Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36

If, however, you make the request using cURL, you will see this output:

::1 GET / curl/7.77.0

And you will also see that the cURL user-agent and IP address are saved to the log.txt file.

In conclusion

Now it’s your turn. We encourage you to use this code and publish it to a hosted server. This way, you’ll be able to whitelist legitimate web browser requests, while detecting the malicious attempts of bad network bots.

You’ll probably be surprised to see just how many of these bad bots are actually reaching your server.

If you save the malicious IPs to your database or firewall, you can block these bots from doing any damage the next time they come around, and protect your website against attacks.

And, if you’re looking to take your website protection to the next level, and even stop the bots that no one knows exist yet, we invite you to reach out to us at info@immue.net.

For all the insights and code, we also invite you to visit the Immue GitHub.
