Limiting requests from bots, attackers, and malicious users

Published in Google Cloud - Community, Sep 10, 2024

In today’s digital landscape, websites face a constant barrage of traffic from both human and non-human sources. While human traffic is essential for generating revenue and engagement, excessive non-human traffic can cause security risks, performance issues, and financial losses. There are many ways to identify bots; in this post we will talk about the easiest and most commonly used one, the IP list.

Different providers publish different IP lists based on reputation. Google provides Threat Intelligence, which classifies the requester's IP address into categories such as TOR exit nodes, known malicious IP addresses, search engines, anonymous proxies, crypto miners, and a few public cloud providers’ IP ranges. But what if you want to add more sources based on open-source and publicly available data? This can add another layer of security and protect your application from unwanted traffic. Using Cloud Armor address groups, you can create your own “IP reputation” list for your use case and use it effectively.
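As a rough illustration of building your own list, here is a minimal sketch that merges several feeds into de-duplicated, collapsed CIDR lists using only Python's standard `ipaddress` module. The function name `merge_feeds` and the sample ranges (taken from the documentation-reserved blocks) are my own assumptions; how you push the result into a Cloud Armor address group is left out on purpose.

```python
import ipaddress

def merge_feeds(*feeds):
    """Merge iterables of IP/CIDR strings into collapsed IPv4 and IPv6 lists."""
    v4, v6 = set(), set()
    for feed in feeds:
        for entry in feed:
            # strict=False tolerates entries like 10.0.0.1/24 that have host bits set.
            net = ipaddress.ip_network(entry.strip(), strict=False)
            (v4 if net.version == 4 else v6).add(net)
    # collapse_addresses merges adjacent and overlapping networks of one IP version.
    return (
        [str(n) for n in ipaddress.collapse_addresses(v4)],
        [str(n) for n in ipaddress.collapse_addresses(v6)],
    )

# Hypothetical sample feeds using documentation-only address ranges.
feed_a = ["198.51.100.0/25", "203.0.113.7"]
feed_b = ["198.51.100.128/25", "203.0.113.7", "2001:db8::/33", "2001:db8:8000::/33"]

ipv4, ipv6 = merge_feeds(feed_a, feed_b)
print(ipv4)
print(ipv6)
```

Collapsing matters because overlapping feeds inflate the list quickly, and keeping IPv4 and IPv6 separate makes it easier to maintain one list per IP version.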

On a regular day, requests from these IPs may include a few genuine users as well, but when an attack happens, you will usually find that it originates from one or more of these IPs. They might be part of a botnet, compromised hosts, or an attacker trying to hide the real identity or source of the attack. Let's discuss a few of the most common publicly available sources of such IP lists.

TOR Exit nodes

These are the IPs acting as TOR exit nodes, so requests from TOR users will most likely reach your origin from these IPs. The list is updated frequently, so consider putting the feed into your automation to keep your IP list up to date.
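Since the feed changes often, the automation mostly boils down to "what do I add, what do I remove". A minimal sketch, assuming the feed is plain text with one IP address per line (the sample text and function names below are hypothetical, and fetching the feed is left out):

```python
import ipaddress

def parse_ip_lines(text):
    """Return the valid IP addresses from a one-address-per-line text feed."""
    addresses = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            addresses.append(str(ipaddress.ip_address(line)))
        except ValueError:
            continue  # skip anything that is not an IP address
    return addresses

def diff_ip_list(current, fresh):
    """Return (to_add, to_remove) needed to turn `current` into `fresh`."""
    current, fresh = set(current), set(fresh)
    return sorted(fresh - current), sorted(current - fresh)

# Hypothetical data: what is deployed today versus what the feed says now.
deployed = ["192.0.2.10", "192.0.2.99"]
sample_feed = "192.0.2.10\n192.0.2.11\n# comment\nnot-an-ip\n"

to_add, to_remove = diff_ip_list(deployed, parse_ip_lines(sample_feed))
print(to_add, to_remove)
```

Applying only the difference keeps each update small and makes it easy to log what changed between runs.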

Spamhaus List

Spamhaus is a non-profit organization that tracks spam and related cyber threats such as malware, phishing, and DDoS attacks. It provides real-time threat intelligence and maintains various IP lists. The most useful for our case is their Don't Route Or Peer (DROP) list. You should consider putting these IPs on your watch/block list.

Emerging Threats publishes another IP list, which contains the Spamhaus list above as well as the DShield top attackers. Hence, this single source serves well as a combined list.
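These text feeds usually mix CIDR entries with comments. Here is a small parser sketch, assuming (I have not checked every feed) that comments start with `;` or `#` and that each data line begins with an IP or CIDR. The sample text is made up for illustration.

```python
import ipaddress
import re

def parse_cidr_feed(text):
    """Extract networks from a text feed, ignoring ';' and '#' comments."""
    networks = []
    for line in text.splitlines():
        # Keep only the part before the first ';' or '#'.
        entry = re.split(r"[;#]", line, maxsplit=1)[0].strip()
        if not entry:
            continue
        try:
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            continue  # not an IP or CIDR, skip it
    return networks

# Hypothetical feed content with documentation-only ranges.
sample = """; Sample header
192.0.2.0/24 ; SBL000000
# another comment
198.51.100.7

garbage line
"""

nets = parse_cidr_feed(sample)
print([str(n) for n in nets])
```

Silently skipping bad lines is a deliberate choice for a sketch; in real automation you would probably want to count and log them so that a format change in the feed does not go unnoticed.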

Maxmind List

MaxMind has a good collection of IPs grouped by classification. Most of the lists, like known VPNs, are paid. However, some free lists are also available. Check this list and make sure you keep your blocklist updated with these IPs.

What about some open source projects?

Indeed, I found some really good projects on GitHub that provide very good, regularly updated IP lists with their sources available. I am sharing some of them below.

https://github.com/stamparm/maltrail. They have a frequently updated list of the worst ASNs: https://github.com/stamparm/maltrail/blob/master/misc/worst_asns.txt
https://github.com/stamparm/ipsum. They have an amazing collection of regularly updated bad IPs: https://raw.githubusercontent.com/stamparm/ipsum/master/ipsum.txt
https://github.com/MISP/misp-warninglists/tree/main. This project has a huge collection of warning lists, but we will mainly focus on the VPN list.
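For a feed like ipsum, you may not want every entry. The sketch below assumes each data line is an IP followed by a whitespace-separated count of how many source lists reported it, with `#` comment lines at the top; please verify that against the actual file before relying on it. The `min_hits` threshold and the sample text are my own.

```python
def parse_scored_ips(text, min_hits=3):
    """Return IPs whose report count is at least `min_hits`."""
    selected = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        parts = line.split()
        # Expect exactly two columns: the IP and an integer count.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        ip, hits = parts[0], int(parts[1])
        if hits >= min_hits:
            selected.append(ip)
    return selected

# Hypothetical sample in the assumed "IP<TAB>count" layout.
sample = "# header line\n192.0.2.1\t8\n192.0.2.2\t1\n198.51.100.3\t3\n"

print(parse_scored_ips(sample, min_hits=3))
```

A higher threshold gives a shorter list with fewer false positives, which is handy when the list you are filling has limited capacity.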

Let's decide on VPN traffic

Whether to allow or deny VPN traffic depends on your business needs. However, if you want to watch or deny VPN traffic, the MISP project has a decent list of VPN IPs: almost 30,000 IPv4 and 1,200+ IPv6 entries.
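To get separate IPv4 and IPv6 sets out of such a warning list, something like the following could work. It assumes the file is a JSON document with the entries under a top-level `"list"` key, which is how I understand the MISP warning lists to be laid out; the sample document is invented.

```python
import ipaddress
import json

def split_by_version(warninglist_json):
    """Split the entries under the 'list' key into IPv4 and IPv6 CIDR strings."""
    data = json.loads(warninglist_json)
    v4, v6 = [], []
    for entry in data.get("list", []):
        try:
            net = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # ignore entries that are not IPs or CIDRs
        (v4 if net.version == 4 else v6).append(str(net))
    return v4, v6

# Hypothetical warning list with documentation-only addresses.
sample = '{"name": "sample vpn list", "list": ["192.0.2.0/28", "2001:db8::1", "bad"]}'

vpn_v4, vpn_v6 = split_by_version(sample)
print(len(vpn_v4), "IPv4 entries;", len(vpn_v6), "IPv6 entries")
```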

Cloud providers sending you requests

Think about your end users. Why would someone initiate a request from a server in a cloud hosting datacenter? Most likely it is API or bot traffic. It might be scraping your content, though it is also possible that some users sit behind a corporate firewall whose exit node runs in some cloud. Either way, it is a good idea to rate limit such requests, if not block them completely. If your own APIs run in the cloud and call you over the Internet, create an allow list of their IPs so those genuine requests get through. Some cloud providers simply publish their IP lists openly, while others put them behind auth and ask you to hit their REST APIs (But why? :-(). Below are the IP lists of some of the providers.
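The order of checks matters here: the allow list for your own APIs has to win over the cloud provider ranges. A minimal sketch of that decision logic, with made-up ranges and action names (`allow`, `throttle`, `default`) that you would map to your own rule priorities:

```python
import ipaddress

def classify(ip, allowlist, cloud_ranges):
    """Pick an action for one client IP: allow list first, then cloud ranges."""
    addr = ipaddress.ip_address(ip)
    if any(addr in ipaddress.ip_network(n) for n in allowlist):
        return "allow"      # your own APIs, even though they run in a cloud
    if any(addr in ipaddress.ip_network(n) for n in cloud_ranges):
        return "throttle"   # rate limit rather than block outright
    return "default"

# Hypothetical ranges from the documentation-only address blocks.
my_apis = ["198.51.100.16/28"]
cloud = ["198.51.100.0/24", "2001:db8::/32"]

for client in ["198.51.100.20", "198.51.100.200", "203.0.113.9"]:
    print(client, classify(client, my_apis, cloud))
```

In a rule-based firewall, the same idea is expressed by giving the allow rule a higher priority than the rate limit rule.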

Hosting Providers

Cloud is a growing segment, with stricter norms on the content hosted. However, a large number of users still use hosting provider datacenters. Since large compute and bandwidth are available there, and some providers even allow malicious content to be hosted, you should be especially careful with requests coming from these IPs. Ask yourself: why are requests coming from a datacenter server hosted somewhere in Germany, Russia, or China? You will realize that a substantial share of bot traffic is initiated from these IP addresses, and with a valid user agent string. Ask yourself again: why is someone trying to spoof their identity? Hence, keeping a check on such IPs is necessary.

The list would be really huge and ever-changing. So typically, I keep track of the IPs crossing the rate limit rules, and then put a stricter check on the entire IP range of the ASN each IP belongs to. (This can easily be automated as well; I found one useful GitHub library for this.) Most of these rogue IPs are already part of the lists I shared above.
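The "offending IPs to ASN" step can be sketched roughly as below. This is not the library mentioned above; the IP-to-ASN mapping is a hypothetical lookup table (in practice it would come from whatever ASN database or service you use), and the threshold is arbitrary.

```python
from collections import Counter

def asns_to_escalate(offending_ips, ip_to_asn, threshold=3):
    """Return ASNs with at least `threshold` distinct rate-limited IPs."""
    counts = Counter(
        ip_to_asn[ip] for ip in set(offending_ips) if ip in ip_to_asn
    )
    return sorted(asn for asn, n in counts.items() if n >= threshold)

# Hypothetical lookup table, using documentation-reserved ASNs and addresses.
ip_to_asn = {
    "192.0.2.1": "AS64500",
    "192.0.2.2": "AS64500",
    "192.0.2.3": "AS64500",
    "198.51.100.1": "AS64501",
}

# IPs that crossed the rate limit; the repeated entry is counted only once.
offenders = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.3", "198.51.100.1"]

print(asns_to_escalate(offenders, ip_to_asn, threshold=3))
```

Counting distinct IPs rather than raw hits avoids escalating a whole ASN because of a single noisy address.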

CDN Traffic

So far, we have tried to block IPs that might attack your resources. But I have seen numerous cases where someone tries to pirate your content by putting your resource behind their CDN. Hence, it is a good idea to proactively block the IPs of CDNs you are not using. The great thing is that these IPs are publicly available most of the time.

Akamai IPs
CloudFront IPs
Cloudflare IPs [Must act on this, if you are not using Cloudflare].
Bunny CDN (IPv4) and Bunny CDN (IPv6)
The Edgio IP list is behind an API call and not readily available (Why Edgio, why :-( ?)

Geo Block

Before we end this discussion, let's talk about the easiest and most commonly used tool: the geo block. Sometimes it is difficult to predict where your target audience will come from. But most of the time, you know for certain that a particular country is not where your target audience is (and you still see a large number of hits from there). Hence, it is a good idea to put such countries on a block list.
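If you manage the country list in code, a tiny helper can generate the match expression for you. The sketch below assumes the Cloud Armor rules language attribute `origin.region_code` compared against two-letter region codes; the helper name is mine, `XX` and `YY` are placeholders rather than real countries, and you should check the current Cloud Armor documentation for limits on expression size.

```python
def geo_block_expression(country_codes):
    """Build an OR-ed region match expression from two-letter country codes."""
    codes = sorted({c.strip().upper() for c in country_codes if c.strip()})
    for code in codes:
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"not a two-letter country code: {code!r}")
    return " || ".join(f"origin.region_code == '{code}'" for code in codes)

# Placeholder codes; duplicates and lowercase input are normalized.
expression = geo_block_expression(["xx", "YY", "xx"])
print(expression)
```

Validating the codes up front is cheap insurance against a typo producing a rule that never matches.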

Good Bots

While blocking so many IPs, you may end up blocking some that you did not intend to block, such as genuine SEO bots. Hence, you should consider maintaining an allow list for such IPs. I found one useful GitHub repository that does exactly this: it provides a list of “good” bots. The Googlebot IP ranges are already publicly available.
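One way to avoid blocking good bots is to strip their ranges out of your blocklist before deploying it. The sketch below assumes the published ranges come as JSON with a `prefixes` array whose items carry either an `ipv4Prefix` or an `ipv6Prefix` key, which is my understanding of the Googlebot file; the sample document and function names are hypothetical.

```python
import ipaddress
import json

def load_prefixes(doc):
    """Read networks from a JSON document with a 'prefixes' array."""
    prefixes = []
    for item in json.loads(doc).get("prefixes", []):
        cidr = item.get("ipv4Prefix") or item.get("ipv6Prefix")
        if cidr:
            prefixes.append(ipaddress.ip_network(cidr))
    return prefixes

def remove_good_bots(blocklist, good_bot_ranges):
    """Drop blocklist entries that overlap any good-bot range."""
    kept = []
    for entry in blocklist:
        net = ipaddress.ip_network(entry, strict=False)
        clash = any(
            net.version == good.version and net.overlaps(good)
            for good in good_bot_ranges
        )
        if not clash:
            kept.append(str(net))
    return kept

# Hypothetical data with documentation-only ranges.
good_doc = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/27"}, {"ipv6Prefix": "2001:db8::/64"}]}'
blocklist = ["192.0.2.5", "198.51.100.0/24", "2001:db8::1"]

print(remove_good_bots(blocklist, load_prefixes(good_doc)))
```

Dropping the whole overlapping entry is the simplest option; an allow rule with higher priority than the block rule achieves the same effect without touching the blocklist.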

If you consider all the points above and keep tabs on the IP lists mentioned, you should be able to substantially reduce unwanted traffic on your application, save cost, improve availability, and optimize performance by keeping compute cycles limited to genuine requests. I hope this post was useful. Feel free to suggest improvements.