EXPEDIA GROUP TECHNOLOGY — SOFTWARE

Bot Attacks: You are not alone…

An introduction to bots and mitigating risk

Fabian Piau
Expedia Group Technology

--

A photo of a person sitting at an airport gate. Taxiing airplanes, jetways, and pavement can be seen through the window behind them.

In a recent article, Security Magazine stated that 1.3 billion bot attacks were detected in Q3 2020. It’s not a surprise. We are not alone. You are not alone.

Last year, the Opex Excellence team at Expedia Group reviewed the production incidents that impacted the Hotels.com website over a 12-month period, and bot attacks were among the most significant threats we identified.

Anything that can go wrong will go wrong — Murphy's law

In this article, I will define and explain what bots and bot attacks are. I hope it helps you build up some knowledge and a better understanding of the threats your website could be facing, so you can come up with a set of strategies and solutions to prevent and mitigate future bot attacks.

Know the enemy and know yourself, in a hundred battles you will never be in peril — Sun Tzu, The Art of War

A collection of little blue and pink toy robots
Photo by Eric Krull on Unsplash

What are bot attacks?

The first type of bot attack that usually comes to mind is the Denial of Service attack (DoS), or DDoS when it's distributed. A DDoS attack usually impacts the whole website, with failures happening in cascade, making it unavailable. The attacker sends a large number of requests (e.g. performs multiple GETs/POSTs on random URLs) in order to overload the underlying services so much that they cannot handle the traffic anymore. Services start to respond with timeouts or errors. This obviously also affects the requests originating from legitimate customers — customers that are trying to book their hotel stay, in our case.

Another type of attack, slightly more targeted, is the Scraping attack. A crawler bot looks into various pages in order to extract specific pieces of information. This is also known as Data scraping. These bots usually target your inventory data, e.g. all the property details and contact information, or pricing details for specific dates. Sometimes they even scrape the whole contents of some pages of your website, in order to replicate them and prepare a phishing attack.

ℹ️ Ideally, the attacker does not want to be seen while scraping a website and will try to spread the attack over multiple days to avoid a surge in traffic that may trigger an alert and investigation on your side. Unless the attack is sophisticated and coming from different machines, you can usually figure out that the website is being scraped when you start to see an abnormal number of requests coming from the same IP address (or range of IPs) at the Edge level. I will give more details about the Edge level in the next part.

While DoS and Scraping attacks are quite general in scope, in the sense that they target various pages across the website, we also see attacks against specific pages.

Specialized attacks are much narrower in scope, and they usually target a specific page or feature of your website. Below are a few real cases that we have faced:

  • An attack on the Booking page, where the attacker tries to break the coupon field by generating and trying multiple codes, hoping there is an exploitable pattern to the codes. This is a brute force attack.
  • An attack on the Sign in page, where the attacker tries multiple login and password combinations. The idea is to get access to user accounts and confidential data, or what we call Account Takeover (ATO). This also uses brute force.
  • An attack on the Mobile App page, where the attacker sends large numbers of SMS messages to random or fake phone numbers. It is quite typical nowadays to prompt users to download an app to their mobile device by sending an SMS with a link to the app on the store. Send out a flood of such messages, and if enough people follow the link, your app page on the app store can be overloaded, or your app can lose reputation for spamming. And if you rely on a third party vendor for this messaging service, you will end up paying for every message sent.

General or specific, an attack can be basic or sophisticated. An example of a basic attack is someone desperately trying to find a discount code by firing dozens of requests at a coupon service. Basic attacks are difficult to spot but easy to block: you can block the IP address with a WAF (Web Application Firewall) rule. They also have a low risk profile.

Sophisticated attacks, on the other hand, can be distributed across thousands of machines located in different countries and use advanced scripting technologies, like headless browsers. The level of impact and risk is much higher, but they are obviously easier to spot when associated with a traffic surge.

Bot attack risk matrix. The vertical axis is scope, which varies from specific to general. The horizontal axis is complexity, which varies from basic to sophisticated. Specific and basic attacks are shaded green for ease of handling. General and sophisticated attacks are shaded red because they are harder to handle.
Bot attack risk matrix

Should we block every bot?

Definitely not! Not all bots are evil; some are even beneficial to us.

  • There are the Spider bots (or Crawlers) from popular search engines like Google Search or Microsoft Bing. If we block those, the indexing of the website will be badly impacted, the legitimate traffic will degrade over time, and the website popularity will go down.
  • There are also the Commercial bots (e.g. the Google AdSense bot), which provide personalized ads to users, including ads for our own website.
  • There are the Data bots like content aggregators and feeds, and there is also the Archive bot that builds the Internet Archive.
  • There are the Copyright bots, which look for plagiarism or intellectual property theft. You may have faced one of them if you’ve tried to upload a video to YouTube with your favourite music in the background. You certainly realized quickly that Google took it down, politely reminding you that you cannot use protected material.
  • There are also the Monitoring bots that make sure your website is healthy and raise an alert in case it’s not. For example, Akamai, Datadog, and so on use bots to make sure your site is responding properly.

ℹ️ You should not block any of these bots: they are not malicious, they are usually not aggressive, and they contribute to the Internet. If you feel your legitimate traffic is suffering from them, then instead of blocking, it’s best to have a tarpitting or rate limiting strategy in place to mitigate their impact. More details about this in the next part about the Edge level.

While all these bots are third party, sometimes you have your own. In our case, an internal bot regularly checks our landing pages to make sure there are no dead links or unnecessary redirections. So we definitely don’t want to block it!

Decorative separator

What can we do?

We can prevent attacks at the Edge level and mitigate them at the Application level.

Edge and Application levels diagram: The Edge level sits between the Internet and your application. Traffic must be passed through the Edge to get to the Application.
Edge and Application levels

Use of robots.txt

The first thing that comes to mind when dealing with bots is the robots.txt file. Almost every website has one, and it has been around for ages. It is a text file publicly accessible at the root of your website. It specifies the rules for any bots accessing your site. These rules define which pages the bots can and can’t crawl, and which links they should and shouldn’t follow.

Good bots will follow these rules. For instance, if a website owner doesn’t want a certain page on their site to show up in Google search results, they can write a rule for it, and Google web crawler bots won’t index that page. Although the robots.txt file cannot actually enforce these rules, good bots are programmed to look for that file and follow the rules before they do anything else. It’s based on a code of honor.

Malicious and bad bots will obviously not follow any of your rules. On the contrary, they will often read the file to learn what content a website is trying to keep off-limits, and then access that content. Thus, managing bots requires a more active approach than simply defining rules for bot behavior in the robots.txt file. This is what we are going to see in the next part.

ℹ️ The robots.txt file can also be used to set up a ‘honeypot’. A honeypot is a fake target for bad actors that, when accessed, exposes the bad actor as malicious. In the case of a bot, a honeypot could be a page on the site that’s forbidden to bots by the robots.txt file. Good bots will read the robots.txt file and avoid that page, some bad bots will crawl the page. By tracking the information of the bots that access the honeypot, bad bots can be identified and blocked. Source: Cloudflare
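
For illustration, here is what a minimal robots.txt could look like, combining ordinary crawl rules with a honeypot entry. The paths below are hypothetical, not our actual configuration; the idea is simply that a well-behaved bot will skip the disallowed paths, while any visitor requesting the trap path immediately flags itself as suspicious.

    User-agent: *
    # Ordinary rules: keep crawlers away from transactional pages
    Disallow: /checkout/
    Disallow: /account/

    # Honeypot: nothing links here and good bots won't crawl it,
    # so any request to this path is worth flagging
    Disallow: /bot-trap/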

Advanced Bot Management

The main shield against bad bots is Bot Management at the Edge level. This is much more advanced than the robots.txt file. I took this list from Cloudflare, but it will be a similar set of features for any other Edge tool:

  • Identify bots vs. human visitors (using behavioral analysis and potentially machine learning)
  • Identify bot reputation
  • Identify bot origin IP addresses and block based on IP reputation
  • Analyze bot behavior
  • Add good bots to allowlists
  • Add bad bots to blocklists
  • Challenge potential bots via a CAPTCHA test, JavaScript injection, or other methods
  • Rate limit any potential bot over-using a service
  • Tarpit recognized bot requests (see definition below)
  • Deny access to certain content or resources for bad bots
  • Serve alternative/cached content to bots

ℹ️ ‘Tarpitting’ is an interesting feature. It means adding an artificial delay to the request. It is usually much better than blocking because the bot won’t know it has been discovered, but the attack will slow down significantly as fewer requests will reach the Application level, since they may time out at the Edge level. Rate limiting is another good strategy you may want to look at.
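
To make the idea concrete, here is a minimal sketch of tarpitting expressed as a servlet filter. This is purely illustrative: in a real setup the logic lives in the Edge tool rather than in your application, and the bot-detection check below is a placeholder for whatever signal (IP reputation, user agent, behavioral score) your tool actually provides.

    import jakarta.servlet.Filter;
    import jakarta.servlet.FilterChain;
    import jakarta.servlet.ServletException;
    import jakarta.servlet.ServletRequest;
    import jakarta.servlet.ServletResponse;
    import jakarta.servlet.http.HttpServletRequest;
    import java.io.IOException;
    import java.time.Duration;

    // Sketch: delay suspected bot requests instead of blocking them,
    // so the bot just experiences a slow website.
    public class TarpitFilter implements Filter {

        private static final Duration TARPIT_DELAY = Duration.ofSeconds(5);

        @Override
        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
                throws IOException, ServletException {
            if (isSuspectedBot((HttpServletRequest) request)) {
                try {
                    // Artificial delay: from the bot's side, nothing looks blocked.
                    Thread.sleep(TARPIT_DELAY.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            chain.doFilter(request, response);
        }

        private boolean isSuspectedBot(HttpServletRequest request) {
            // Placeholder check: plug in your own signal (IP reputation, rate, behavior, ...).
            String userAgent = request.getHeader("User-Agent");
            return userAgent == null || userAgent.toLowerCase().contains("headless");
        }
    }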

When deciding on an Advanced Bot Manager, you can use a popular third party provider like Akamai or Cloudflare.

Pros

  • No impact on the application code.
  • A bot rule is relatively quick to deploy with immediate effect.

Cons

  • Most of the cons lie in the fact that they are third party.
  • There is a license cost.
  • Bot rules can only be defined against generic, non-business parameters like user agent, IP, endpoint, etc.
  • They sometimes involve a heavy approval process, e.g. adding a new rule may require people outside of the company plus internal people with special authorization.
  • A new rule can have side effects. Unless you are blocking a single IP address, it’s very hard to be sure a rule won’t prevent some legitimate traffic from getting in.
  • The maintenance of the rules can be cumbersome over time.
  • Partial access and visibility for application teams.

Most of the disadvantages can be mitigated if you are using your own in-house Edge tool. This is particularly interesting when used in addition to a third party Edge tool. It’s obviously not something every company can invest in, but it will give you much more flexibility.

  • Ability to set rules related to your business.
  • Ability to add a one-off temporary rule to mitigate an attack, which you can delete shortly after the attack has passed.
  • Ability to centralize the Edge monitoring and make the information available to every team.

Traffic prioritization

The idea here is not to replace your main Edge tool but to add some bot logic after it. In a nutshell, such a tool acts as a prioritization queue so that low value bot requests are deprioritized in favor of real user requests that have higher business value, e.g. requests that could end in a booking, in our case.

User requests > Internal bot requests > External good bot requests

ℹ️ Netflix applies similar concepts, which you can read about in Keeping Netflix Reliable Using Prioritized Load Shedding.
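
As an illustration only (this is not our actual implementation), one way to picture the ordering above is a priority queue where each pending request is tagged with the class of traffic that produced it, and higher-value classes are always served first:

    import java.util.Comparator;
    import java.util.concurrent.PriorityBlockingQueue;

    // Traffic classes in decreasing order of business value (lower rank = served first).
    enum TrafficClass {
        USER(0),              // real customers, potential bookings
        INTERNAL_BOT(1),      // e.g. our own dead-link checker
        EXTERNAL_GOOD_BOT(2); // search engine crawlers, monitoring probes

        final int rank;
        TrafficClass(int rank) { this.rank = rank; }
    }

    // A unit of pending work tagged with the class of traffic that produced it.
    record PendingRequest(TrafficClass trafficClass, Runnable work) {}

    public class PrioritizedRequestQueue {

        private final PriorityBlockingQueue<PendingRequest> queue =
                new PriorityBlockingQueue<>(1024,
                        Comparator.comparingInt((PendingRequest r) -> r.trafficClass().rank));

        public void submit(PendingRequest request) {
            queue.offer(request);
        }

        // Serves the highest-value pending request first; bot traffic only gets
        // processed when there is capacity left over.
        public void serveNext() throws InterruptedException {
            queue.take().work().run();
        }
    }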

Caching

We talked about Bot Management, but there is something else that the Edge layer can provide for you: the ability to cache content. We usually refer to this as a Content Delivery Network (CDN). The idea is to serve cached pages to good bots rather than generate fresh pages.

We did a Proof of Concept last year and it significantly decreased the traffic to our Landing services without affecting the SEO of the website. This year, we are looking at generalizing this approach.

Decorative separator

Mitigating at the Application level

Managing bots at the Application level means that a bot attack was able to get through the higher level of protection at the Edge.

The solutions we have at this level are mitigation solutions only; putting them in place proactively reduces the burden of an attack.

On a sad note, we know it’s going to happen, and even multiple times, so we need to prepare for it. On a brighter note, we know that bot attacks do not last forever. A sophisticated attack costs the attacker a significant amount of resources, and resources cost money. As long as we make the attacker spend more than they gain, we can be certain the attack will eventually be stopped as a bad investment.

There are different actions we can take, and most of them are simply good practices.

  • First, when coding the application logic, we can avoid high complexity code, blocking threads, and so on. A good idea is to separate the application and management ports. If you use the same port, in case of a bot attack the service will be so overloaded that it won’t be able to respond to health checks, and your infrastructure platform will flag it as unhealthy. Even if it does not solve all your issues, having a separate port can mitigate this (see the configuration sketch after this list).
  • Having a chaos mindset is important. For critical services, make sure you have load and stress testing in place in your pipeline. This is to ensure your services are resilient enough, and that you have identified potential memory leaks in your code and bottlenecks when reaching downstream services or data sources. In case something goes wrong, you still want to serve a degraded response to mitigate the impact on the customer. You could also have some caching mechanism in place.
  • Make sure you leverage your infrastructure. If you use Kubernetes, you can take a look at auto-scaling, but be vigilant when enabling it. Ensure the configuration is well thought out and in line with your dependencies. Setting up a high number of pods and considering the job done would be a mistake, as you will also pass the extra load on to your downstream dependencies and, if they are not prepared for it, you will simply shift the bottleneck deeper in the stack without solving it. It may also cost you more money if your infrastructure is hosted on a Cloud provider like AWS. Also make sure your pods are ready to take traffic once they are exposed to the attack. A warm-up routine like Mittens will help, especially for applications that are slow to start up. (An auto-scaling sketch also follows this list.)
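
As a sketch of the separate-ports advice, assuming a recent Spring Boot service with Actuator (other frameworks have an equivalent), application traffic and health/management endpoints can be split onto different ports with two properties:

    server.port=8080
    management.server.port=8081

With this setup, health checks have a better chance of still answering on port 8081 even when port 8080 is saturated by an attack.

And as a sketch of the auto-scaling point, a Kubernetes HorizontalPodAutoscaler with deliberately bounded replicas might look like this (the service name and thresholds are made up for illustration):

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: landing-service
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: landing-service
      minReplicas: 3
      maxReplicas: 10   # bounded, so an attack cannot scale costs and downstream load indefinitely
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70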

There are also other strategies at the application level that are not related to configuration and infrastructure. Some mimic a bot management solution at the Edge level:

  • Captcha mechanism. A common approach is to display a captcha to the user when there are too many attempts, or to protect account-related pages that attackers typically target.
  • Authentication mechanism. If your APIs are public, you may want to add some authentication, from ‘Basic Auth’ to ‘CSRF Token’. But be aware it will add some complexity to your system, and you have to balance that against the information your API provides, e.g. ask yourself whether the content exposed is sensitive enough to justify it.
  • Caching, Blocking, and Rate limiting mechanisms. These may be quite complex to achieve and maintain, especially in a microservice architecture, but I prefer to mention them because they could be a potential solution if you don’t have any Edge tool or if you are working on a monolithic app (a minimal rate limiting sketch follows this list).
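
For instance, here is a minimal sketch of an in-application rate limiter using a fixed window per client key (typically the IP address). It is illustrative only; in a real microservice setup you would back it with a shared store such as a distributed cache so the limit holds across instances.

    import java.time.Duration;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;

    // Simplified fixed-window rate limiter keyed by client identifier (e.g. IP address).
    public class SimpleRateLimiter {

        private final int maxRequestsPerWindow;
        private final long windowMillis;
        private final Map<String, Window> windows = new ConcurrentHashMap<>();

        public SimpleRateLimiter(int maxRequestsPerWindow, Duration window) {
            this.maxRequestsPerWindow = maxRequestsPerWindow;
            this.windowMillis = window.toMillis();
        }

        // Returns true if the request is allowed, false if the client is over the limit.
        public boolean allow(String clientKey) {
            long now = System.currentTimeMillis();
            Window w = windows.compute(clientKey, (key, current) ->
                    current == null || now - current.startMillis >= windowMillis
                            ? new Window(now)
                            : current);
            return w.count.incrementAndGet() <= maxRequestsPerWindow;
        }

        private static final class Window {
            final long startMillis;
            final AtomicInteger count = new AtomicInteger();
            Window(long startMillis) { this.startMillis = startMillis; }
        }
    }

A filter or controller would then call allow(clientIp) and respond with HTTP 429 (Too Many Requests) when it returns false.
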
Decorative separator

Observability, Monitoring and Alerting

Last but not least, it’s very important that you have proper observability in place. Any bot attack that starts to get heavy and puts high pressure on your system should automatically trigger a set of alerts.

At the Edge level:

  • Alert on the bot rules when a recently created rule is blocking too much traffic or, conversely, when a rule has not blocked any traffic for the last ‘n’ months and can be reviewed for potential deletion.
  • Alert when the bot traffic (good or bad) is much higher than usual.

At the Application and Infrastructure level:

  • Alert on auto-scaling and the number of instances. E.g. if Kubernetes has spun up 5 new pods in the last 5 minutes and it’s not Black Friday, there is probably something fishy…
  • Alert on response time and status, when the service starts to respond slowly and/or with errors. I recommend reading Creating Monitoring Dashboards, which covers all you need to know about monitoring.
  • You can also set up some alerts at the log level. This can be useful if you are missing some metrics in your application, especially if you are using an advanced log management tool like Splunk (a query sketch follows this list).
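
As an example, assuming access logs are indexed in Splunk (the index and field names below are hypothetical), a simple search that could back such an alert is:

    index=web_access status>=500
    | timechart span=5m count

The alert would then fire when the 5-minute error count crosses a threshold you consider abnormal for your traffic.
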
Decorative separator
Antique toy robot waving arm
Photo by Rock’n Roll Monkey on Unsplash

I hope this article was useful and you now have better knowledge of bots and bot attacks.

I did not discuss any silver bullet tool here, because there is no such thing as a single, perfect anti-bot tool that works for everyone. But there is always room for improvement in how you prevent and mitigate bot attacks.

Ah… And good luck with the next one! 👋 🤖

Learn more about technology at Expedia Group
