Fighting Back Against DDoS Attacks
Engineering resilience into your applications and services
If you search for DDoS (Distributed Denial of Service) attacks online, you will be surprised at how common they have become. 2018 saw the largest DDoS attack to date hit GitHub at 1.35 Tbps; fortunately, it affected the availability of their service for only around ten minutes. The Danish Railway was less fortunate when a DDoS attack took out their ticketing system for two days, affecting around 15,000 commuters.
There are many companies with services and software designed to combat DDoS attacks and protect your site. However, relying solely on these companies is not enough, especially against an application-layer DDoS attack, where the incoming requests look just like your site's normal traffic. I will show you five areas where software engineers can build resilience into their applications and services to better defend against an attack. These are:
- How to trust your traffic
- Rate limiting your requests
- Boundary checking your APIs
- The importance of an effective circuit breaker strategy
- Being quick to fix
What is a DDoS attack?
A Distributed Denial of Service (DDoS) is a targeted attack on a web site or device where a malicious flood of traffic is sent from multiple sources. Its goal is to overwhelm and degrade the site so that it becomes unusable. There are many different types of DDoS attacks. In this blog, we will focus on the application-layer attack (HTTP flood). The attackers look to exploit weaknesses in your web pages and application programming interfaces (APIs) using custom scripts or readily available tools found on the internet. For an eCommerce site, such an attack can cost thousands or even millions of dollars. After they take down or degrade a site, some attackers will send an extortion email demanding a bitcoin payment to turn off the attack. The following diagram shows a world map of an actual DDoS attack that took place recently at Vrbo. The dots show the distributed nature of where the attack originated from, with the size of each dot representing the volume of traffic being sent.
What can engineers do?
While network and security engineers are busy detecting and blocking anomalies at the network edge, they have the difficult job of keeping the good traffic flowing. If you run an eCommerce site, you don't want to accidentally block a potential customer from making a purchase. That's where software engineers can help out by building additional layers of resilience into their applications and services. Expect that some amount of malicious traffic will pass through your edge layer into your application, service, and database layers. When it does, can you safely say that your site's critical applications and services will withstand an attack? Here are five areas where engineers can build better resiliency into their software.
1. Trust the right traffic
Trusting your network traffic is something usually done at the edge layer or API gateway. When an attack occurs, your network and edge teams would like to block the offending traffic by Internet Protocol (IP) addresses and/or Autonomous System Numbers (ASNs). But this can add significant business risk if you block good traffic with the bad. To combat this, you need to be able to map your network traffic against your business key performance indicators (KPIs). When you enrich your KPI events with the corresponding network traffic data, you can score every ASN/IP combination against the business value it brings to your company. If a DDoS attacker is using an ASN/IP combination with which you have done little or no business, then you can trust that blocking it should not affect your business.
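As a rough sketch of this idea (the field names and the value threshold here are hypothetical, not an actual Vrbo implementation), you might join historical KPI events against the sources seen during an attack and block only the sources with no recorded business value:

```python
from collections import defaultdict

def blockable_sources(kpi_events, attack_sources, min_value=1.0):
    """Return the (ASN, IP) pairs from attack_sources that are safe to block.

    kpi_events: historical business events enriched with network data,
        e.g. {"asn": 7922, "ip": "203.0.113.7", "revenue": 120.0}
    attack_sources: (asn, ip) pairs observed at the edge during the attack.
    """
    value_by_source = defaultdict(float)
    for event in kpi_events:
        value_by_source[(event["asn"], event["ip"])] += event["revenue"]

    # Sources with little or no historical business value are safe to block.
    return [src for src in attack_sources
            if value_by_source.get(src, 0.0) < min_value]
```

In practice the scoring would run over a much longer window of enriched KPI data, but the decision rule is the same: block the bad, keep anything that has ever looked like a paying customer.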
2. Rate-limit your requests
When under attack, you can elastically scale your system to handle the increased load. But what if you're seeing 10x, 100x, or 1000x loads? At some point, it would be nice to be able to rate-limit incoming REST and Ajax requests, returning HTTP 429 (Too Many Requests) responses when under duress. There are many use cases for rate-limiting traffic, but when you are dealing with anonymous callers, this becomes difficult. Utilizing tokens (like JWTs) is one way to decide whether to accept a request or reject it with an HTTP 429. Ultimately, rate limiting comes down to controlling a high rate of requests per second, but how do you decide which requests to limit? The rate-limiting strategy you choose depends on your individual use cases and the data available to you with each request. Some common dimensions to rate-limit on are:
- Known vs anonymous users/clients
- Brute force requests with invalid identifiers or parameters
- Network identifiers such as IPs/ASNs
- Per client or at the account level
- By bot classification
This is not a foolproof system, but it can be a good defensive tool to have at your disposal. Warning: rate limiting can potentially affect your business, so you may not want to keep it turned on at all times. Try utilizing runtime configs or even your A/B test framework so you can enable rate limiting quickly when under attack.
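A minimal sketch of per-client rate limiting, using a classic token-bucket algorithm (the capacity and refill rate below are illustrative placeholders, and `client_id` stands in for whatever key your strategy uses: a JWT subject, an IP/ASN pair, an account, or a bot classification):

```python
import time

class TokenBucket:
    """Token bucket: allows bursts up to `capacity`, refills at `rate` tokens/sec."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last request.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

buckets = {}

def handle_request(client_id):
    """Return 200 if the request is allowed, 429 (Too Many Requests) otherwise."""
    bucket = buckets.setdefault(client_id, TokenBucket(capacity=5, rate=1.0))
    return 200 if bucket.allow() else 429
```

Because each client gets its own bucket, one abusive caller gets throttled without affecting everyone else, and the bucket parameters can live in a runtime config so they can be tightened during an attack.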
Finally, with the increasing popularity of front-end API simplification technologies like GraphQL and Falcor, each client request can be magnified into N requests on the server. This makes securing and rate-limiting these endpoints all the more important.
3. Boundary-check your APIs
Are you boundary-checking your APIs against potential poison pills? Numerical values, wildcards, and date ranges can be used to drive looping behaviors and large queries in your applications and services. Relying solely on UIs to enforce field validation is not enough. A few years ago, an attacker bypassed our UI and sent Integer.MAX_VALUE (2,147,483,647) for a field whose valid business values ranged from 1 to 100. This two-billion-plus integer was passed five layers deep into our internal services, where it maxed out memory on multiple nodes and rendered part of the site unusable. The fix turned out to be very simple, but the incident highlights the importance of boundary-checking your internal APIs. Eggshell security (hard on the outside, soft on the inside) is not enough to stop an attacker from exploiting your system with meaningless values. Always adopt a mindset of defending against an attack in layers.
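The fix for that incident amounted to a few lines of validation at the service boundary. A sketch of the idea (the function name and field are made up for illustration), which every internal layer can apply, not just the UI:

```python
def validate_quantity(raw_value, lo=1, hi=100):
    """Validate a request field at the service boundary.

    Rejects non-integers and out-of-range values (like Integer.MAX_VALUE)
    before they can reach deeper layers and drive runaway loops or queries.
    """
    try:
        value = int(raw_value)
    except (TypeError, ValueError):
        raise ValueError(f"quantity must be an integer, got {raw_value!r}")
    if not lo <= value <= hi:
        raise ValueError(f"quantity must be between {lo} and {hi}, got {value}")
    return value
```

The check is trivial, which is exactly the point: a one-line boundary check at each layer is far cheaper than the outage caused by a poison-pill value sailing five layers deep.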
4. Use an effective circuit breaker strategy
Having a circuit breaker strategy for resilience is often overlooked. In my past career as an electrician, I would avoid mixing electrical circuits for different use cases. For example, a refrigerator always had its own dedicated circuit, separated from general power outlets. In the event something on a general power outlet trips its circuit breaker, the fridge would not be affected and its contents would not spoil. A similar strategy applies when using circuit breakers like Hystrix in your services. Some things to consider include:
- What is the user experience when your service's circuit breakers trip?
- Which other services are affected when the circuit breaker trips?
- What is your fallback or corrective-action strategy?
- Is your circuit breaker fine-grained enough to limit the impact?
- Are you measuring how often your circuit breakers are tripping?
In a recent DDoS attack, a circuit breaker tripped on a service powering a critical API for externally connected partners. Only one customer was targeted, but unfortunately, all partners were affected when the circuit breaker tripped. A more finely grained circuit breaker strategy would have limited the blast radius and resulted in a better user experience.
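To make "finely grained" concrete, here is a minimal sketch (not Hystrix itself, and the thresholds are illustrative) of keeping one breaker per partner, just as the fridge gets its own electrical circuit, so one targeted partner tripping its breaker leaves everyone else unaffected:

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; retries after `reset_after` s."""

    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result

# One breaker per partner: an attack on one partner's traffic
# cannot trip the circuit for everyone else.
breakers = {}

def call_for_partner(partner_id, fn, *args):
    breaker = breakers.setdefault(partner_id, CircuitBreaker())
    return breaker.call(fn, *args)
```

With a single shared breaker, the failing calls from the targeted partner would have opened the circuit for all partners; keying the breaker on the partner confines the blast radius to the one under attack.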
5. Be quick to fix
I cannot stress enough the importance of having a modern CI/CD (continuous integration/continuous deployment) pipeline when defending against DDoS attacks. Only when your site is under attack might it become obvious that a quick patch is required to fight off the flood of requests. The last thing you want is a simple fix that takes hours or days to deploy to production, forcing you to ride out the attack. There is not much more to say on this topic, other than to stress the importance of software that can be "quick to fix".
It's not a question of if, but when, your web site will get hit by a DDoS attack. It's no longer OK to leave it solely up to your network and security teams to fend off the attack at the edge. Application and service engineering teams share the responsibility of building additional layers of resilience into their architecture. Some areas where engineers can focus to lessen the impact of an attack include:
- Trusting your traffic by measuring it against business KPIs
- Rate-limiting your requests when under attack
- Boundary-checking your APIs against poison pills
- Deploying an effective circuit breaker strategy
- Being quick to fix
When engineers take responsibility for defending their architecture against DDoS attacks, they lessen the impact of any attack. Remember: a layered defense is the best defense.