Building a Rate Limiter. Part 1: Theory.
Almost every application faces the task of filtering incoming requests.
There are two types of problematic requests: DDoS attacks and overly frequent requests from real customers of the service.
The DDoS problem can be solved by such giants as Cloudflare. They have an incredibly massive infrastructure, which helps them filter out spam requests effectively.
But they can’t help us filter traffic at the application level. This is exactly the problem I ran into.
We have complex logic in the application that takes a long time to execute. Our client wrote a script and started sending thousands of requests per second. These requests caused our service to respond with delays to all of our clients.
This article will look at the theoretical part of building a rate limiter.
In the next article, we will see which tools we can use to implement it.
To better understand why we need a rate limiter, let’s look at a real-life example.
We have a parking lot with a capacity of 60 cars. If it’s fully occupied, there’s no point in letting more cars in because they won’t be able to find new spaces, and it will only lead to a mess in the parking lot. At any time, the maximum number of cars in the parking lot is 60. This is the main invariant that we have to keep. If some vehicle leaves and creates a vacant space, we can let a new one in. To do this, we put a barrier at the entrance. The same kind of barrier is needed for our program.
Our task is to create a service that limits the number of requests from users to our system.
The service must work very fast, be reliable, and consume few resources.
It must be scalable and be able to update access rules on the fly for each client. The service should provide a rich set of metrics so that you can see any bursts of activity.
Let’s look at the basic approaches to building one: buckets and sliding windows.
The first idea that comes to mind is to issue permissions to requests.
In the parking example, these could be parking tickets; in the computer world, they are called tokens. There are two common ways to hand tokens to customers:
the token bucket and the leaking bucket. Let’s see how they work.
We studied our parking lot for a long time and concluded that cars typically leave within an hour. We gave the security guard cards that a driver must take for the barrier to open.
Every hour he inserts 60 cards into the machine. Thus, every hour only 60 cars can enter the parking lot.
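In code, the guard’s card machine maps onto the classic token bucket. Below is a minimal single-threaded sketch; the class name and parameters (`TokenBucket`, `capacity`, `refill_period_sec`) are my own, not from the article, and real implementations often refill continuously rather than in one hourly batch:

```python
import time

class TokenBucket:
    """Refill the bucket with `capacity` tokens once per period;
    each admitted request consumes one token."""

    def __init__(self, capacity, refill_period_sec):
        self.capacity = capacity
        self.refill_period = refill_period_sec
        self.tokens = capacity          # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Once per period the guard loads a fresh batch of cards.
        if now - self.last_refill >= self.refill_period:
            self.tokens = self.capacity
            self.last_refill = now
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False

# 60 cars per hour, just like the parking lot.
barrier = TokenBucket(capacity=60, refill_period_sec=3600)
```

Once the tokens for the current period are spent, every further request is rejected until the next refill.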
However, we have to consider that if one car doesn’t leave in an hour, we will have more than 60 cars in the parking lot, which will lead to problems.
In this case, we need to be more careful about the number of cards we issue and/or when we issue them. This can be solved with a leaking bucket.
The leaking bucket looks similar to the token bucket: we issue exactly 60 parking cards and give them to the security guard. When a car leaves the parking lot, the guard returns its card to the barrier. This way, we always have at most 60 vehicles in the parking lot. But we get a new problem: what if all the cars stay in the parking lot for more than an hour? With this approach, we must additionally figure out how to force the release of parking spaces.
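Under the card-return reading above, this behaves like a cap on the number of requests in flight: a card is taken on entry and returned on exit. Here is a minimal sketch of that idea (names are mine; note that the classic leaky bucket is often described instead as a FIFO queue drained at a constant rate):

```python
class LeakingBucket:
    """At most `capacity` cars (requests) inside at any moment;
    a slot frees up only when one of them leaves."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0

    def enter(self):
        # Hand out a card only if one is left at the barrier.
        if self.in_use < self.capacity:
            self.in_use += 1
            return True
        return False

    def leave(self):
        # The guard returns the card when the car drives out.
        if self.in_use > 0:
            self.in_use -= 1
```

If no car ever calls `leave()`, admission stops entirely, which is exactly the stuck-parking-lot problem described above.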
There are several other ways to handle requests: the so-called window-based algorithms.
The first idea is a fixed window. We give our guard a counter and tell him to let in 60 cars every hour: from 1 p.m. to 2 p.m., 60 cars; from 2 p.m. to 3 p.m., another 60; and so on.
But this solution has a serious flaw. Suppose 60 cars arrive in the last 10 minutes of one window and another 60 in the first 10 minutes of the next. Within 20 minutes, twice as many cars as the limit entered our parking lot, even though we expected them to be spread evenly over 120 minutes.
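The guard’s counter can be sketched as a fixed window counter: reset the count at each window boundary, reject once the count hits the limit. Names and parameters are hypothetical:

```python
import time

class FixedWindowCounter:
    """Allow at most `limit` requests per fixed window of
    `window_sec` seconds; the counter resets at each boundary."""

    def __init__(self, limit, window_sec):
        self.limit = limit
        self.window = window_sec
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self):
        now = time.monotonic()
        # A new hour begins: the guard zeroes his counter.
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

The burst-at-the-boundary flaw is visible in the code: nothing prevents `limit` admissions just before the reset and `limit` more just after it.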
We can get around this problem in two ways. The first is a sliding window log.
Let’s store a timestamp for each car and check how long ago the 60th most recent car arrived. If it was less than 60 minutes ago, we don’t let the new car in.
The good thing about this scheme is that it regulates the number of cars precisely. But it also has a significant drawback: we have to store a timestamp for each car and constantly delete entries older than 60 minutes.
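The timestamp log can be sketched with a deque: on each request, prune entries older than the window, then admit only if fewer than the limit remain. Class and parameter names are my own:

```python
import time
from collections import deque

class SlidingWindowLog:
    """Keep a timestamp per admitted request; admit a new one only
    if fewer than `limit` timestamps fall inside the last window."""

    def __init__(self, limit, window_sec):
        self.limit = limit
        self.window = window_sec
        self.log = deque()  # arrival times, oldest first

    def allow(self):
        now = time.monotonic()
        # Delete entries made more than a window ago.
        while self.log and now - self.log[0] >= self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

Memory grows with the limit, and every call pays for the pruning loop, which is the drawback noted above.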
Or we can use a sliding window counter. There are two variants here. Say a new window started 10 minutes ago. To let a new car in, the guard computes the average number of cars per minute over the previous hour and multiplies it by the 50 minutes of that hour still covered by the sliding window. Then he adds the number of cars that arrived in those 10 minutes. If the total is less than 60, he lets the car pass.
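The guard’s arithmetic is the weighted sliding window: the previous window’s count is scaled by how much of it still overlaps the sliding hour, then the current count is added. A single-threaded sketch with hypothetical names:

```python
import time

class SlidingWindowCounter:
    """Estimate = prev_count * (remaining overlap / window) + curr_count;
    admit while the estimate stays below `limit`."""

    def __init__(self, limit, window_sec):
        self.limit = limit
        self.window = window_sec
        self.prev_count = 0
        self.curr_count = 0
        self.curr_start = time.monotonic()

    def allow(self):
        now = time.monotonic()
        elapsed = now - self.curr_start
        if elapsed >= self.window:
            # Roll over: the current window becomes the previous one;
            # after two full windows nothing overlaps any more.
            self.prev_count = self.curr_count if elapsed < 2 * self.window else 0
            self.curr_count = 0
            self.curr_start += (elapsed // self.window) * self.window
            elapsed = now - self.curr_start
        # 10 minutes into the new hour, the previous hour still covers
        # 50 of the last 60 minutes, so its count is weighted by 50/60.
        weight = (self.window - elapsed) / self.window
        if self.prev_count * weight + self.curr_count < self.limit:
            self.curr_count += 1
            return True
        return False
```

This needs only two counters instead of a full log, at the cost of assuming arrivals in the previous window were evenly spread.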
Or, the hour can be split into baskets of 10 minutes each. The guard then adds up the five baskets of the previous hour and the one for the current 10 minutes. If the total is less than 60, he adds one to the current basket and lets the car pass.
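The basket variant can be sketched by keeping one counter per 10-minute bucket and summing only the buckets that still fall inside the last hour. Names and the dict-based storage are my own choices:

```python
import time

class BucketedSlidingWindow:
    """Split the window into `n_buckets` baskets; admit while the sum
    of the baskets covering the last window is below `limit`."""

    def __init__(self, limit, window_sec, n_buckets):
        self.limit = limit
        self.bucket_len = window_sec / n_buckets
        self.n_buckets = n_buckets
        self.counts = {}  # bucket index -> count

    def allow(self):
        now = time.monotonic()
        idx = int(now // self.bucket_len)
        # Keep only the baskets inside the sliding window:
        # five from the previous hour plus the current one.
        live = range(idx - self.n_buckets + 1, idx + 1)
        self.counts = {i: c for i, c in self.counts.items() if i in live}
        if sum(self.counts.values()) < self.limit:
            self.counts[idx] = self.counts.get(idx, 0) + 1
            return True
        return False
```

Compared with the per-car log, this stores at most six counters, trading exactness at bucket granularity for constant memory.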
Conclusion. We are now familiar with the main approaches to building rate limiters. In the next article, we will implement our own in-memory rate limiter, test it, compare it with ready-made solutions, and see whether the reactive approach offers any advantages.