Designing a Distributed Rate Limiter — Introduction

Hiresh Trivedi · wineofbits · Aug 4, 2021 · 5 min read

[Image: Neo from The Matrix, limiting bullets]

What is a Rate Limiter?

Imagine you have deployed a public API and users are crazy about the product. It could be:

  1. URL shortening
  2. Getting the nearest pizza store from the user’s location
  3. Getting the statistical analysis of a stock

But you do not have …

A middleware that controls the rate of traffic sent to your web servers based on flexible rules. If the specified threshold is violated, it drops or enqueues the request.

Image credit: https://www.tibco.com/sites/tibco/files/media_entity/2021-04/rate-limiting-diagram.svg

This rules out malicious requests that could overwork your servers and enables you to distinguish organic requests from automated ones by identifying user accounts or IPs that violate your API rate limits.
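To make the definition concrete, here is a minimal sketch of the check-then-forward-or-drop flow in Python, using a naive per-client counter as the rule. The names are illustrative, not any particular framework's API:

```python
import time

WINDOW = 1.0   # seconds
LIMIT = 5      # max requests per client per window
counters = {}  # client_id -> (window_start, request_count)

def is_allowed(client_id):
    now = time.time()
    start, count = counters.get(client_id, (now, 0))
    if now - start >= WINDOW:        # window expired: start a fresh one
        start, count = now, 0
    counters[client_id] = (start, count + 1)
    return count < LIMIT

def handle(client_id):
    if is_allowed(client_id):
        return "200: forwarded to web server"
    return "429: Too Many Requests"  # threshold violated: drop the request

for i in range(7):
    print(i, handle("client-a"))     # the last two requests are dropped
```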

Why is designing a Rate Limiter interesting?

It is interesting mainly for the breadth and depth of problems it solves, including preventing DDoS attacks. Rate limiters can be used (see the rule sketch after this list):

  1. For the client applications of your trading system: allow each client app to send only 5 orders per second
  2. To throttle the number of rewards on a payment platform per user per month, and/or impose a global limit on rewards per user
  3. For platforms like Medium: allow non-premium users to access only 3 premium stories per month
  4. For early fraud detection: where possible, suspicious requests can be dropped at the middleware before they ever reach the web servers
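Expressed as data, rules for use cases like these might look as follows. This is a hypothetical schema; the field names and concrete numbers are illustrative only:

```python
# Hypothetical rule definitions mirroring the use cases above;
# field names and numbers are illustrative, not from any real product.
RULES = [
    {"scope": "client_app", "resource": "orders",        "limit": 5,  "per": "second"},
    {"scope": "user",       "resource": "rewards",       "limit": 10, "per": "month"},
    {"scope": "user",       "resource": "rewards",       "limit": 50, "per": "lifetime"},  # global cap
    {"scope": "user",       "resource": "premium_story", "limit": 3,  "per": "month",
     "applies_to_plan": "free"},
]
```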

As these use cases suggest, a rate limiter needs to persist the state of available access counters for months, while also handling data-intensive bursts of requests with low latency.

There can also be cases where access restrictions are ‘soft’: requests that violate the rules are eventually served, but are not forwarded to the web servers right away (throttling). Adding requests to a queue also opens the gates to a back-pressure scenario, where the queue itself is overwhelmed.
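A soft limit can be sketched with a bounded queue sitting between the limiter and the web servers; once the queue itself fills up, you are in exactly the back-pressure scenario described above. Again a Python sketch, with an illustrative queue size:

```python
import queue

pending = queue.Queue(maxsize=3)  # bounded buffer of throttled requests

def soft_throttle(request):
    """Soft limit: park the violating request to serve it later, not drop it."""
    try:
        pending.put_nowait(request)   # a background worker would drain this
        return "202: accepted, will be served later"
    except queue.Full:                # back-pressure: even the queue is overwhelmed
        return "429: dropped, queue full"

for i in range(5):
    print(i, soft_throttle({"id": i}))  # the last two requests hit back-pressure
```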

We talk about server-side rate limiters here, but clients can also have rate limiters in place before sending out requests to servers.

It is also good to know that the OWASP API Security Top 10 lists Lack of Resources & Rate Limiting at number 4. It’s very pleasing to have a solution that not only secures your service but also improves performance and user experience.

Why not just look up docs and use an existing popular rate limiter?

Given that the problems rate limiters solve are faced in almost every large-scale distributed system, there are a lot of rate limiters that can do the job.

Amazon uses the token bucket algorithm to limit requests, and so does Stripe. Some say a fixed window algorithm is better.
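To give a feel for the difference: a token bucket refills continuously and tolerates short bursts up to the bucket’s capacity, whereas a fixed window resets a plain counter at interval boundaries. A minimal token bucket sketch follows; the capacity and refill rate are illustrative, and later articles in the series cover the algorithms properly:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`; refill at `rate` tokens per second."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1          # spend one token on this request
            return True
        return False

bucket = TokenBucket(capacity=5, rate=2)   # burst of 5, then 2 requests/second
print([bucket.allow() for _ in range(7)])  # first 5 True, last 2 False
```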

And there are many more solutions that can get you going in no time. However, it is crucial to understand the basic working mechanism of each and the challenges to anticipate: with more choices comes the problem of picking the most suitable one.

Hence the scope of this series is to give a detailed account of the design of rate-limiting systems, the algorithms used to achieve rate limiting, and the CAP challenges of doing so in a distributed system.

Why not just use server configuration parameters to limit requests?

Parameters like max connection count and max thread count can tune request handling, but our requirement is a bit more customized: these parameters have no knowledge of the user’s interaction with the application logic. Imagine a few clients consuming the entire max-requests threshold; other users would simply have to wait, even when the existing requests are not justified.

Load balancers, too, distribute requests round-robin or based on current server load, and are agnostic of the application logic behind throttling and of a request’s latency potential.

Should we not auto-scale our infrastructure to accommodate all the requests?

Apart from the charges that auto-scaling incurs, it is not an instant process: while nodes are added, registered in service discovery, and a consensus is established on a leader, you are still looking at a possible outage. The request rate would have to be insane for this scenario, which it typically is for a worldwide popular service. A snippet from a Slack outage report might help:

The large number of broken instances caused us to hit our pre-configured autoscaling-group size limits, which determine the maximum number of instances in our web tier.

we couldn’t provision new dashboard service instances because provision-service was overloaded.

Rate limiters can also treat low-priority and high-priority API requests differently, allowing you to drop low-priority requests when there are internal issues and keep serving high-priority ones.

Sounds like the rate limiter will be a performance bottleneck, since it checks every incoming request. Who will limit the requests to the rate limiter?

  • The rate limiter will be deployed as a distributed system: we have multiple rate limiter nodes receiving requests from the load balancer (the load balancer is not a bottleneck, thanks to DNS load balancing); a sketch of how these nodes can share counters follows this list
  • The number of rate limiter nodes need not be as high as the number of application server nodes, since a rate limiter’s capacity to handle requests is substantially higher: it is not a high-latency operation, as we will see in the next article
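Those rate limiter nodes do need to agree on the counters, which is usually done through a shared store. Here is a minimal sketch of a fixed-window counter shared across nodes via Redis, assuming the redis-py client and a reachable Redis instance; the key scheme is illustrative:

```python
import time

import redis  # assumes redis-py and a reachable Redis instance

r = redis.Redis(host="localhost", port=6379)
LIMIT = 100   # requests per user per window
WINDOW = 60   # seconds

def is_allowed(user_id: str) -> bool:
    # One key per user per window; every rate limiter node hits the same key.
    key = f"rl:{user_id}:{int(time.time() // WINDOW)}"
    count = r.incr(key)           # atomic increment across all nodes
    if count == 1:
        r.expire(key, WINDOW)     # let stale windows expire on their own
    return count <= LIMIT
```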

The preface, lengthy as it is, serves the purpose of clarity on deciding when to use a rate limiter and the scenarios it serves best. We do not want to document, build, and test a middleware when we are better off without it, or find out later that we could have done better. We can now happily proceed to discuss the details: Designing a rate limiter — Deep Dive
