Rate limiters in a nutshell, strategies, use cases, and implementation! — Part 1

ALameer Ashraf
5 min read · Jun 11, 2022


Image from https://e360.yale.edu/ — The Lower Granite Dam

Hey 👋! I need you to meet our friend Mayan. She is a software engineer running her own small business designing and making dresses.

Mayan decided to digitize her business by building an API for dresses and fashion, and providing this API to all the websites and clients that may be interested in this field.

Mayan actually wants to make money from this API by selling a number of requests per user, so she started thinking about a solution that guarantees each user is limited to the number of requests they bought.

While she was looking for a technology to limit user requests to the API, she found an interesting topic: the rate limiter. Mayan called me immediately to explain it to her, so join us to understand more about rate-limiting technology, strategies, algorithms, and some use cases. Finally, I'll give you plug-and-play rate-limiting services for your API, just like Mayan 😉.

And by the way, it's not only the money she's thinking of, but also availability.

Why Rate-limiting?

First, why do developers and service providers think about rate limiting in the first place?

  • Like our friend Mayan, they may want to make money from their services while guaranteeing reasonable use for each user, so they manage quotas and pricing based on usage. When many consumers are using the API, rate limiting guarantees fairness in sharing the API resources among all users, while also making money based on each user's usage 🤑.
  • To prevent resource starvation: rate limiting enhances the availability of our API by preventing resource starvation, which can be caused by DDoS attacks (denial-of-service attacks, or unintentional "friendly fire" denial of service).

So, in front of our service, we will add a new component called the rate-limiting service. This service checks that there is enough capacity to accept the request, and it can be applied to the complete service (API) or to a specific resource (endpoint)!

In other words!

For more clarification: in addition to our service, we will build a new service called "The Officer" (just for fun) which will take care of limiting the rate of requests to our original service. So if a user purchased a package with just 100 requests per minute and sends request number 101, the officer will reject the request and respond with an error (HTTP response code 429: Too many requests! read more). Is it clear now? 🤔

Simple representation for rate-limiting implementation.
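To make "The Officer" concrete, here is a minimal sketch of it as Express middleware. It uses a naive in-memory counter with a fixed one-minute window purely for illustration; the endpoint, header name, and limit are assumptions I made up, and a production limiter would use one of the algorithms below, with shared storage such as Redis.

```javascript
// A minimal sketch of "The Officer" as Express middleware (illustration only).
const express = require("express");
const app = express();

const LIMIT = 100;            // requests allowed per window (the purchased package)
const WINDOW_MS = 60 * 1000;  // one-minute window
const counters = new Map();   // apiKey -> { count, windowStart }

function officer(req, res, next) {
  const key = req.header("X-Api-Key") || req.ip; // identify the user
  const now = Date.now();
  const entry = counters.get(key) || { count: 0, windowStart: now };

  // Start a fresh window if the old one has expired
  if (now - entry.windowStart >= WINDOW_MS) {
    entry.count = 0;
    entry.windowStart = now;
  }

  entry.count += 1;
  counters.set(key, entry);

  if (entry.count > LIMIT) {
    // Request number 101 within the minute gets rejected
    return res.status(429).send("Too many requests!");
  }
  next();
}

app.use(officer);
app.get("/dresses", (req, res) => res.json([{ id: 1, name: "Summer dress" }]));
app.listen(3000);
```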

Strategies.

When it comes to implementing a rate limiter in your system, there are some strategies that service providers follow.

  • Client-side: while implementing rate limiting on your backend services, you should inform the client side about this design, so the client is aware that the service may not respond in time and can apply appropriate retry policies (see the retry sketch after this list).
  • No rate limiting: you can simply skip implementing any rate-limiting service in your application; in that case you should use deadlines, timeouts, and circuit-breaker patterns to help your service stay resilient in the absence of rate limiting.
  • Pass through: this strategy can be implemented only if your service is calling other services; you then have to choose how to pass the rate-limiting responses from the downstream services back to the requester.
  • Apply rate limiters: carefully select a rate-limiting algorithm and apply it to your service, OR simply follow the updates in this repo and get access to Node.js plug-and-play rate limiters: PlugNPlay-NodeJS-Rate-Limiter.
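As a minimal sketch of the client-side strategy, assuming Node.js 18+ (for the built-in fetch) and a server that answers 429 when the quota is exceeded; the URL and retry parameters here are made up:

```javascript
// Retry with exponential backoff when the server answers 429.
async function fetchWithRetry(url, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res; // success, or a non-rate-limit error

    // Respect Retry-After if the server sends it (assumed to be in seconds),
    // otherwise back off exponentially: 1s, 2s, 4s, ...
    const retryAfter = res.headers.get("Retry-After");
    const delayMs = retryAfter ? Number(retryAfter) * 1000 : 2 ** attempt * 1000;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("Rate limited: retries exhausted");
}

// Usage:
// fetchWithRetry("https://api.example.com/dresses").then((r) => r.json());
```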

Implementation and Algorithms.

To implement rate limiting in your system, you need to study its algorithms. There are four popular algorithms; in this section we will explain two of them, and we will continue with the rest in the second part.

Token Bucket.

In the token bucket, we will have a bucket of tokens that is refilled with a number of tokens every period of time.

  • The refiller will be responsible for filling the bucket (database) with tokens.
  • With any incoming request, another component (the dispatcher) will ask the bucket if there are remaining tokens.
  • If the bucket still has tokens, one token will be removed from the bucket and the request will be forwarded to the API or service.
  • If the bucket has run out of tokens, the request will be dropped and a 429 error code will be sent back to the sender.

Take care to carefully select the period at which the refiller fills the bucket and the number of tokens; that's where the complexity of the implementation comes from. The sketch below shows the idea.
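Here is a minimal in-memory sketch, with the refiller as a timer and the dispatcher as a method. The class and parameter names are illustrative; a production version would typically keep the bucket in shared storage so all instances see the same token count.

```javascript
// A minimal in-memory token bucket sketch; names are illustrative.
class TokenBucket {
  constructor(capacity, refillTokens, refillIntervalMs) {
    this.capacity = capacity; // max tokens the bucket can hold
    this.tokens = capacity;   // start full
    // The "refiller": add tokens on a fixed schedule, capped at capacity
    setInterval(() => {
      this.tokens = Math.min(this.capacity, this.tokens + refillTokens);
    }, refillIntervalMs).unref();
  }

  // The "dispatcher" asks for a token; true = forward request, false = reject
  tryRemoveToken() {
    if (this.tokens > 0) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Usage: capacity of 10 tokens, refill 10 tokens every second
const bucket = new TokenBucket(10, 10, 1000);
if (bucket.tryRemoveToken()) {
  // forward the request to the API or service
} else {
  // respond with a 429 Too many requests error
}
```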

Leaky Bucket (Leaky QUEUE).

As the name indicates, there is some bucket and it leaks at a constant rate 😁.

  • We will have a bucket (a queue, which can be RabbitMQ or Azure Service Bus); don't confuse it with the token bucket algorithm (a sketch follows this list).
  • When a new request is sent, the dispatcher service will check whether the queue is full or not!
  • If the queue is not full, the request will be added to the queue.
  • At a fixed interval, one of the requests that have been queued will be forwarded to the service.
    The queue here is FIFO (First In, First Out), so the oldest request will be served first.
  • If the queue is full, the request will be dropped and a 429 error code will be sent back to the sender.
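A minimal in-memory sketch of the leaky queue, where a plain array stands in for the message queue and a handler callback forwards requests to the service; all names here are illustrative assumptions:

```javascript
// A minimal in-memory leaky bucket (leaky queue) sketch; a real setup might
// use RabbitMQ or Azure Service Bus as the queue instead of an array.
class LeakyBucket {
  constructor(queueCapacity, leakIntervalMs, handler) {
    this.queue = [];                // FIFO: the oldest request is served first
    this.capacity = queueCapacity;
    // "Leak" one queued request to the service at a constant rate
    setInterval(() => {
      const request = this.queue.shift();
      if (request) handler(request);
    }, leakIntervalMs).unref();
  }

  // The dispatcher: enqueue if there is room, otherwise reject
  tryEnqueue(request) {
    if (this.queue.length >= this.capacity) return false; // caller sends a 429
    this.queue.push(request);
    return true;
  }
}

// Usage: hold at most 100 requests, forward one every 100 ms
const bucket = new LeakyBucket(100, 100, (req) => {
  // forward req to the API or service here
});
```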

There are 2 other algorithms that will be explained in detail in Part 2. Also, don't forget to follow this repo on GitHub; my friends and I will develop complete plug-and-play rate limiters using the 4 implementations, and you can join us by opening an issue, making some code changes, or even reviewing!

Finally, I hope that you can now relate to the header image of the article: this is how rate limiting works in real life 😂🤣.
