System Design — Rate limiter and Data modelling

More often than not, the design interview rounds start with basic questions such as “design a rate limiter” or “design a circuit breaker” and can go to much more depth based on your experience. The questions posed can be to design an existing technology or any specific use case of the company you’re interviewing for. Sometimes these questions get asked in telephonic screening which is an elimination round and decides whether a person’s candidature should be taken forward or not; Along with the design, one might be asked for data model to the solution or even code it up.

The design of a simple rate limiter…

Problem: Design rate limiter — A rate limiter is a tool that monitors the number of requests per a window time a service agrees to allow. If the request count exceeds the number agreed by the service owner and the user (in a decided window time), the rate limiter blocks all the excess calls(say by throwing exceptions). The user can be a human or any other service(ex: in a micro service based architecture)

Example: A device with a specific ip address can only make 10 ticket bookings per minute to a show booking website. Or a service A can hit service B with a rate of at most 500 req/sec. All the excess requests get rejected.

For these use cases, using persistent memory stores like mysql is a bad idea because the time taken for disk seeks is high enough to hamper the rate limiter granularity. For instance, let’s say that we’re using a single mysql instance to store the request counts and mysql takes 1ms to process 1 request, which means we can achieve a throughput of 1000 req/sec. But a rate limiter using in-memory cache takes around 100nanosec(a main-memory access) to process 1 request, which implies a granularity of around 10Mreq/sec can be achieved.

Various Levels: There are different levels at which one can design rate limiters. Listing a few…

  • Rate limiter in a single machine, single threaded scenario
  • Rate limiter in a single machine, multi threaded scenario — Handling race conditions
  • Rate limiter in a distributed scenario —Distributed Cache Usage like redis
  • Rate limiter from client side —Prevent network calls from client to server for all the excess requests.

There are different designs to implement rate limiters, some of which are described below…

Fixed window counters

  • This is a very rudimentary way to tackle the rate limiting problem
  • There is one bucket for each of the unit time window.
  • Each bucket maintains the count of number of requests in that particular window

For example, a rate limiter for a service that allows only 10 requests per an hour will have the data model like below. Here, the buckets are the windows of one hour, with values storing the counts of the requests seen in that hour.

"1AM-2AM": 7,
"2AM-3AM": 8

With the current model in the above example, if a new request is seen at 2:45AM, we get the count from the current bucket(2AM-3AM) which is 8 and verify that if processing one more request exceeds the permissible limit of 10, if that is the case, an exception is raised; if not(which is the case here), count of the bucket is incremented by unit(to 9) and the request is allowed to be processed. Only the counts of the current window are stored and older windows are deleted when a new window is created(i.e in the above case, if the hour changes, older bucket is deleted)

Space Complexity: O(1) — Storing the current window count

Time Complexity: O(1) — Get and simple atomic increment operation


  • Easy to implement
  • Less memory footprint, ’cause all that is being done is storing the counts
  • Can use inbuilt concurrency with redis like technologies


  • This in incorrect. Explanation: In the above case, if all the 7 requests in the 1AM-2AM bucket occurs from 1:30AM-2AM, and all the 8 requests from 2AM-3AM bucket occur from 2AM-2:30AM, then effectively we have 15(7 + 8) requests in the time range of 1:30AM-2:30AM, which is violating the condition of 10req/hr

Sliding window logs

  • For every user, a queue of timestamps representing the times at which all the historical calls have occurred within the timespan of recent most window is maintained.
  • When ever a new request occurs, a check is made for any timestamps older than the window time and these are deleted as they are no longer relevant(Instead of doing this at every request, this step can also be done periodically after every ‘n’ mins or when a certain length of the queue is reached)
  • The new timestamp is appended to the user’s queue
  • If the number of elements in the queue is not more than that of the allowed count, the request is let through, else an exception is raised.

Space Complexity: O(Max requests seen in a window time) — Stores all the timestamps of the requests in a time window

Time Complexity: O(Max requests seen in a window time)— Deleting a subset of timestamps


  • Works perfectly


  • High memory footprint. All the request timestamps needs to be maintained for a window time, thus requires lots of memory to handle multiple users or large window times
  • High time complexity for removing the older timestamps

Data Modelling

In memory Design (Single machine with multiple threads)

The following solutions use redis based pipelines which provide ways to use optimistic locking in redis. One can go through redis transactions, to get a gist of concurrency in redis. The below data modelling is optional.

Sliding Window Logs — Redis Design

Sliding window counters

  • This is a hybrid of Fixed Window Counters and Sliding Window logs
  • The entire window time is broken down into smaller buckets. The size of each bucket depends on how much elasticity is allowed for the rate limiter
  • Each bucket stores the request count corresponding to the bucket range.

For example, in order to build a rate limiter of 100 req/hr, say a bucket size of 20 mins is chosen, then there are 3 buckets in the unit time

For a window time of 2AM to 3AM, the buckets are

"2AM-2:20AM": 10,
"2:20AM-2:40AM": 20,
"2:40AM-3:00AM": 30

If a request is received at 2:50AM, we find out the total requests in last 3 buckets including the current and add them, in this case they sum upto 60 (<100), so a new request is added to the bucket of 2:40AM–3:00AM giving…

"2AM-2:20AM": 10,
"2:20AM-2:40AM": 20,
"2:40AM-3:00AM": 31

Note: This is not a completely correct, for example: At 2:50, a time interval from 1:50 to 2:50 should be considered, but in the above example the first 10 mins isn’t considered and it may happen that in this missed 10 mins, there might’ve been a traffic spike and the request count might be 100 and hence the request is to be rejected. But by tuning the bucket size, we can reach a fair approximation of the ideal rate limiter.

Space Complexity: O(number of buckets)

Time Complexity: O(1) — Fetch the recent bucket, increment and check against the total sum of buckets(can be stored in a totalCount variable).


  • No large memory footprint as only the counts are stored


  • Works only for not-so-strict look back window times, especially for smaller unit times

Data Modelling

In memory Design (Single machine with multiple threads)

Sliding Window Counters — Redis Design


  • Python is chosen owing to its less verbosity
  • The code can be further optimised, but it was traded for readability


Also please checkout my article on Online Judge Modelling

Thanks to Bhargav for reviewing. Please clap if you think it’s useful and comment your opinions about this article.