The thundering herd — Distributed Systems rate limiting

4 min readAug 6, 2019

Imagine a bull running towards you when you are relaxing on a farm.

What would you do? Easy “I would just get out of the way” you would say. Well how about now?

If it’s not clear it’s a herd of raging bulls. Unless you have extreme luck you would likely be trampled. That’s the exact situation of the servers when a huge amount of requests greater than it’s compute power comes in. This is commonly known as “thundering herd” problem. In this blog, I will try to discuss the various scenarios and possible solutions from my understanding through various references.

The thundering herd problem can occur when there is a cascading failure — say you have 3 servers running and a load balancer

Let us say each server can handle a certain number of requests(say 500) per second and the load balancer distributes the requests to handle the load. But what if the server 1 goes down while handling 500 requests ( after completing say 200 requests). So the rest of the 300 requests would be sent to server 2 . Let’s say server 2 comparatively has lesser computing power or even if it has the same computing power it is handling more than it’s designated capacity (QPS). So the server 2 crashes after a while ( after handling 200 requests). Now the first 200 + requests directed at server 2 goes to server 3. Now remember server 3 may additionally also have to handle new requests. So there we have the thundering herd problem . By now let’s say if you are using JVM based languages there is a high probability of out of memory error and server 3 crashes now. So there you have the scenario of “cascading failure” . Now, you may ask why not scale up (using terraform) and add another server? But in this situation of cascading failure there is a high probability that server would crash too despite higher computing power and memory allocations.

So what is a solution for this problem? I hope everyone remembers the various buffering algorithms we have read about in college :

Token Bucket

Leaky Bucket

Read them up again. For now, Let us say we have a queue in between to buffer the requests once a server goes down and then process them by distributing across available servers in limited quantities. If the QPS(queries per second) exceed the queue capacity , then we have to return a temporary error to the user (“try later”) as it is better to serve some users than none at all.

With the above idea token bucket can be implemented this way . For each request we can categorize the user by maintaining an in memory map or in redis, with the user as key and a token (which would be the number of requests remaining for the user). On each request from the same user the token is decreased by one and when it reaches zero we throw temporary error to user. Well this is space intensive when there are many users. I am sure highly scalable systems have smarter solutions.

There are many such scenarios that would cause thundering herd problems. Like the big billion day. But in that case, there is a way of handling the load as flipkart could predetermine the load approximately and auto-scale or pre-scale using terraform.

But what if some video trends, or some image of a celebrity trends. In this case it is difficult to determine the load beforehand. Even auto scale might not work out or is not economical depending on the load. In these cases metadata like number or likes,comments could be approximated to avoid placing load by constant querying and calculations.

Do customer facing applications only face this problem? No. If you are executing batch jobs(cron) then you might have this problem too. If you face constant issues in your cron jobs which curates data from it’s respective sources and updates to a store or file uploads and backups then it might be due to all those cron jobs being scheduled at the same time. One way to solve this is to interleave them and reduce the quantity of data and queries in each of those jobs. “Jitter” really helps here. In layman’s terms jitter breaks periodicity.

There are many more sophisticated solutions and i would recommend looking into various resources for further understanding

REFERENCES:

http://highscalability.com/blog/2012/4/17/youtube-strategy-adding-jitter-isnt-a-bug.html

The thundering herd — Distributed Systems rate limiting

Written by Venktesh Subramaniam