How does Facebook handle billions of async requests?

Aug 22, 2023

That is not really a fair question: there is so much going on at Facebook that handling billions of parallel requests depends on many things, and no single diagram can illustrate it all. So instead, let’s focus on one system that Facebook developers built, known as Async.

Some async requests can be executed at different times: for instance, notifications about a new message, a post from a friend, or a Facebook Live stream starting. Some of these are more time-sensitive than others. To manage the different levels of urgency for different types of async requests, Facebook built a system called Async. The original Async was a simple system, but as Facebook’s apps and services grew, it needed to scale to handle a greater volume of requests.

Before re-architecting Async

In the original version of Async, all asynchronous requests were stored in a centralized database. Dispatchers would query and sort the requests stored in that database. When a dispatcher then pulled jobs from the database and sent them to the worker servers, the sorting provided some basic prioritization based on each request’s desired execution time.

At that time, Async had three priorities, and inside each priority queue it was first come, first served. When new use cases were introduced, the team just added more workers. This does not hold up at scale, because smaller use cases have to wait in the queue for a chance to run. That is not a big problem in a small system, but with billions of requests it is not a good design at all.
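To make the waiting problem concrete, here is a minimal sketch of "three priorities, first come, first served inside each". The priority names, the use-case names, and the class itself are made up for illustration; this is not Facebook's code, only a toy model of the behavior described above.

```python
from collections import deque

# Hypothetical priority names; the article only says there were three levels.
PRIORITIES = ("high", "mid", "low")


class FifoPriorityQueues:
    """Three queues, first come, first served inside each one."""

    def __init__(self):
        self.queues = {p: deque() for p in PRIORITIES}

    def submit(self, priority, job):
        self.queues[priority].append(job)

    def next_job(self):
        # Drain higher priorities first; inside a priority, oldest job first.
        for p in PRIORITIES:
            if self.queues[p]:
                return self.queues[p].popleft()
        return None


q = FifoPriorityQueues()
# A big use case floods the "mid" queue just before a small one arrives.
for i in range(3):
    q.submit("mid", f"big-use-case-job-{i}")
q.submit("mid", "small-use-case-job")

order = []
while (job := q.next_job()) is not None:
    order.append(job)
print(order)
```

The small use case shares a priority with the big one, yet its single job runs last because it arrived behind the burst. Adding workers shortens the wait but does not change the order.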

The main challenges were:

Prioritization
Which request should run first? There are a bunch of features: a user sends a message or posts a picture, Safety Check is activated during a natural disaster, someone likes a post, and so on. How does the system determine which action to complete first? To help the system understand what to prioritize, developers introduced the concept of delay tolerance: how long each request can be delayed.
Capacity optimization
Daily traffic has both peak and off-peak times. When a major event happens, traffic goes up; when traffic is low, machines sit idle. That makes the right capacity hard to predict. One way they manage the different levels is by classifying traffic into one of three categories: daily traffic, major events, and incident response. They then use queues, time shifting, and batching to better optimize the capacity of their machines.
Capacity regulation
What if a job suddenly consumes a lot of CPU and memory due to a bug? To regulate how much capacity each use case consumes, they introduced policies that prevent any one job from taking more resources than it should.
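The delay tolerance idea from the prioritization challenge can be sketched as a deadline queue: each request's deadline is its submit time plus how long it can wait, and the request with the earliest deadline runs first. Everything here is an assumption for illustration: the class, the job names, and the tolerance values in seconds are invented, and the real Async scheduler is certainly more involved.

```python
import heapq
import itertools


class DelayTolerantQueue:
    """Orders jobs by deadline = submit time + delay tolerance (seconds)."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so jobs with equal deadlines stay in arrival order.
        self._counter = itertools.count()

    def submit(self, name, submitted_at, delay_tolerance):
        deadline = submitted_at + delay_tolerance
        heapq.heappush(self._heap, (deadline, next(self._counter), name))

    def pop_most_urgent(self):
        deadline, _, name = heapq.heappop(self._heap)
        return name, deadline


q = DelayTolerantQueue()
# Hypothetical tolerances: a like can wait a minute, Safety Check cannot wait.
q.submit("like-notification", submitted_at=0, delay_tolerance=60)
q.submit("photo-post-processing", submitted_at=1, delay_tolerance=30)
q.submit("safety-check", submitted_at=2, delay_tolerance=0)

urgent_order = [q.pop_most_urgent()[0] for _ in range(3)]
print(urgent_order)
```

Even though the Safety Check request arrives last, it has the earliest deadline and is picked first. Unlike a fixed set of priority levels, a small use case does not have to wait behind a burst from a bigger one as long as its own deadline is earlier.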

The story is long (the full article is linked at the end), but to summarize, they use the idea of priorities. Not all events, notifications, and actions are equal. Some have higher priority, so they get processed first, with policies regulating how much capacity each one can use. Knowing how long an async request can wait helps a lot as well.

A few more important topics that help scale the system:

Queueing plays an important role in selecting the most urgent job to execute first.
Predictive and deferred compute for time shifting, to optimize compute. Predictive compute works from the previous day’s data, and deferred compute does the opposite. For instance, the “people you may know” list is processed during off-peak hours, then shown when people are online (generally during peak hours).
Batching to reduce load by accumulating multiple jobs into one mega job and storing it in the queue on the service side.
Capacity policy with quotas and rate-limiting
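
Two of the ideas in this list, batching and capacity policies, are easy to sketch. The code below is only an illustration under my own assumptions: `batch_jobs` and `QuotaPolicy` are hypothetical names, the use cases are invented, and the quota counts jobs per window, whereas a real policy would more likely track CPU and memory.

```python
from collections import defaultdict


def batch_jobs(jobs, max_batch_size):
    """Group (use_case, payload) jobs into batches of a single use case."""
    by_use_case = defaultdict(list)
    for use_case, payload in jobs:
        by_use_case[use_case].append(payload)
    batches = []
    for use_case, payloads in by_use_case.items():
        # Split each use case's payloads into chunks of max_batch_size.
        for i in range(0, len(payloads), max_batch_size):
            batches.append((use_case, payloads[i:i + max_batch_size]))
    return batches


class QuotaPolicy:
    """Caps how many jobs each use case may run per time window."""

    def __init__(self, quotas):
        self.quotas = quotas  # use_case -> max jobs per window
        self.used = defaultdict(int)

    def try_admit(self, use_case):
        # Use cases without a configured quota are rejected in this sketch.
        if self.used[use_case] >= self.quotas.get(use_case, 0):
            return False
        self.used[use_case] += 1
        return True

    def reset_window(self):
        self.used.clear()


jobs = [("notify", 1), ("notify", 2), ("index", "a"), ("notify", 3)]
batches = batch_jobs(jobs, max_batch_size=2)
print(batches)

policy = QuotaPolicy({"notify": 2})
admitted = [policy.try_admit("notify") for _ in range(3)]
print(admitted)
```

Four queued jobs become three queue entries, and the third "notify" job in the window is turned away, so a buggy or bursty use case cannot take over the workers.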

Overall, Async is quite complex, and the developers did a great job of describing the challenges and solutions. This post is based on the article written in 2020, and newer articles have been published since. The links are below if you are interested in diving deeper.

Sources:
https://engineering.fb.com/2020/08/17/production-engineering/async/
https://engineering.fb.com/2023/01/31/production-engineering/meta-asynchronous-computing/

Thanks for reading. Subscribe for more articles, and don’t forget to check the FB engineering blog for more information.

Written by David Mosyan