Design Twitter timeline

Problem statement

Jun 13, 2019


Design the backend for the Twitter timeline / Facebook newsfeed.

How to approach

Unlike my other posts, for this design problem there’s already an excellent talk given by a Twitter engineer. This post summarizes the contents of that talk.

Design goal

Here are some numbers that affect the architecture:

  • 150M active users.
  • 300K QPS for pull-based timeline query.
  • up to 800 tweets could be displayed on the timeline.
  • low latency on read : 1ms for p50 and 4ms for p95.
  • allowing relatively high latency on write.
  • 5K new tweets per second on average. (12K at peak).
  • users can have up to tens of millions of followers.
Architecture of the Twitter home timeline (source: the talk, Timelines at Scale)

Life of a tweet

The diagram above (taken from the slides of the talk) gives all the high-level components. Briefly, the life of a newly-published tweet can be summarized as :

  1. Whenever a new tweet is published, it reaches the fanout service. The fanout service queries the social graph service to find all the followers (target timelines), then inserts the tweet id into the Redis cluster in the format (tweet_id, author_id, meta_data). The social graph service keeps track of all follower/followee relationships.
  2. The Redis cluster replicates its data onto other Redis machines. Each replica stores a partition of users and the tweets that should appear in their timelines, as (user_id, [tweet1, tweet2, …]), where each tweetN has the format (tweet_id, author_id, meta_data). This leverages the native list structure supported by Redis. The maximum number of tweets to keep per timeline is a product decision (the talk mentions 800).
  3. Tweets are persisted by the timeline service only. For the home timeline, tweets are stored only in the memory of the Redis cluster. This satisfies the low-latency requirement on read (remember that a random IO against an SSD may take a few ms). If any of the machines crashes, the timeline data is reconstructed by querying the social graph service and the timeline service again. Each Redis instance is replicated 3 times, and on the read side, the timeline service finds the closest Redis machine through a hash ring lookup (details are not mentioned in the talk, but I will discuss this a bit more below). The key design tradeoff here is that the system is designed in favor of reads, while allowing writes to have relatively higher latency. In the architecture above, a read is just O(1), while a write takes O(n), where n is the number of followers.
  4. The problem with the architecture above is that the fanout service is slow, especially for users who have millions of followers (called high-volume users). The remaining part of the talk presents an optimization: do not fan out tweets from high-volume users. Instead, merge their tweets in on the read path. It’s a tradeoff between read latency and write latency.
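The write-time fanout and the O(1) read in steps 1–3 can be sketched as follows. This is a minimal in-memory illustration, not Twitter’s actual implementation: a Python dict stands in for the Redis cluster, and the class and method names are my own. The `insert(0, …)` and slice-trim correspond to Redis LPUSH and LTRIM on a per-user list.

```python
from collections import defaultdict

MAX_TIMELINE = 800  # max tweets kept per timeline (product decision from the talk)

class FanoutService:
    """Sketch of the write path: push a new tweet onto every follower's
    materialized timeline list, so reads become a single list lookup."""

    def __init__(self, social_graph):
        # social_graph: author_id -> set of follower ids
        self.social_graph = social_graph
        # timelines: user_id -> [(tweet_id, author_id), ...], newest first
        self.timelines = defaultdict(list)

    def publish(self, tweet_id, author_id):
        # O(n) write, n = number of followers
        for follower in self.social_graph.get(author_id, ()):
            timeline = self.timelines[follower]
            timeline.insert(0, (tweet_id, author_id))  # like Redis LPUSH
            del timeline[MAX_TIMELINE:]                # like Redis LTRIM

    def home_timeline(self, user_id, count=20):
        # O(1) read: the timeline was fully materialized at write time
        return self.timelines[user_id][:count]
```

For example, if alice has followers bob and carol, publishing two tweets by alice makes both appear (newest first) in bob’s and carol’s timelines without any work at read time.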

Some implementation details not mentioned in the talk are how to replicate the Redis instances, how to partition the data, and how to locate the closest instance for a given read query. These are common problems for data-intensive systems, and can be solved with some common techniques (some of them are described in my other posts). For example, one potential solution is the consistent hashing algorithm: map all the Redis instances onto a ring, where one physical node is mapped to 3 virtual nodes randomly and uniformly. A tweet is stored on the first 3 instances it encounters when walking clockwise around the ring, starting from its own hash value. Note that consistent hashing can easily yield unbalanced partitions without careful tuning. On the read path, since each node on the ring encodes its geo-location information, the client can easily choose the one closest to its own geo-location.
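The ring lookup described above can be sketched as follows. This is a minimal illustration under my own assumptions (MD5 as the hash function, 3 virtual nodes per physical node, hypothetical node names), not a production implementation, and it omits the geo-location tie-breaking on the read path.

```python
import bisect
import hashlib

VNODES = 3  # each physical node is mapped to 3 virtual nodes on the ring

def _hash(key):
    # MD5 is an arbitrary choice here; any uniform hash works
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: a tweet key is stored on the first
    3 distinct physical nodes found walking clockwise from its hash."""

    def __init__(self, nodes):
        # one (hash, node) point per virtual node, sorted around the ring
        self.ring = sorted((_hash(f"{n}#{i}"), n)
                           for n in nodes for i in range(VNODES))
        self.points = [h for h, _ in self.ring]

    def replicas(self, key, n=3):
        # first ring position clockwise of the key's hash
        idx = bisect.bisect(self.points, _hash(key))
        chosen = []
        for i in range(len(self.ring)):
            node = self.ring[(idx + i) % len(self.ring)][1]
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == n:
                break
        return chosen
```

Because node positions depend only on the node names, adding or removing one instance moves only the keys adjacent to its virtual nodes, which is the property that makes this scheme attractive for repartitioning a Redis cluster.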