Design Decisions — Harley Quinn

The Twitter-like application

Omar Elgabry
OmarElgabry's Blog
9 min read · Jul 20, 2018


Introduction

Harley Quinn is a Twitter-like application.

Application features

  • Write tweets (text and image)
  • View home and user timelines
  • Search for tweets given a query
  • Users have followers and follow other users

The user timeline contains all the tweets a user has created, while the home timeline is a merge of the latest tweets from the users they follow, sorted by time.

Facts & Numbers

  • 10M active users
  • ~10K queries per second are reads.
  • Hundreds (~500) of requests per second are writes.

Users consume more than they produce. More time is spent on read than on write.

Architecture Overview

High-level architecture overview

1. CDN

The CDN will be used to cache tweets’ images and the asset files (CSS and JS).

2. Load Balancer (+ more)

3. Web app servers (+ more)

4. Services

Different, independent services are needed to carry out different tasks. Each provides an interface that defines how to interact with it.

They’re not exposed to the external world; only internal components of our architecture can interact with them. So, we’ll have:

  • Timeline service: Knows about home and user timelines. Maintains indexes to quickly find out who tweeted what and who follows whom.
  • Tweet service: Saves and returns tweet text.
  • User service: Returns user data.
  • Image service: Handles image upload and storage.
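
As a rough illustration, here is a minimal Python sketch of what these internal service interfaces could look like. The class and method names are assumptions for illustration, not the actual APIs.

    # A minimal, illustrative sketch of the internal service interfaces.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Tweet:
        tweet_id: int
        user_id: int
        text: str
        image_id: Optional[int] = None

    class TimelineService:
        def get_home_timeline(self, user_id: int) -> List[int]:
            """Return the latest tweet ids of the user's home timeline."""
            ...
        def get_user_timeline(self, user_id: int) -> List[int]:
            """Return the latest tweet ids created by the user."""
            ...

    class TweetService:
        def save_tweet(self, tweet: Tweet) -> int: ...
        def get_tweets(self, tweet_ids: List[int]) -> List[Tweet]: ...

    class UserService:
        def get_user(self, user_id: int) -> dict: ...

    class ImageService:
        def upload(self, image_bytes: bytes) -> int:
            """Store the image and return its image id."""
            ...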

5. Job Queues

The job queues are mainly used to process “writing tweets”. We’ll enqueue jobs to be processed by the services. The services contain the logic and do the actual work.

This operation takes a long time. We’ll get into that later.
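
As a rough sketch of the flow, here is a toy example using Python's standard queue module as a stand-in for a real job queue (such as a Redis- or RabbitMQ-backed queue with dedicated worker servers). The job fields and function names are made up for illustration.

    # Toy stand-in for a real job queue with worker servers (illustrative only).
    import queue

    jobs = queue.Queue()

    def enqueue_tweet_job(user_id, tweet_id):
        # The web server only enqueues the job and returns to the client right away.
        jobs.put({"type": "fanout_tweet", "user_id": user_id, "tweet_id": tweet_id})

    def run_worker():
        # A worker pulls jobs and calls the services that do the actual work:
        # persist the tweet, fan it out to followers' timelines, index it for search.
        while not jobs.empty():
            job = jobs.get()
            print("processing", job)
            jobs.task_done()

    enqueue_tweet_job(user_id=42, tweet_id=1001)
    run_worker()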

6. Caching

We rely on caching for tweets and user metadata. We’ll use Redis and Memcache for different caching purposes.

Memcache is simple and consistently performant. We’ll use it for caching tweet text (say, the last month’s tweets) and user metadata, primarily to reduce read load on the databases.

Redis, on the other hand, provides a variety of data structures, plus persistence and replication, with flexibility in implementation, which makes it suitable for caching things like the tweet ids of the timelines.
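
To make the split concrete, here is a small sketch with plain Python dictionaries standing in for Redis and Memcache. The key naming scheme and the 200-entry cap are assumptions reused by the sketches later in this post.

    # Plain dicts standing in for Redis (timeline ids) and Memcache (tweet text).
    redis_cache = {}   # e.g. "home_timeline:<user_id>" -> list of tweet ids, newest first
    memcache = {}      # e.g. "tweet:<tweet_id>"        -> tweet text

    def cache_home_timeline(user_id, tweet_ids, limit=200):
        # A Redis list per user, capped to the newest `limit` entries.
        redis_cache[f"home_timeline:{user_id}"] = tweet_ids[:limit]

    def cache_tweet_text(tweet_id, text):
        # A Memcache entry; a real setup would also set a TTL (say, about a month).
        memcache[f"tweet:{tweet_id}"] = text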

7. Search Engine

Every tweet will be indexed and stored to be queried later. The search engine can quickly look up tweets that contain the query keywords by leveraging what’s called an inverted index. All tweets will be pushed to the search engine through the Firehose.

The most popular search engines are Elasticsearch and Apache Solr. Both are built on top of Apache Lucene.

8. Database

A relational database, like MySQL, would work fine with sharding and replication.

9. Cloud storage (or BlobStore)

A cloud storage (or blob store) stores images forever-ish. Amazon S3 or Google Cloud Storage are good options.

Part I — Reading Pipeline

In our case, we have 3 major read actions: (1) the home timeline, (2) the user timeline, and (3) user data.

The pathway of reading from home and user timelines

So, what happens when the user views his home timeline?

The timeline service gets the user’s home timeline from the cache. Redis. Every user’s home timeline is replicated on, say, 3 different machines. So, we choose the fastest one we can reach.

But that will give us a list of tweet ids and not the actual tweet text, right?

That’s right! It’s important to remember that the timeline service knows only about the list of tweet ids, and not the tweet text itself.

Decoupling the relationships (user and followers, tweets and creator) from the actual data will give us a lot of flexibility on how to implement and scale each of them.

So, we need to get the tweet text from the tweet service, passing the list of tweet ids. The tweet service stores the latest tweets in the cache. Memcache.
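
Putting the read path together, here is a simplified sketch that reuses the stand-in caches from the caching section above; load_tweet_from_database is a placeholder for the sharded database lookup described below.

    # Simplified home timeline read path (illustrative only).
    database_tweets = {}   # stand-in for the sharded tweets database

    def load_tweet_from_database(tweet_id):
        return database_tweets.get(tweet_id)   # the "disk hit"

    def read_home_timeline(user_id):
        # 1. Timeline service: tweet ids from Redis; on a miss, the timeline
        #    is re-created (see the cache-miss sketch further down).
        tweet_ids = redis_cache.get(f"home_timeline:{user_id}", [])

        # 2. Tweet service: hydrate ids into text; Memcache first, database on a miss.
        tweets = []
        for tid in tweet_ids:
            text = memcache.get(f"tweet:{tid}")
            if text is None:
                text = load_tweet_from_database(tid)
                memcache[f"tweet:{tid}"] = text
            tweets.append({"id": tid, "text": text})
        return tweets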

What if there was a cache miss? When the home timeline is not in the Redis cluster?

Then, it’s a disk hit. A home timeline re-creation.

We need to get the list of users this user follows from the timeline service, along with their (latest) tweet ids, and save the result back to Redis.

Again, the timeline service knows nothing other than a list of tweet ids.

So, to get the actual tweet text, we need to get it from the tweet service, which in turn will get the tweets from its cache, and hit the database if there’s a cache miss.

To get a tweet, the tweet service first has to look up the shard that contains the tweet. It uses a mapping table to map ids to shards on different hosts.
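
Here is a rough sketch of the cache-miss path and the id-to-shard lookup, with plain dictionaries standing in for the timeline service’s data and the mapping table (and reusing the redis_cache stand-in from above); all names and structures are illustrative assumptions.

    # Illustrative cache-miss path: re-create a home timeline and look up shards.
    following = {}     # user_id -> list of user ids this user follows
    user_tweets = {}   # user_id -> list of (tweet_id, created_at), newest first
    shard_map = {}     # tweet_id -> shard/host that stores the tweet

    def rebuild_home_timeline(user_id, limit=200):
        # Merge the latest tweet ids of everyone this user follows, newest first.
        entries = []
        for followee in following.get(user_id, []):
            entries.extend(user_tweets.get(followee, []))
        entries.sort(key=lambda entry: entry[1], reverse=True)
        tweet_ids = [tweet_id for tweet_id, _ in entries[:limit]]
        redis_cache[f"home_timeline:{user_id}"] = tweet_ids   # save back to Redis
        return tweet_ids

    def shard_for_tweet(tweet_id):
        # The tweet service maps a tweet id to the shard (host) that stores it.
        return shard_map.get(tweet_id)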

It seems there are chances to hit the database. Is there any way to avoid it?

Usually the data will be cached. Still, there are ways to optimize the database hits by using indexes. The timeline service can maintain two indexes.

One maps each user to their list of followers.

An alternative solution might be to use a graph database.

The second maps each tweet id to the user id that created it and the time when it was created.

Having indexes around these two mappings will significantly improve our queries when there is a cache miss.
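
In Python terms, the two indexes could be modeled roughly like this; in practice they would be database indexes or Redis structures, and the names here are just illustrative.

    # Two illustrative indexes maintained by the timeline service.
    followers_index = {}   # user_id  -> set of follower user ids
    tweet_index = {}       # tweet_id -> (creator_user_id, created_at)

    def record_follow(follower_id, followee_id):
        followers_index.setdefault(followee_id, set()).add(follower_id)

    def record_tweet(tweet_id, user_id, created_at):
        tweet_index[tweet_id] = (user_id, created_at)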

So, the home timeline will be the same every time the user views it?

The latest tweets will be displayed first by default. But it might be better for the user to recreate the home timeline experience each time. Shuffle randomly? Shuffle based on heuristics?

What about the user timeline then? The tweets of a particular user?

It works exactly the same as the home timeline.

So, when a user views his timeline, the request goes to the timeline service, which goes to the cache. Redis. A separate cache dedicated to the user timeline.

If it’s there, then we get the list of tweet ids and pass it to the tweet service to get the actual tweet text.

If it’s a cache miss, then we get the list of the user’s latest tweets from the index maintained by the timeline service, pass it to the tweet service, ….

If the user timeline is not something that will be viewed over and over again, we can skip the caching part and just get the list of tweet ids and pass it to the tweet service. Chances are, the tweet text is in the cache. Memcache.

Finally, the user data? User profile.

The user data is handled by the user service. Active users are stored in the cache, in RAM, to keep latencies down.

The users database can be sharded, where every shard holds a range of users, and replicated for availability; if one replica is down, another can take over.

Or, just keep it in a single giant table, with indexes for fast querying. That’s it.
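
For the sharded option, here is a small sketch of a range-based lookup; the shard ranges and host names are made-up numbers, not a recommendation.

    # Illustrative range-based sharding for the users database.
    import bisect

    SHARD_UPPER_BOUNDS = [2_500_000, 5_000_000, 7_500_000, 10_000_000]  # made-up ranges
    SHARD_HOSTS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]

    def shard_for_user(user_id):
        # Pick the first shard whose user id range covers this user id.
        index = min(bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id), len(SHARD_HOSTS) - 1)
        return SHARD_HOSTS[index]

    print(shard_for_user(3_000_000))   # users-db-1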

Part II — Writing Pipeline

So, here comes the tricky part.

The pathway of writing a tweet

What happens when a user writes a tweet? How will the tweet be stored?

When a user tweets, the tweet is assigned a unique id and stored by the tweet service in the database. The tweets database is evenly sharded and replicated across a cluster.

That’s all …?

No. Users have followers. Each follower has a home timeline. Each timeline has a list of tweets. So, we need to notify the home timelines of all the followers.

This makes read-time access fast and easy. No processing needs to be done on-read. The application is more about consumption than production.

So, how to do that?

Get the list of all the user’s followers, iterate over each one, and insert the new tweet id into each follower’s home timeline.

The home timeline is maintained by the timeline service and stored in the cache. Redis.

Each home timeline is replicated, say, 3 times on 3 different machines. So, when a user tweets, we look up the locations of all the followers’ home timelines inside the Redis cluster and insert the tweet id into each of them.

— A user’s home timeline is intentionally limited to, say, 200 entries. The oldest entries are kicked out when it’s full.

— The indexes maintained by the timeline service will also be updated reflecting the new inserts.
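
A simplified sketch of this fan-out step, reusing the illustrative stand-ins from the earlier sketches (redis_cache, followers_index, record_tweet); the 200-entry cap matches the note above.

    # Illustrative fan-out on write: push the new tweet id to every follower's
    # home timeline in Redis, keeping only the newest ~200 entries.
    HOME_TIMELINE_LIMIT = 200

    def fan_out_tweet(author_id, tweet_id, created_at):
        record_tweet(tweet_id, author_id, created_at)           # update the indexes
        for follower_id in followers_index.get(author_id, set()):
            key = f"home_timeline:{follower_id}"
            timeline = redis_cache.get(key, [])
            timeline.insert(0, tweet_id)                        # newest first
            redis_cache[key] = timeline[:HOME_TIMELINE_LIMIT]   # kick out the oldest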

Oh, Ok. Anything else needs to be stored?

Yes. One last thing.

The tweet is indexed and stored in the search engine to be queried later.

It seems the process of writing a tweet takes a lot of time.

That’s right! It’s obviously an asynchronous process; it happens in the background and in parallel. We use job queues and worker servers for that purpose.

When a tweet comes in, we want to push it to the queue and disconnect, releasing the client connection as fast as possible.

Hmmm … What if we just do this processing on-read instead of on-write?

Instead of inserting the tweet into all the followers’ home timelines on-write, fill out the user’s home timeline at read time by, again, getting the list of users they follow, getting their (latest) tweets, storing these tweets in the cache, …

Another way is to not send the tweet to all followers, but to select, intelligently or randomly, a subset of them.

Balancing the work between read and write paths may lead to better results.
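
One possible way to select only a subset of followers for the write-time fan-out is sketched below, reusing the followers_index stand-in from the earlier sketch; the threshold is a made-up number, and the remaining followers would pick the tweet up at read time instead.

    # Illustrative hybrid: fan out to a bounded subset of followers at write time.
    import random

    FANOUT_LIMIT = 10_000   # made-up threshold

    def followers_to_fan_out(author_id):
        followers = list(followers_index.get(author_id, set()))
        if len(followers) <= FANOUT_LIMIT:
            return followers
        # For outliers with huge follower lists, pick a random subset; it could
        # also be chosen intelligently (e.g. the most active followers).
        return random.sample(followers, FANOUT_LIMIT)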

Now, what about outliers? Outliers? Yes, those with huge follower lists.

Sending a tweet from a user with a lot of followers means a large fanout, which can be slow. What makes it worse is when those types of users tweet at each other; tweets have to be sent to all of their followers, doing millions of inserts.

One of the consequences is that they introduce race conditions: tweets of famous users will be seen at different points in time, and replies can arrive before the original tweet is received, which causes user confusion.

Part III — Search Pipeline

When we write, we also index and insert the tweet into the search engine.

When a search query comes in, the next step is to scatter it across all the shards, check each shard for content that matches the query, and gather the results.

The entire search index can live in RAM, so scatter-gather reads are efficient as they never hit the disk.

The results are then returned, sorted, merged, and ranked.
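
A toy sketch of the scatter-gather idea over in-memory index shards; this is a simplification of what engines like Elasticsearch do internally, and the ranking here is deliberately naive.

    # Toy scatter-gather over in-memory inverted-index shards (illustrative only).
    index_shards = [{}, {}, {}]   # each shard: keyword -> set of tweet ids

    def search(query):
        keywords = query.lower().split()
        matches = set()
        # Scatter: ask every shard for tweets matching any of the keywords.
        for shard in index_shards:
            for keyword in keywords:
                matches |= shard.get(keyword, set())
        # Gather: merge the results and rank them (here, naively, newest id first).
        return sorted(matches, reverse=True)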

For example, say we have these tweets:

  • “Thor, produced by Marvel studio”.
  • “Marvel is an American motion picture studio”.

They can be indexed as:

keyword     tweet_id
thor        1
produced    1
marvel      1, 2
studio      1, 2
american    2
motion      2
picture     2
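
The same index can be built in a few lines of Python, as a toy sketch; it skips stop-word removal and stemming, so a few extra words like “is” and “by” would also appear.

    # Build a toy inverted index from the two example tweets.
    import re

    tweets = {
        1: "Thor, produced by Marvel studio",
        2: "Marvel is an American motion picture studio",
    }

    inverted_index = {}
    for tweet_id, text in tweets.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            inverted_index.setdefault(word, set()).add(tweet_id)

    print(inverted_index["marvel"])   # {1, 2}
    print(inverted_index["thor"])     # {1}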

Part IV — Tweet Images

By default, an image can be uploaded and attached to a tweet.

The problem …?

As we’ve seen, there are many services involved in the process of creating and persisting a tweet.

If the image is tightly coupled with the tweet itself, this bundle flows through all the services, even the ones that aren’t directly responsible for handling the image.

Moving big chunks of data through your system eats the bandwidth and causes performance problems for every service that has to touch the data.

An upload could get almost to completion, and if there was a failure, it all had to be uploaded again! Uploads are one shot: either the upload completely succeeds or it completely fails.

The solution is …

First, we want to decouple the image upload from the tweet: store the image data on its own and refer to it with a handle, an id. This lets us independently optimize each pathway and gain a lot of flexibility.

Second, moving to segmented, resumable uploads will result in big decreases in image upload failure rates. If the network goes down for any reason, you can pause and pick up at the segment you left off when the network comes back.

What about the image storage? Where and how do we store images?

When an image is uploaded, we create different variants of it (thumbnails, small, large, etc.), and then push them to the CDN. The original image will live in the cloud storage forever-ish, until deletion.

Images that are months or years old have a low probability of being accessed, say, after 30 days. It would be better to keep image variants with only a 30-day TTL, so variants of old images can be recreated on the fly rather than precomputed, which saves storage.

Deadshot avoids unnecessary storage by predicting what users are likely to watch and sending those movies to the CDNs based on that.
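
Back to our images, here is a rough sketch of that serve-or-recreate decision, with an in-memory dictionary standing in for the variant storage; load_original_from_blob_store and resize are hypothetical helpers, and the 30-day TTL follows the assumption above.

    # Illustrative: serve a fresh variant if we have one, otherwise recreate it
    # on the fly from the original stored in the blob store.
    import time

    THIRTY_DAYS = 30 * 24 * 3600
    variant_store = {}   # (image_id, size) -> (created_at, variant_bytes)

    def get_variant(image_id, size):
        entry = variant_store.get((image_id, size))
        if entry and time.time() - entry[0] < THIRTY_DAYS:
            return entry[1]
        original = load_original_from_blob_store(image_id)   # hypothetical helper
        variant = resize(original, size)                     # hypothetical helper
        variant_store[(image_id, size)] = (time.time(), variant)
        return variant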

So, here’s the deal …

  1. An image is uploaded to an endpoint whose only responsibility is to put the original media in cloud storage (or BlobStore).
  2. An image id, a unique identifier for the image, is then used instead of the image itself.
  3. The image service generates image variants and stores them in cloud storage.
  4. The variants are pushed to the CDN and kept for only 30 days (TTL). For old images, the variants are recomputed on reads.

For the upload itself, an image is divided into segments, say three. The segments are uploaded with append calls; each append call specifies the segment index, and all appends refer to the same image id.
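
A minimal sketch of the segmented, resumable upload idea; the function names and the fixed segment split are assumptions for illustration.

    # Illustrative segmented upload: append segments by index under one image id.
    segment_store = {}   # image_id -> {segment_index: bytes}

    def append_segment(image_id, segment_index, data):
        segment_store.setdefault(image_id, {})[segment_index] = data

    def missing_segments(image_id, total_segments):
        # After a network failure, only the segments not yet uploaded are resent.
        uploaded = segment_store.get(image_id, {})
        return [i for i in range(total_segments) if i not in uploaded]

    def assemble_image(image_id, total_segments):
        parts = segment_store[image_id]
        return b"".join(parts[i] for i in range(total_segments))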

Thank you for reading! If you enjoyed it, please clap 👏 for it.
