Design a system that tracks the number of “likes”

problem statement


Design a system that tracks the number of “likes” (posts / tweets / photos) at scale.

how to approach

This is an interesting problem that looks very simple at first, but actually requires careful thought to get “right”. I will walk through my thinking below and show how the architecture evolves as the requirements change.

The simplest design we can come up with is:

Fig 1. Use a single table

Here are the pros and cons of this simple design:

  • + Strong consistency guarantee: any read-after-write is guaranteed to return the latest count, so the client never sees a stale read.
  • - It doesn’t scale well: millions of concurrent insertions might slow down the DB. We can shard the DB by post_id, but that doesn’t mitigate the high volume of concurrent insertions.
  • - It’s expensive to count the rows on the fly (note that engines like InnoDB don’t maintain a row count internally; queries like “SELECT COUNT(*) FROM LIKES WHERE …” almost always re-scan the table). We can introduce another component, either a cache or another table, to store the count; but then the key problem becomes how to keep the “likes” table and the “like-count” table consistent.
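To make the first design concrete, here is a minimal sketch of the single-table approach. It uses SQLite purely for illustration, and the table and column names (likes, post_id, user_id) are my own assumptions rather than anything prescribed above:

```python
import sqlite3

# A minimal sketch of Fig 1: one "likes" table, counted on the fly.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE likes (
        post_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        PRIMARY KEY (post_id, user_id)   -- one like per user per post
    )
""")

def like(post_id: int, user_id: int) -> None:
    # Every click is a synchronous insert; read-after-write is consistent.
    conn.execute(
        "INSERT OR IGNORE INTO likes (post_id, user_id) VALUES (?, ?)",
        (post_id, user_id),
    )
    conn.commit()

def count_likes(post_id: int) -> int:
    # The expensive part: COUNT(*) scans the matching rows on every read.
    row = conn.execute(
        "SELECT COUNT(*) FROM likes WHERE post_id = ?", (post_id,)
    ).fetchone()
    return row[0]

like(42, 1)
like(42, 2)
print(count_likes(42))  # -> 2
```

Every read of the count triggers a scan over the post’s rows, which is exactly the cost the last bullet points out.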

To overcome the last two issues, we have to relax the consistency model a little bit: now, after a user clicks “like” on a post, the request returns immediately without waiting for the entry to be inserted into the DB. We also introduce some additional components below:

Fig 2. MQ, Batch updater, Cache.
  • MQ: a message queue is used as a buffer for async updates, and to reliably “broadcast” each (post_id, user_id) pair to multiple consumers. Apache Kafka is a popular choice in industry.
  • Updater: a consumer that reads messages and inserts them into the “likes” table.
  • Batch updater: by relaxing the consistency model, we are able to batch the updates with the same post_id to reduce the write load on the DB (see the sketch after this list). The downside is that it requires additional infrastructure, such as a stream processing cluster (e.g. Apache Flink, Apache Storm), to do the batch update reliably.
  • Cache: stores the latest like count for each post.
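Below is a rough sketch of this write path under the relaxed consistency model. The in-process queue stands in for the real MQ (e.g. Kafka), and the function names, batch size, and flush policy are illustrative assumptions, not part of the original design:

```python
import queue
import threading
from collections import Counter

# Stand-in for the MQ in Fig 2; a real deployment would use e.g. Kafka.
like_events = queue.Queue()  # holds (post_id, user_id) pairs

def handle_like_request(post_id: int, user_id: int) -> None:
    # The API server only enqueues the event and returns immediately;
    # it does not wait for the row to reach the "likes" table.
    like_events.put((post_id, user_id))

def batch_updater(batch_size: int = 100) -> None:
    """Consume events, insert each into the "likes" table, and flush
    one aggregated increment per post_id instead of one write per like."""
    pending = Counter()  # post_id -> new likes not yet flushed
    while True:
        post_id, user_id = like_events.get()
        insert_into_likes_table(post_id, user_id)   # the "Updater" role
        pending[post_id] += 1
        if sum(pending.values()) >= batch_size:     # a real system would also flush on a timer
            for pid, delta in pending.items():
                increment_like_count(pid, delta)    # cache / "like-count" table
            pending.clear()

# Placeholders for the real DB and cache writes; their names and
# signatures are assumptions of this sketch.
def insert_into_likes_table(post_id: int, user_id: int) -> None: ...
def increment_like_count(post_id: int, delta: int) -> None: ...

threading.Thread(target=batch_updater, daemon=True).start()
handle_like_request(42, 1)  # returns immediately; the consumer catches up asynchronously
```

In production the batching would live in a stream processor (Flink, Storm, etc.) rather than a single thread, but the shape of the data flow is the same.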

The pros and cons of the second design are:

  • + Using the MQ as a buffer reduces the write load on the DB.
  • + The batch updater further reduces the write load for popular posts.
  • - The data in the “likes” table and the “like-count” table may become inconsistent. This can happen even without component failures: most MQs cannot guarantee exactly-once processing, so we might see more “likes” than the real number.

To overcome the inconsistency, one approach is to periodically re-compute the count from the “likes” table. Since the “likes” table always holds the source of truth, the recomputed result can be trusted. To get the real-time number of likes for a post, that result must then be merged on the fly with the “likes” that arrived after the recomputation, as illustrated in the figure below:

Fig 3. Online + offline

In the figure above, a daily cron job recalculates the counts from the “likes” table, giving us accurate numbers as of 12am every day. The “real time events” table keeps track of all the “likes” that happen after 12am (12am to now). By reading from both and merging the results, we get a reasonably accurate and up-to-date count. This design is often referred to as the Lambda Architecture.
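A minimal sketch of this read path is below. The 12am cutoff comes from the description above; the store-access helpers and their names are assumptions of the sketch:

```python
from datetime import datetime, time

def get_like_count(post_id: int) -> int:
    """Lambda-style read: merge the accurate batch-layer count
    (recomputed from the "likes" table by the daily cron job)
    with today's real-time events."""
    midnight = datetime.combine(datetime.now().date(), time.min)

    # Batch layer: exact count as of 12am, produced by the daily job.
    offline = read_offline_count(post_id)

    # Speed layer: likes recorded in the "real time events" table since 12am.
    realtime = count_realtime_events(post_id, since=midnight)

    return offline + realtime

# Placeholders for reads against the recomputed count store and the
# real-time events store; not part of the original post.
def read_offline_count(post_id: int) -> int: ...
def count_realtime_events(post_id: int, since: datetime) -> int: ...
```

Because the offline count is rebuilt from the source of truth every day, any over-counting introduced by the speed layer is corrected at the next recomputation.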
