Distributed cache system design

Jolly Srivastava
System Design Concepts
8 min read · Mar 2, 2021

What is caching?

In computing, a cache is a high-speed data storage layer that stores a subset of data, typically transient in nature, so that future requests for that data are served up faster than is possible by accessing the data’s primary storage location. Caching allows you to efficiently reuse previously retrieved or computed data.

The cached data is stored in fast-access hardware such as RAM. RAM provides faster I/O operations and reduces latency. Caching is used at every layer of technology, e.g. operating systems, CDNs, and DNS, in many applications such as search, and in games to improve the performance of media content.

When implementing a cache layer, it’s important to understand the validity of the data being cached. A successful cache results in a high hit rate, which means the data was present in the cache when fetched. A cache miss occurs when the requested data was not present in the cache. Controls such as TTLs (time to live) can be applied to expire the data accordingly.
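
As a rough illustration of TTL-based expiry, here is a minimal sketch; the TTLCache class name and the 300-second default are my own choices, not from the article. Each entry carries an expiry timestamp, and an expired entry is treated as a miss.

```python
import time

class TTLCache:
    """Minimal in-memory cache where each entry expires after a fixed TTL."""

    def __init__(self, ttl_seconds=300):   # 300 s is an arbitrary default
        self.ttl = ttl_seconds
        self.store = {}                    # key -> (value, expiry_timestamp)

    def put(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None                    # cache miss: key was never cached
        value, expires_at = entry
        if time.time() > expires_at:
            del self.store[key]            # entry expired: treat as a miss
            return None
        return value                       # cache hit
```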

Feature estimation:

  1. We want to store a huge amount of data, close to a terabyte.
  2. Our cache should be able to sustain 50K to 1M queries per second.
  3. Expected latency of approximately 1 ms.
  4. LRU eviction policy.
  5. 100% availability.
  6. Scalability.

Cache access patterns:

  1. Write through: whenever a write request comes in, it goes through the cache to the DB. The write is considered successful only if the data is written successfully both in the cache and in the DB.
  2. Write around: the write request goes around the cache, straight to the DB, and an acknowledgement is sent back; the data is not written to the cache. The data is written to the cache only on the first cache miss.
  3. Write back: the write goes to the cache, the data is saved in the cache, and an acknowledgement is sent immediately; a separate service then syncs the data to the DB. (A sketch of all three patterns follows this list.)
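
Here is a minimal sketch of the three patterns. The cache, db, and queue objects (with put/save methods) are placeholder interfaces for illustration, not from the article.

```python
class WriteThroughCache:
    """Write goes to the cache and the DB; success only after both succeed."""

    def __init__(self, cache, db):
        self.cache, self.db = cache, db

    def write(self, key, value):
        self.cache.put(key, value)    # update the cache first
        self.db.save(key, value)      # then persist; the caller is acknowledged only after both


class WriteAroundCache:
    """Write skips the cache and goes straight to the DB."""

    def __init__(self, cache, db):
        self.cache, self.db = cache, db

    def write(self, key, value):
        self.db.save(key, value)      # the cache is populated later, on the first miss


class WriteBackCache:
    """Write is acknowledged once the cache is updated; the DB is synced later."""

    def __init__(self, cache, db, queue):
        self.cache, self.db, self.queue = cache, db, queue

    def write(self, key, value):
        self.cache.put(key, value)
        self.queue.put((key, value))  # a background worker drains this queue into the DB
```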

The data structure used to implement a cache is a hash table. All we need is a hashing function, a key, and a value.

Working of HashTable:

As depicted in the above image, let’s say we have three keys, “Apple”, “Boy”, and “Cat”, which need to be stored in the hash table. These keys are given as input to a hashing function (hashCode()), which returns hashed values, say 53, 78, and 76 respectively in this case. Each value is then divided by the number of buckets, i.e. 10, and stored in the bucket given by the remainder: 53 goes to bucket no. 3, 78 to bucket no. 8, and so on. Now suppose we have another key, “caterpillar”, whose hashed value is 66; it also needs to be stored in bucket no. 6, the same bucket as “Cat”. When two different keys map to the same bucket, that is a collision. For obvious reasons, we can’t store both values at the same location.

There are several strategies to resolve collisions, such as separate chaining, open addressing, and Robin Hood hashing. One simple strategy is that instead of storing a single value at bucket no. 6, we create a linked list there and store the key-value pairs in it. This is known as separate chaining using a linked list.

To retrieve a value, pass the key (say “Apple”) to the hash function, take the hashed value modulo the number of buckets, and look up that particular location.
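
A minimal sketch of a hash table with separate chaining might look like this; Python’s built-in hash() stands in for hashCode(), and the bucket count and class name are my own choices.

```python
class ChainedHashTable:
    """Hash table with separate chaining: each bucket is a list of (key, value) pairs."""

    def __init__(self, num_buckets=10):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # hash() plays the role of hashCode(); modulo the bucket count picks the bucket
        return self.buckets[hash(key) % self.num_buckets]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                    # key already present: overwrite it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))         # new key or collision: append to the chain

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None                         # not found
```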

Caching in a distributed system:

As depicted in the image, all the orange block values are stored on node 1 and the blue ones on node 2. If for some reason node 2 fails, its two locations, bucket nos. 9 and 10, are no longer available. The hash table remains the same, but the bucket count is now 8. For the same example “Apple”, we now compute 53 % 8 and get 5 instead of 3, a bucket that is empty in this case. Instead of an empty bucket, there are also scenarios where we get the wrong value, which is not acceptable. We would need to remap the keys, which is a tedious job; imagine 10K keys instead of just 10. To solve this, instead of a conventional hash table we use consistent hashing, also called a consistent hash ring.
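
A quick way to see the remapping problem, reusing the hashed values from the example above:

```python
# Hashed values from the example: "Apple" -> 53, "Boy" -> 78, "Cat" -> 76, "caterpillar" -> 66
hashes = {"Apple": 53, "Boy": 78, "Cat": 76, "caterpillar": 66}

before = {key: h % 10 for key, h in hashes.items()}  # 10 buckets: both nodes up
after = {key: h % 8 for key, h in hashes.items()}    # 8 buckets: node 2 is down

moved = [key for key in hashes if before[key] != after[key]]
print(before)   # {'Apple': 3, 'Boy': 8, 'Cat': 6, 'caterpillar': 6}
print(after)    # {'Apple': 5, 'Boy': 6, 'Cat': 4, 'caterpillar': 2}
print(moved)    # every key moves, so effectively the whole cache has to be remapped
```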

Working of consistent hash ring:

In a conventional hash table, we know the available memory locations because we number the buckets consecutively, say from 1 to 10. The twist here is that the bucket positions on the ring are assigned completely at random.

For “apple”, we pass the key to the hashing function and get, say, 2394. We take this hash number and go find the location directly: find the first bucket whose position is greater than 2394, in this case 3030, and save the value there. Say another key, “Ball”, hashes to 2567; it is stored at the same location, chained together with “apple”. If a key hashes to 11000 and there is no bucket with a greater position, we wrap around to the front of the ring and save it at 1000. This wrap-around is what makes it a ring, and this is how a consistent hash ring works.
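
A minimal sketch of such a ring, reusing the positions and hash values from the example; the extra positions 6000 and 9000 are made up to fill out the ring, and in practice the hash would be computed from the key rather than passed in.

```python
import bisect

class ConsistentHashRing:
    """Keys are stored on the first bucket position clockwise from their hash."""

    def __init__(self, positions):
        # positions are the randomly assigned bucket positions on the ring
        self.positions = sorted(positions)
        self.buckets = {p: [] for p in self.positions}

    def _position_for(self, hashed_key):
        # first bucket position strictly greater than the hash
        idx = bisect.bisect_right(self.positions, hashed_key)
        if idx == len(self.positions):
            idx = 0            # past the largest position: wrap around to the front of the ring
        return self.positions[idx]

    def put(self, hashed_key, key, value):
        # chaining: a bucket holds every (key, value) pair that maps to it
        self.buckets[self._position_for(hashed_key)].append((key, value))

ring = ConsistentHashRing([1000, 3030, 6000, 9000])  # 6000 and 9000 are made-up extra positions
ring.put(2394, "apple", "...")    # lands on 3030, as in the example
ring.put(2567, "Ball", "...")     # also lands on 3030, chained with "apple"
ring.put(11000, "other", "...")   # no greater position, wraps around to 1000
```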

Cache eviction policy:

How do we evict entries from the cache, and when do we need to do so?

Eviction means removing a key-value entry from the cache memory. Why do we need to do this? Cache memory is expensive, and not all the values present in it are actually being used. So we need to identify which entries are not being used and are sitting idle, and remove them to make space for new entries. The rule that decides what to remove is the cache eviction policy. One of the most common strategies is “Least Recently Used”, or LRU.

LRU: evict the entry that was least recently used.

We have to somehow figure out which entry is the least recently used so we can free its memory and make room for new entries. How do we implement LRU?

LRU sample code:
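
A minimal LRU sketch using Python’s OrderedDict, which internally pairs a hash map with a doubly linked list, the usual combination for LRU; the capacity and example keys are illustrative.

```python
from collections import OrderedDict

class LRUCache:
    """LRU cache: a hash map that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()    # keys ordered from least to most recently used

    def get(self, key):
        if key not in self.entries:
            return None                 # cache miss
        self.entries.move_to_end(key)   # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("Apple", 1)
cache.put("Boy", 2)
cache.get("Apple")     # "Apple" is now the most recently used
cache.put("Cat", 3)    # capacity exceeded: "Boy" is evicted
```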

Internals of cache:

So far we have dealt with hash tables, RAM, and LRU. We also need a service that serves get/put/delete requests. Even though RAM is fast, these are still blocking calls, so we need an efficient way of serving the requests. One option is to spawn a new thread for every incoming request, or to keep a larger thread pool to handle them. Another option is event-driven logic.

As depicted in the above picture, we have an event queue. Any get/put request first goes into the event queue. We have an event loop that runs indefinitely as a single-threaded operation. Behind it, we have a thread pool that only takes care of input/output operations to the RAM.

Whenever a get/put request arrives, it is placed in the event queue. The event loop keeps reading the event queue and hands each pending request over to the thread pool; once a request is handed over, the loop goes back to reading the queue. In this way, the event loop is never blocked. The thread in the thread pool that receives the request performs the operation, gets the data, and returns the response to the requester through a callback, either via the event loop or some other mechanism. This is how we can handle requests more efficiently.
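
A minimal sketch of this event-queue-plus-thread-pool idea; the request tuple format, worker count, and function names are assumptions for illustration.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

event_queue = queue.Queue()                  # incoming get/put requests land here
io_pool = ThreadPoolExecutor(max_workers=8)  # threads that actually touch the in-memory store
store = {}                                   # the RAM-backed hash table

def handle(request):
    op, key, value, reply = request
    if op == "put":
        store[key] = value
        reply(True)                          # callback carries the response back to the client
    else:
        reply(store.get(key))

def event_loop():
    # single-threaded loop: it only dequeues requests and hands them to the pool,
    # so it is never blocked by the I/O itself
    while True:
        request = event_queue.get()
        if request is None:                  # sentinel to stop the loop
            break
        io_pool.submit(handle, request)

# usage sketch: run event_loop() in its own thread, then enqueue requests with a callback
# event_queue.put(("put", "Apple", 1, print))
# event_queue.put(("get", "Apple", None, print))
```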

Fault tolerance:

How do we make our caching system fault-tolerant? As we know, the hash table and all its data are stored in RAM, so if there is a power loss, all of our data is gone. This means our cache system is not persistent, so we have to do something to make it persistent.

  1. Regular interval snapshot: a synchronous service ‘S’ runs in the background, takes a frozen copy of the hash table, places it in a dump file, and saves that file on the hard disk.

  2. Log reconstruction: instead of a snapshot service, the service responsible for reads and writes on the hash table keeps appending every operation to a log file. These log writes are made as asynchronous calls, and we can put the requests in a queue in between so the read/write throughput is not affected. If each and every operation is logged, then when our machine goes down and comes back up, we can reconstruct the hash table by replaying the operations from the log file (see the sketch after this list).
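
A minimal sketch of log-based reconstruction; the file name, JSON line format, and function names are assumptions for illustration.

```python
import json

LOG_FILE = "cache_ops.log"          # hypothetical log file name

def log_operation(op, key, value=None):
    # in practice this append would go through a queue and be written asynchronously
    with open(LOG_FILE, "a") as f:
        f.write(json.dumps({"op": op, "key": key, "value": value}) + "\n")

def reconstruct():
    """Rebuild the in-memory hash table by replaying every logged operation."""
    table = {}
    try:
        with open(LOG_FILE) as f:
            for line in f:
                entry = json.loads(line)
                if entry["op"] == "put":
                    table[entry["key"]] = entry["value"]
                elif entry["op"] == "delete":
                    table.pop(entry["key"], None)
    except FileNotFoundError:
        pass                        # no log yet: start with an empty table
    return table
```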

Availability:

How to make our system 100% available?

Consider a cluster with two nodes, node 1 and node 2, each with its own cache. Say we are using consistent hashing: node 1 holds some keys and values, and node 2 handles the others. What if node 1 goes down? All the requests that would have gone to node 1 become cache misses, so those requests now hit the DB, and the read/write load on the DB increases. To avoid this, we can keep a replica of node 1; say we have a replication factor RF = 2, so whatever data node 1 has, the replica node 1' has as well.

Advantage: the load is reduced, because requests going to node 1 can also be served by node 1'. Read requests are distributed between the two nodes, giving lower latency and higher performance.

Disadvantage: the two nodes have to be kept in sync, which can lead to inconsistency.

How do we resolve this? Instead of a replica that also serves traffic, we can give each node a slave. Data updates still go to the slave, but reads and writes always happen on the master node and never touch the slave until the master goes down.

That’s all about “Distributed cache system design”.

Please like, share, and comment if you want to add something or if you have any queries. Stay tuned for more such blogs.
