Designing the Caching system

Narendra Lakshmana Gowda
6 min read · Oct 21, 2018

--

In computing, a cache is a high-speed data storage layer which stores a subset of data, typically transient in nature, so that future requests for that data are served up faster than is possible by accessing the data’s primary storage location. Caching allows you to efficiently reuse previously retrieved or computed data.

The data in a cache is generally stored in fast-access hardware such as RAM (random-access memory). Due to the high request rates, or IOPS (input/output operations per second), supported by RAM and in-memory engines, caching improves data retrieval performance and reduces cost at scale.

Applications: Caches can be applied and leveraged throughout various layers of technology, including operating systems, networking layers such as Content Delivery Networks (CDNs) and DNS, web applications, and databases. You can use caching to significantly reduce latency and improve IOPS for many read-heavy application workloads, such as Q&A portals, gaming, media sharing, and social networking.

Caching Best Practices: When implementing a cache layer, it’s important to understand the validity of the data being cached. A successful cache results in a high hit rate, which means the data was present in the cache when it was requested. A cache miss occurs when the requested data is not in the cache. Controls such as TTLs (time to live) can be applied to expire data accordingly. Another consideration is whether or not the cache environment needs to be highly available.
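To make TTL-based expiration concrete, here is a minimal sketch (the class name and structure are my own, not taken from any particular library): each entry carries an expiry timestamp, and an expired entry is treated as a miss.

```python
import time

class TTLCache:
    """Minimal sketch of a cache with per-entry time-to-live (TTL)."""

    def __init__(self, default_ttl_seconds=60):
        self._store = {}          # key -> (value, expires_at)
        self._default_ttl = default_ttl_seconds

    def put(self, key, value, ttl=None):
        expires_at = time.time() + (ttl if ttl is not None else self._default_ttl)
        self._store[key] = (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None                      # cache miss
        value, expires_at = entry
        if time.time() > expires_at:         # entry expired: treat as a miss
            del self._store[key]
            return None
        return value                         # cache hit
```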

Implementation:

So the standard way to implement a cache is to use a data structure that lets us access a value by a given key in constant time.

One such data structure is a hash map, and here is how we can implement it:

  1. The hash map itself can be implemented in multiple ways. One common approach is hashing with chaining: colliding keys are linked together in a list (see the sketch after this list).
  2. Say our hash map has N buckets and we want to add a key K with value V.
  3. For the key K, compute i = hash(K) % N and store the value in bucket H[i].
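As a rough illustration of the chaining approach described above (in practice, a language's built-in map, such as Python's dict, already does this work for you):

```python
class ChainedHashMap:
    """Sketch of a hash map with chaining: colliding keys share a bucket list."""

    def __init__(self, n_buckets=1024):
        self._n = n_buckets
        self._buckets = [[] for _ in range(n_buckets)]   # the table H

    def _index(self, key):
        return hash(key) % self._n                       # i = hash(K) % N

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                                 # key exists: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))                      # new key: append to the chain

    def get(self, key):
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        return None
```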

Now all is good: we can store key-value pairs in memory and retrieve them whenever we need them.

Since in-memory space (RAM) is limited, we need a way to evict entries (delete the least-used ones) whenever we need more space!

How do we do it?

LRU Eviction policy

We need a data structure which, at any given instant, can give us the least recently used objects in order. Let’s see if we can maintain a linked list to do it.

We try to keep the list ordered by how recently each entry was used.

So whenever a get operation happens, we need to move that object from its current position in the list to the front, which means a delete followed by an insert at the beginning. Inserting at the beginning of the list is trivial. How do we erase an object from an arbitrary position in the least time possible? We can maintain another map that maps each key to its corresponding linked-list node.

OK, now that we know the node, we need its previous and next nodes in the list to delete it. We can get the next node from the next pointer, but what about the previous node? To handle that, we make the list a doubly linked list.
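Putting the pieces together, here is a rough sketch of an LRU cache built from a hash map plus a doubly linked list with sentinel head and tail nodes (the names and structure are illustrative):

```python
class Node:
    """Doubly linked list node holding one cache entry."""
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    """Sketch of an LRU cache: a hash map for O(1) lookup plus a doubly
    linked list ordered from most- to least-recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                 # key -> Node
        self.head = Node()            # sentinel on the most-recently-used side
        self.tail = Node()            # sentinel on the least-recently-used side
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None               # miss
        self._unlink(node)            # move to the front: now most recent
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self.map.get(key)
        if node:
            node.value = value
            self._unlink(node)
        else:
            if len(self.map) >= self.capacity:
                lru = self.tail.prev          # least recently used entry
                self._unlink(lru)
                del self.map[lru.key]
            node = Node(key, value)
            self.map[key] = node
        self._push_front(node)
```

In Python specifically, collections.OrderedDict or functools.lru_cache give you the same behavior out of the box; the sketch just shows the mechanics.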

Now, how do we distribute the cache so that we can scale it horizontally?

There are many ways to scale horizontally; one common approach is hash partitioning, using a technique called consistent hashing.

The idea is simple. Consistent hashing forms a keyspace, also called a continuum, usually pictured as a ring. As a node joins the cluster, it picks a random number, and that number determines the data it is going to be responsible for: everything between this number and the next number on the ring previously picked by a different node now belongs to this node. The resulting partition could theoretically be of any size, a tiny slice or a large one.
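Here is a rough sketch of such a ring. The use of virtual nodes (multiple points per physical node) is a common refinement that evens out the otherwise arbitrary partition sizes; the class and parameter names are illustrative.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Sketch of a consistent hash ring: a key is owned by the first node
    point found clockwise from the key's hash."""

    def __init__(self, replicas=100):
        self.replicas = replicas      # virtual nodes per physical node
        self._ring = []               # sorted list of ring points
        self._owners = {}             # ring point -> node name

    def _hash(self, value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        for r in range(self.replicas):
            point = self._hash(f"{node}:{r}")
            bisect.insort(self._ring, point)
            self._owners[point] = node

    def remove_node(self, node):
        for r in range(self.replicas):
            point = self._hash(f"{node}:{r}")
            self._ring.remove(point)
            del self._owners[point]

    def get_node(self, key):
        if not self._ring:
            return None
        point = self._hash(key)
        idx = bisect.bisect(self._ring, point) % len(self._ring)  # wrap around
        return self._owners[self._ring[idx]]
```

When a node is added or removed, only the keys in the slices adjacent to its points move, which is exactly why this scheme is attractive for distributed caches.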


Hybrid eviction policy using a Count-Min sketch

Modern caches extend the usage history to include the recent past and give preference to entries based on recency and frequency. One approach for retaining history is to use a popularity sketch (a compact, probabilistic data structure) to identify the “heavy hitters” in a large stream of events. Take, for example, the Count-Min sketch, which uses a matrix of counters and multiple hash functions. Adding an entry increments a counter in each row, and its frequency is estimated as the minimum value observed across rows. This approach lets us trade off space, efficiency, and the error rate due to collisions by adjusting the matrix’s width and depth.
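A minimal sketch of the data structure described above (the width, depth, and hashing scheme are illustrative choices):

```python
import hashlib

class CountMinSketch:
    """Sketch of a Count-Min sketch: a depth x width matrix of counters with
    one hash function per row; frequency is estimated as the minimum counter."""

    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def estimate(self, item):
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))
```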

Heavy Hitters: Count-Min Sketch

Count-Min sketches are applicable to the following problem: find all elements in the data set with frequencies greater than k percent of the total number of elements in the data set.


The algorithm is straightforward:

  • Maintain a standard Count-Min sketch during the scan of the data set and put all elements into it.
  • Maintain a heap of top elements, initially empty, and a counter N of the total number of already processed elements.
  • For each element in the data set:
    • Put the element into the sketch.
    • Estimate the frequency of the element using the sketch. If the frequency is greater than the threshold (k*N), put the element into the heap. The heap should be periodically or continuously cleaned up to remove elements that no longer meet the threshold.

In general, the top-k problem makes sense only for skewed data, so usage of Count-Min sketches is reasonable in this context.
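Here is a rough sketch of that loop, reusing the CountMinSketch class from the previous snippet; the threshold fraction and the heap-cleanup strategy are illustrative choices, not a definitive implementation.

```python
import heapq

def heavy_hitters(stream, k_fraction=0.01, width=2048, depth=4):
    """Return elements whose estimated frequency exceeds k_fraction of the
    elements seen so far. Assumes the CountMinSketch class sketched above."""
    sketch = CountMinSketch(width, depth)
    heap = []          # min-heap of (estimated_count, element)
    in_heap = set()
    n = 0              # total elements processed so far

    for element in stream:
        n += 1
        sketch.add(element)
        estimate = sketch.estimate(element)
        threshold = k_fraction * n
        if estimate >= threshold and element not in in_heap:
            heapq.heappush(heap, (estimate, element))
            in_heap.add(element)
        # Clean up entries that no longer meet the (growing) threshold.
        while heap and heap[0][0] < threshold:
            count, candidate = heapq.heappop(heap)
            current = sketch.estimate(candidate)
            if current >= threshold:
                heapq.heappush(heap, (current, candidate))  # count was stale
                break
            in_heap.discard(candidate)

    return [element for _, element in heap]
```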

Concurrency:

Concurrent access to a cache is viewed as a difficult problem because in most policies every access is a write to some shared state. The traditional solution is to guard the cache with a single lock. This might then be improved through lock striping, i.e. splitting the cache into many smaller independent regions. Unfortunately that tends to have limited benefit because hot entries cause some locks to be more contended than others.

The next idea is to use a commit log. Instead of mutating the policy’s data structures immediately, the updates are written to a log and replayed in asynchronous batches: each read or write performs the hash table operation, records the access in a buffer, and schedules a replay against the eviction policy when necessary. The policy is still guarded by a lock (a try-lock, to be more precise), but contention shifts to appending to the log buffers instead.
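As a rough sketch of this idea (illustrative names; it deliberately glosses over full thread-safety of the hash table itself and only shows how contention shifts from the policy to the buffer):

```python
import threading
from collections import deque, OrderedDict

class BufferedCache:
    """Sketch of the buffered-replay idea: reads record their access in a
    buffer instead of updating the LRU ordering directly; the ordering is
    replayed in batches under a try-lock, so readers rarely block."""

    def __init__(self, capacity, drain_threshold=64):
        self.capacity = capacity
        self._data = OrderedDict()             # key -> value, kept in LRU order
        self._buffer = deque()                 # pending access log
        self._drain_threshold = drain_threshold
        self._policy_lock = threading.Lock()

    def get(self, key):
        value = self._data.get(key)
        if value is not None:
            self._buffer.append(key)           # record the read; policy updated later
            self._maybe_drain()
        return value

    def put(self, key, value):
        self._data[key] = value
        self._buffer.append(key)
        self._maybe_drain()

    def _maybe_drain(self):
        if len(self._buffer) < self._drain_threshold:
            return
        # Try-lock: if another thread is already replaying, just move on.
        if not self._policy_lock.acquire(blocking=False):
            return
        try:
            while self._buffer:
                key = self._buffer.popleft()
                if key in self._data:
                    self._data.move_to_end(key)    # replay: mark as recently used
            while len(self._data) > self.capacity:
                self._data.popitem(last=False)     # evict least recently used
        finally:
            self._policy_lock.release()
```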

Availability:

  1. If we have a lot of machines, one way to avoid losing a shard’s data when a machine goes down is to have multiple machines per shard, each maintaining exactly the same data.
  2. Master-slave technique: there is only one active server at a time in a shard, and it has a follower that keeps receiving the updates. When the master server goes down, the slave takes over as the master. Master and slave can maintain a change log with version numbers to make sure they are caught up.
  3. Keep a copy of the commit log and replay the actions to rebuild the data.

I hope this article gave you a rough idea of how a caching system works.
