Episode 7: Let’s build a distributed cache (Part 1/3: Basic cache design)

Concurrency and Distributed Systems: A guide with Kubernetes

--

In this episode we will take a little detour from the Sum of Primes service and start building a (mini-)distributed cache system. The overall architecture of this distributed cache is as follows:

Distributed cache system architecture

Facing clients, we have multiple cache servers which act as specialized load balancers and direct requests to the right cache worker. The cache servers use consistent hashing to find the right worker for a request. Consistent hashing helps minimize the re-hashing problem when one of the workers dies or when new workers are added.

Each cache worker is responsible for actually holding the cached key-value pairs. Inside each worker, the total memory is split into multiple shards, and each shard is backed simply by a hash map (plus some auxiliary data structures) as its cache store. Why have multiple shards in each worker? To increase concurrency and reduce the cost of updating and evicting cache items.

Cache Shard Design

Every cache has a maximum capacity because of memory limitations. Say you have 1GB of memory to work with for every shard. In addition, we have the following requirements and limitations for cache entries:

  • Key: Max size of 256 bytes
  • Value: Max size of 1KB

Given 1GB of RAM, one can store:

1GB / (1KB + 256 bytes + 128 bytes of overhead) ≈ 762,600 entries in the cache.

When your cache hits that limit, you need an eviction strategy to make room for the new entries.

There are some common strategies for eviction:

  • Least Recently Used (LRU): evict the entry that was accessed least recently.
  • Least Frequently Used (LFU): evict the entry that is accessed least often.

In the LRU model, every time an entry is used, its usage timestamp is updated to the current time. When we want to evict, we evict the entry with the oldest timestamp.

In the LFU model, every time an entry is used, its usage counter is incremented by one. When we want to evict, we remove the entry with the smallest counter value.

In part 1 of this episode, we will implement a bare-bones cache in Java that supports both of the above strategies and can be extended to support other strategies as well.

Cache Shard Implementation

The important part of a basic cache implementation is the eviction strategy; the rest is simply calling the various get, put, and remove functions on a map (e.g. a Java Map). The other important thing is to make sure the cache is thread-safe. With those two things in mind, let’s dive into the implementation.

Both the LFU and LRU approaches work by assigning an attribute to each key. In the case of LFU, the attribute is a counter that is incremented every time the key is accessed (get or set). For example, imagine we have the following items in the cache and we have just inserted them (the keys here are only illustrative):
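  • server1 → counter = 1
  • server2 → counter = 1
  • server3 → counter = 1
  • server4 → counter = 1
  • server5 → counter = 1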

All of the items have their counter attribute set to 1. Now let’s say the following calls to the cache are made (an illustrative sequence; both get and set count as an access):
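    // illustrative calls against a hypothetical cache instance
    cache.get("server1");
    cache.get("server3");
    cache.set("server1", "updated-value");
    cache.get("server4");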

After the above accesses, you’ll have the following updated attributes:
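  • server1 → counter = 3
  • server2 → counter = 1
  • server3 → counter = 2
  • server4 → counter = 2
  • server5 → counter = 1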

Now, if we have to evict an item from the cache, because say our cache’s maximum capacity is 5 and we cannot add any new items, we should evict either server2 or server5, since they have the smallest attribute value (1).

A similar scheme applies to the LRU cache, except that instead of incrementing a counter, we set the attribute to the access time (in nanoseconds or milliseconds).

Problem: The main issue is that we need an efficient way of finding and updating the attribute values of the items in the map. To be precise, we need to preserve a total ordering of the items at all times.

The best data structure for this is perhaps a Binary Search Tree (BST). With an efficient BST implementation such as a red-black tree, we can add, remove, and update items in O(log(m)), where m is the number of items in the cache. For example, if a cache shard has a maximum capacity of 2²⁰ items, each access to the cache costs at most about 20 operations, which is not bad as long as the cache is bounded and not growing endlessly. We can compensate for the limit on shard size by adding more shards and more cache workers. That puts a bound on the O(log(m)) running time and also lets us access different shards concurrently.

Every cache has an Eviction Manager, which is responsible for tracking accesses to cache items and updating the associated attribute (counter or timestamp) accordingly. Here is the interface for the EvictionManager:

EvictionManager interface in Java
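A minimal sketch of what this interface might look like. Only selectKeyToEvict is named explicitly in the text, so the access and delete method names below are assumptions:

    public interface EvictionManager {

        /** Called by the cache whenever a key is read or written. */
        void access(String key);

        /** Called by the cache when a key is removed, so the manager can drop its bookkeeping. */
        void delete(String key);

        /** Returns the key that should be evicted next under this manager's policy. */
        String selectKeyToEvict();
    }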

When it is time to evict an item, the cache asks the eviction manager for a candidate via the selectKeyToEvict method. The cache is also responsible for letting the manager know when a key is accessed or deleted.

The LFU and LRU implementations have a lot in common, so we create an abstract class that encapsulates their common properties. This class is called SmallestAttributeEvictionManager. It is based on the idea that we want to evict the cache item with the smallest attribute, whether that attribute is a counter or a timestamp:

SmallestAttributeEvictionManager abstract class
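A sketch of this abstract class, built around the keySet and attributeMap fields described below; the exact signatures are assumptions and may differ from the code in the repository:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.SortedSet;
    import java.util.TreeSet;

    public abstract class SmallestAttributeEvictionManager implements EvictionManager {

        // Maps each key to its attribute: a counter for LFU, a timestamp for LRU.
        protected final Map<String, Long> attributeMap = new HashMap<>();

        // Keys ordered by attribute value; ties are broken by comparing the keys themselves.
        protected final SortedSet<String> keySet = new TreeSet<>((k1, k2) -> {
            int byAttribute = Long.compare(
                    attributeMap.getOrDefault(k1, 0L),
                    attributeMap.getOrDefault(k2, 0L));
            return byAttribute != 0 ? byAttribute : k1.compareTo(k2);
        });

        @Override
        public void access(String key) {
            // Remove the key before its attribute changes; otherwise the TreeSet
            // ordering would become inconsistent.
            keySet.remove(key);
            attributeMap.put(key, nextAttributeValue(key));
            keySet.add(key);
        }

        @Override
        public void delete(String key) {
            keySet.remove(key);
            attributeMap.remove(key);
        }

        @Override
        public String selectKeyToEvict() {
            // The first element of the sorted set has the smallest attribute.
            return keySet.isEmpty() ? null : keySet.first();
        }

        /** Subclasses define how the attribute evolves on each access. */
        protected abstract long nextAttributeValue(String key);
    }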

In the class above, keySet is a SortedSet that maintains the total ordering of the items at all times and can tell us the smallest item in the set. TreeSet is a very good SortedSet implementation, backed by a red-black BST. A TreeSet needs a comparator; ours compares items by their attribute value, which is retrieved from the attributeMap field.

Now, the LFU eviction manager simply increments a counter, starting from zero, upon every access to a cache key. The comparator first compares the counter values and, if they are equal, compares the keys themselves. Here is the implementation of the LFU eviction manager:

LFU Eviction Manager
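Under those assumptions, the LFU manager reduces to a small subclass (a sketch, not necessarily the exact code from the repository):

    public class LFUEvictionManager extends SmallestAttributeEvictionManager {

        @Override
        protected long nextAttributeValue(String key) {
            // Start from zero and add one on every access.
            return attributeMap.getOrDefault(key, 0L) + 1;
        }
    }

An LRU manager would look the same, except that nextAttributeValue would return the current time, e.g. System.nanoTime().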

The cache interface itself is as follows:
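A plausible shape for that interface, assuming String keys and values and the get/set/delete operations mentioned in the text (the exact signatures may differ):

    public interface Cache {

        /** Returns the cached value for the key, or null if it is not present. */
        String get(String key);

        /** Inserts or updates an entry, evicting other entries if the cache is full. */
        void set(String key, String value);

        /** Removes an entry from the cache, if present. */
        void delete(String key);

        /** Number of entries currently held by the cache. */
        int size();
    }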

The cache implementation of this interface would take an eviction manager as input.

The eviction manager’s access method is called inside both the get and set methods.

In order to make the CacheImpl thread-safe, we use a ReentrantLock to guard each method of the cache. We could use the synchronized keyword on each method instead, but ReentrantLock has some nice extras, such as tryLock with a timeout and optional fairness, that can be beneficial.

Now let’s take a look at cache methods’ implementations:
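Here is a sketch of CacheImpl based on the interfaces sketched above; a ReentrantLock guards every method, and the eviction manager is notified of each access. The evictIfNecessary helper it calls is sketched in the Eviction section below:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantLock;

    public class CacheImpl implements Cache {

        private final Map<String, String> store = new HashMap<>();
        private final EvictionManager evictionManager;
        private final int maxCapacity;
        private final ReentrantLock lock = new ReentrantLock();

        public CacheImpl(EvictionManager evictionManager, int maxCapacity) {
            this.evictionManager = evictionManager;
            this.maxCapacity = maxCapacity;
        }

        @Override
        public String get(String key) {
            lock.lock();
            try {
                String value = store.get(key);
                if (value != null) {
                    // Record the access so the eviction manager can update its attribute.
                    evictionManager.access(key);
                }
                return value;
            } finally {
                lock.unlock();
            }
        }

        @Override
        public void set(String key, String value) {
            lock.lock();
            try {
                // Make room first if we are at capacity (see the Eviction section below).
                evictIfNecessary();
                store.put(key, value);
                // A write counts as an access too.
                evictionManager.access(key);
            } finally {
                lock.unlock();
            }
        }

        @Override
        public void delete(String key) {
            lock.lock();
            try {
                store.remove(key);
                evictionManager.delete(key);
            } finally {
                lock.unlock();
            }
        }

        @Override
        public int size() {
            lock.lock();
            try {
                return store.size();
            } finally {
                lock.unlock();
            }
        }

        // evictIfNecessary() is defined in the Eviction section below.
    }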

Eviction

When it comes to how we want to do the eviction, there are a few options:

  • We can be proactive and run a scheduled task inside the cache that evicts items whenever the number of items exceeds a certain threshold, for example when the cache is above 80% of its maximum capacity.
  • We can check the size of the cache inside the set function and run the eviction process there if necessary.
  • An eviction pass can remove one item or many. Evicting many at once usually makes more sense: we pay a one-time cost and probably won’t have to evict again for a while.

Whatever eviction strategy we pick, we can implement it with the current setup. Here is a simple implementation:
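One possible version of the evictIfNecessary method used by CacheImpl above (again a sketch; the actual code in the repository may differ). Once the cache is full, it evicts a batch of entries, roughly 10% of the capacity, so that the next several writes do not have to evict again:

    // Inside CacheImpl.
    private void evictIfNecessary() {
        if (store.size() < maxCapacity) {
            return; // still room, nothing to do
        }
        // Free roughly 10% of the capacity in one pass (at least one entry).
        int toEvict = Math.max(1, maxCapacity / 10);
        for (int i = 0; i < toEvict && !store.isEmpty(); i++) {
            String victim = evictionManager.selectKeyToEvict();
            if (victim == null) {
                break; // nothing left to evict
            }
            store.remove(victim);
            evictionManager.delete(victim);
        }
    }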

This concludes our bare-bones implementation of a basic cache. In the later parts of this episode, we will dive into creating the worker and server processes for a true distributed system.

Find the code for this episode here.
