Designing a distributed cache from scratch

Anjana Supun
Jun 24, 2020

Implementing a distributed cache using the Ballerina language was my first internship project at WSO2. At the time I was a second-year undergraduate with decent programming skills, but I had no idea what distributed computing was, so this project was a unique experience for me. This article is an attempt to explain the exposure I gained to distributed systems and the process of designing and implementing a distributed cache from scratch.

Distributed computing illustration

Requirements and the Reasoning

This was a research project intended to introduce the intern to the fundamentals of distributed computing, as it covers many of the field's core concepts. You can check out the learning outcomes section for more details on this.

The requirement was to implement a self-contained distributed cache from scratch using the Ballerina language, which meant I was not allowed to use tools such as ZooKeeper. Ballerina is a cloud-native programming language designed by WSO2 to make it easier for developers to build distributed systems. I believe they wanted to test whether the language was capable of supporting a complex distributed application such as this one. Additionally, since Ballerina did not have existing libraries for these algorithms at the time, I had to learn and implement them myself.

When I started the project, I didn't even know what a distributed cache was, so I had to research existing systems to understand the domain and its use cases. I found solutions such as Memcached, Hazelcast IMDG, and Ehcache. I then carried out in-depth research on the features, architecture, and implementation of these platforms, which gradually familiarized me with the domain and gave me an in-depth understanding of the design and implementation of a distributed cache.

The problem

While researching existing systems, I found that developers are a bit reluctant to use distributed caches, for reasons such as the following.

Upon a cluster membership change, the cache executes an expensive relocation task to recover from the possible data loss caused by the change. When the lost node comes back online, it triggers yet another data recovery task. If the network is unreliable, this can happen repeatedly, overloading the database and significantly degrading the performance of the application, and ultimately resulting in a denial of service that crashes the entire cluster.

Further explanation of this problem can be found in the following blog post.

The Design

Evaluating the algorithms for each component was the most challenging and interesting part of this project. I had to learn all the related concepts and techniques that could be used in each component and then evaluate the most suitable one.

Membership Management

The main design goal was to ensure the reliability of cluster membership. To achieve this requirement, the Raft consensus protocol was selected. Raft underpins the reliability of popular platforms such as Kubernetes and etcd.

The Raft protocol consists of the following mechanisms; a minimal sketch of the first two follows the list.

  1. A heartbeat mechanism to constantly monitor the status of the nodes in the cluster.
  2. A leader election mechanism to elect a leader that controls the cluster at any given time.
  3. A log replication mechanism to ensure the data safety of the state machine.
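
To make the heartbeat and election mechanisms concrete, here is a minimal, illustrative sketch. The project itself was written in Ballerina, but since that code isn't shown here, the sketch is in Go; the Node type, the timeout values, and the wonVote callback are my own simplifications, not the project's actual API.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Role models the three Raft roles.
type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

// Node is a heavily simplified Raft node: real Raft also tracks a
// replicated log, commit indexes, and persistent state.
type Node struct {
	id        int
	role      Role
	term      int
	heartbeat chan int // carries the leader's term
}

// run is the follower/candidate loop: reset an election timer on every
// heartbeat; if the timer fires first, start an election for a new term.
func (n *Node) run(wonVote func(term int) bool) {
	for {
		// Randomized timeouts reduce the chance of split votes.
		timeout := time.Duration(150+rand.Intn(150)) * time.Millisecond
		select {
		case term := <-n.heartbeat:
			if term >= n.term { // a valid leader exists; stay follower
				n.term, n.role = term, Follower
			}
		case <-time.After(timeout):
			n.role = Candidate
			n.term++
			fmt.Printf("node %d: timed out, requesting votes for term %d\n", n.id, n.term)
			if wonVote(n.term) { // won a majority of RequestVote RPCs
				n.role = Leader
				n.lead()
				return
			}
		}
	}
}

// lead is the leader loop: broadcast periodic heartbeats (empty
// AppendEntries RPCs in real Raft) so followers never time out.
func (n *Node) lead() {
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()
	for i := 0; i < 3; i++ { // a few heartbeats, then end the demo
		<-ticker.C
		fmt.Printf("node %d: sending heartbeat for term %d\n", n.id, n.term)
	}
}

func main() {
	n := &Node{id: 1, heartbeat: make(chan int)}
	// Simulate a cluster where this node always wins the vote.
	n.run(func(term int) bool { return true })
}
```

A real Raft node would also persist its term and log to stable storage, and heartbeats would be AppendEntries RPCs that can carry log entries for replication.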

If you are interested in this awesome protocol, you can find more information in this blog post.

Failure Detector

Raft includes a heartbeat mechanism, but it does not contain a proper failure detector. Since the problem described above originates from failures not being detected properly, a failure detector was designed as an extension to the Raft heartbeat mechanism.

The failure detector was designed with correctness in mind. The leader of the cluster keeps a record of the health of all the nodes and reduces a node's health score when it does not respond to heartbeats. When a node's health drops too low, the node is removed from the cache distribution and monitored for a certain time before being added back to the cache.

When a node is in the monitoring phase, the leader of the cluster keeps sending heartbeat RPC calls to it. This monitor also acts as an advanced circuit breaker for detecting failures. All new cluster members and recovered nodes are health-checked before being added to the cache distribution.
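
The exact scoring scheme isn't spelled out above, so the following Go sketch only illustrates the idea: the leader decrements a per-node health counter on every missed heartbeat, evicts the node from the cache distribution when the counter reaches zero, and requires a streak of successful probes before re-admitting it. All names and thresholds here are hypothetical, not the project's actual values.

```go
package main

import "fmt"

// memberHealth is the leader's view of one node, tracked alongside
// the Raft heartbeat responses.
type memberHealth struct {
	health    int  // decremented on missed heartbeats
	monitored bool // true while evicted from the cache distribution
	recovered int  // consecutive successful probes while monitored
}

const (
	maxHealth     = 5 // fresh nodes start fully healthy (assumed value)
	recoverStreak = 3 // probes a monitored node must pass to rejoin
)

type failureDetector struct {
	members map[string]*memberHealth
}

// observe is called by the leader after every heartbeat round.
func (fd *failureDetector) observe(node string, responded bool) {
	m := fd.members[node]
	switch {
	case m.monitored && responded:
		// Circuit-breaker style: demand a streak of good probes
		// before routing cache traffic back to the node.
		m.recovered++
		if m.recovered >= recoverStreak {
			m.monitored, m.health = false, maxHealth
			fmt.Println(node, "passed health checks, re-added to distribution")
		}
	case m.monitored && !responded:
		m.recovered = 0 // streak broken, keep monitoring
	case responded:
		m.health = maxHealth // healthy response restores full score
	default:
		m.health--
		if m.health <= 0 {
			m.monitored, m.recovered = true, 0
			fmt.Println(node, "removed from distribution, monitoring")
		}
	}
}

func main() {
	fd := &failureDetector{members: map[string]*memberHealth{
		"node-2": {health: maxHealth},
	}}
	for i := 0; i < 5; i++ { // node-2 misses five heartbeats...
		fd.observe("node-2", false)
	}
	for i := 0; i < 3; i++ { // ...then answers three probes in a row
		fd.observe("node-2", true)
	}
}
```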

Cache Entry Distributor

Several factors were taken into account when the cache entry distributor was designed. An extended version of the consistent hashing algorithm was used for entry distribution. The following factors were considered when selecting the Consistent Hashing with Bounded Loads algorithm, which is sketched after the list.

Consistent Hashing illustration by vitalflux

1. Write and Read Performance

Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring. This allows servers and objects to scale without affecting the overall system.

2. Replication

Entry replication is a crucial factor in any distributed environment. Consistent hashing provides a simple replication mechanism that replicates data without placing duplicates on the same node.

3. Data relocation

Consistent hashing provides a way to minimize the number of keys that need to be relocated when adding or removing servers.

4. Load Balancing

Uneven data distribution is one of the main disadvantages of plain consistent hashing. The Consistent Hashing with Bounded Loads algorithm distributes entries across the cluster evenly, without overloading any of the nodes.
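
As a rough illustration of how the bounded-loads variant behaves, here is a small Go sketch: keys are placed on a hash ring as usual, but a lookup walks clockwise past any node whose load has already reached c times the average load. The ring structure, the CRC32 hash, and the load factor of 1.25 are assumptions for the example, not details from the project.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// ring is a minimal consistent-hash ring; real implementations add
// virtual nodes per server to smooth out the distribution.
type ring struct {
	hashes []uint32          // sorted positions on the ring
	owner  map[uint32]string // ring position -> node name
	load   map[string]int    // entries currently held per node
	total  int               // total entries placed so far
	c      float64           // bound: max load = c * average load
}

func hashOf(s string) uint32 { return crc32.ChecksumIEEE([]byte(s)) }

func newRing(c float64, nodes ...string) *ring {
	r := &ring{owner: map[uint32]string{}, load: map[string]int{}, c: c}
	for _, n := range nodes {
		h := hashOf(n)
		r.hashes = append(r.hashes, h)
		r.owner[h] = n
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// maxLoad is the bounded-loads ceiling: roughly c * (average load).
func (r *ring) maxLoad() int {
	avg := float64(r.total+1) / float64(len(r.hashes))
	return int(r.c*avg) + 1
}

// place walks clockwise from the key's ring position, skipping nodes
// that are already at the load bound, and assigns the key.
func (r *ring) place(key string) string {
	h := hashOf(key)
	i := sort.Search(len(r.hashes), func(j int) bool { return r.hashes[j] >= h })
	for n := 0; n < len(r.hashes); n++ {
		node := r.owner[r.hashes[(i+n)%len(r.hashes)]]
		if r.load[node] < r.maxLoad() { // bounded-loads check
			r.load[node]++
			r.total++
			return node
		}
	}
	return "" // unreachable while the bound exceeds the average
}

func main() {
	r := newRing(1.25, "node-1", "node-2", "node-3")
	for _, k := range []string{"user:1", "user:2", "user:3", "user:4"} {
		fmt.Println(k, "->", r.place(k))
	}
}
```

The key property is that the skip-ahead on overloaded nodes caps every node's share of entries while still keeping most keys on the node a plain hash ring would choose, so relocation on membership change stays small.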

Learning outcomes

As mentioned at the beginning of the blog, I had no prior experience in distributed computing. Through this project I learned about the CAP theorem, consensus algorithms, membership schemes and protocols, low-overhead network protocols like gRPC, message broadcasting strategies, data distribution algorithms, hash functions, fault tolerance, chaos engineering, event ordering and time, data replication strategies, cloud design patterns, and formal specification languages such as TLA+.

If you are absolutely new to distributed computing, the keywords above might be a bit overwhelming. You can use the following resources to self-study distributed computing. Once you understand the fundamentals, learning specific algorithms becomes a very easy task.

You can implement your own distributed cache as a hands-on project to put your learning into use. Feel free to evaluate and use other algorithms if you believe they are better than the ones I've used, and don't forget to share your reasoning in the comments as well.
