Redesigning the Ballerina Cache

The motivation behind redesigning the Ballerina cache and reimplementing it from scratch

Chanaka Lakmal
Ballerina Swan Lake Tech Blog
9 min read · Jun 27, 2020

--

Overview

Caching is a concept used in programming to improve performance.

Cache is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere.
~ Source: Wikipedia

Ballerina has had a cache standard library v1.0 (ballerina/cache) since its early stages, but it had a few drawbacks.

  1. The ballerina/cache module only supported an in-memory storage mechanism, where cache entries are stored in a map-based data structure as key-value pairs. There was no way to extend the persistence mechanism to a different in-memory storage mechanism or to a separate storage mechanism such as a file, a database, etc.
  2. The cache module supported the LRU eviction policy by default. It was an internal implementation that the user couldn't configure, so the LRU eviction algorithm had to be used for every caching requirement even though it is not the most suitable/optimized eviction policy for all use cases.
  3. The cache cleanup process ran at a pre-defined time interval that the user was neither aware of nor able to configure. The user could configure a time duration for which cache entries stay valid without being accessed; all cache entries not accessed within that window would get removed by a timer whose interval was configured internally.
  4. There were several issues related to concurrency and performance which could not be resolved with the existing design and implementation. Please refer to GitHub issues #19187, #19557, #19487, and #20614.

These drawbacks had existed for some time, and none of them could be fixed with the existing design. That was the main motivation behind redesigning the Ballerina cache from scratch.

Redesign of Ballerina Cache v2.0

Designing a perfect cache is quite a tricky task since it needs to be optimized and customized for different use cases. But considering the average case, we focused on a few factors.

  1. Persistence mechanism
  2. Eviction policy
  3. Performance and concurrency

Keeping those factors and the drawbacks explained above in mind, the redesigned Ballerina cache standard library has two abstract objects, AbstractCache and AbstractEvictionPolicy, and a LinkedList data structure to cater to the above factors in order.

NOTE: All the Ballerina code samples in this article are tested with and compatible with Ballerina version 1.2.0.

Abstract Cache

The AbstractCache object has the common APIs shown below. There can be “custom implementations” with different in-memory storages or data storages like files, databases, etc. that are structurally equivalent to the abstract object. This provides the extensibility of the persistence storage mechanism.
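A sketch of the AbstractCache object, reconstructed from the module's documented API (the released module returns its own cache-specific error type; the generic error type is used here to keep the sketch self-contained, and exact signatures may differ slightly):

    public type AbstractCache abstract object {
        // Adds the given key-value pair to the cache, with an entry-specific max age.
        public function put(string key, any value, int maxAgeInSeconds) returns error?;
        // Returns the cached value associated with the given key.
        public function get(string key) returns any|error;
        // Discards a single cache entry / all cache entries.
        public function invalidate(string key) returns error?;
        public function invalidateAll() returns error?;
        // Lookup and introspection helpers.
        public function hasKey(string key) returns boolean;
        public function keys() returns string[];
        public function size() returns int;
        public function capacity() returns int;
    };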

Abstract Eviction Policy

The AbstractEvictionPolicy object has the common APIs shown below. There can be “custom implementations” with different eviction algorithms that are structurally equivalent to the abstract object. Such a custom implementation must maintain the LinkedList data structure according to its eviction algorithm. This provides the extensibility of the eviction policy.
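A sketch of the AbstractEvictionPolicy object, again reconstructed from the module's documented API (method names and parameters may differ slightly from the released version):

    public type AbstractEvictionPolicy abstract object {
        // Called on each corresponding Cache operation so that the policy can
        // reorder the LinkedList according to its eviction algorithm.
        public function get(LinkedList list, Node node);
        public function put(LinkedList list, Node node);
        public function remove(LinkedList list, Node node);
        public function replace(LinkedList list, Node newNode, Node oldNode);
        public function clear(LinkedList list);
        // Picks and detaches the next Node to be evicted.
        public function evict(LinkedList list) returns Node?;
    };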

Linked List

This is a Ballerina implementation of the linked list data structure. It has the following records and APIs.
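A sketch of the records and the list-maintenance functions (the function set and names here are an approximation of the module's internal API; only the operations referenced later in this article are shown):

    // Container for a cached value plus references to its neighbours.
    public type Node record {|
        any value;
        Node? prev = ();
        Node? next = ();
    |};

    // Doubly linked list tracked by its head and tail references.
    public type LinkedList record {|
        Node? head;
        Node? tail;
    |};

    // Adds a node at the head of the list.
    public function addFirst(LinkedList list, Node node) {
        Node? head = list.head;
        node.next = head;
        node.prev = ();
        if (head is Node) {
            head.prev = node;
        } else {
            list.tail = node; // the list was empty
        }
        list.head = node;
    }

    // Detaches a node from wherever it is in the list.
    public function remove(LinkedList list, Node node) {
        Node? prev = node.prev;
        Node? next = node.next;
        if (prev is Node) {
            prev.next = next;
        } else {
            list.head = next; // the node was the head
        }
        if (next is Node) {
            next.prev = prev;
        } else {
            list.tail = prev; // the node was the tail
        }
        node.prev = ();
        node.next = ();
    }

    // Removes and returns the node at the tail, or nil if the list is empty.
    public function removeLast(LinkedList list) returns Node? {
        Node? tail = list.tail;
        if (tail is Node) {
            remove(list, tail);
            return tail;
        }
        return ();
    }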

The New Design

The new design of the cache standard library is built from the above three components. There should be an implementation of AbstractCache, which is responsible for the persistent storage of cache entries, such as an in-memory cache, a file cache, a database cache, etc. Also, there should be an implementation of AbstractEvictionPolicy, which is responsible for evicting cache entries according to an eviction algorithm such as LRU, MRU, FIFO, etc. Finally, there is the LinkedList data structure, which maintains the order of the cache entries for eviction purposes. This will be explained later.

Implementation of Ballerina Cache v2.0

The following diagram demonstrates the default implementation of the new design. Here, the implementation of AbstractCache is called the Cache object, which is an in-memory storage based on a map data structure. The implementation of AbstractEvictionPolicy is called the LruEvictionPolicy object, which is an implementation of the LRU eviction algorithm.

The implementation starts with the Cache object initialization by passing the following parameters as the cache configuration.

  • capacity — The maximum number of entries allowed in the cache.
  • evictionPolicy — The policy which defines the cache eviction algorithm.
  • evictionFactor — The fraction of the capacity that will be evicted once the cache is full.
  • defaultMaxAgeInSeconds — The freshness time of all the cache entries in seconds. This value can be overwritten by the maxAgeInSeconds property when inserting an entry into the cache. -1 means the entries are valid forever.
  • cleanupIntervalInSeconds — The interval of the timer task which cleans up the cache entries. This is an optional parameter.

For a better user experience, the above configurations are initialized with default values as follows:
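A sketch of the CacheConfig record with its default values (reconstructed from the module docs; LruEvictionPolicy is the module's default LRU implementation discussed below, and the exact defaults should be verified against the released version):

    public type CacheConfig record {|
        int capacity = 100;
        AbstractEvictionPolicy evictionPolicy = new LruEvictionPolicy();
        float evictionFactor = 0.25;
        int defaultMaxAgeInSeconds = -1; // entries are valid forever by default
        int cleanupIntervalInSeconds?;   // no cleanup timer unless configured
    |};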

Once this is passed to the Cache object initialization, the following tasks get executed in order.

  • Validate the parameters.
  • Initialize the LinkedList with nil values for head and tail.
  • Start the timer task which cleans up cache entries, if the cleanupIntervalInSeconds optional parameter is configured.

Now, the Cache object is successfully initialized and ready to use. The following sections explain the tasks that get executed whenever there is a Cache API call, such as get, put, etc.

The Node record, which comes from the LinkedList data structure, acts as the container of the data and holds references to the adjacent data containers.

The CacheEntry record is used as the value of the Node because, in the default implementation, a data value has a freshness time, which is governed by the expTime field. It also keeps the key of the data, which is used when invalidating a cache entry. This is explained under the topic “Cache Eviction”.
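A sketch of the CacheEntry record (the expTime field and the stored key are described above; the data field name is an assumption):

    type CacheEntry record {|
        string key;  // kept so the map entry can be removed during eviction
        any data;    // the cached value itself
        int expTime; // absolute expiry time; entries with max age -1 never expire
    |};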

After assigning a CacheEntry as the value of the Node, an example of the storage structure that is going to be stored in memory looks like below.
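A hypothetical example of a single in-memory entry, a Node whose value is a CacheEntry (the values are illustrative only):

    CacheEntry entry = {key: "user1", data: "Ballerina", expTime: 1593246060000};
    Node node = {value: entry, prev: (), next: ()};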

This will be inserted into the map data structure against the provided string key. The key of the map entry is a string, and the value of the map entry is a Node of the LinkedList.

Also, the same storage structure is passed to the respective API of the LruEvictionPolicy to maintain the LinkedList data structure. As an example, the put API of Cache creates the Node, inserts it into the map data structure, and calls the put API of the LruEvictionPolicy with references to the LinkedList and the newly created Node. The put API of the LruEvictionPolicy then updates the LinkedList and the adjacent Node(s) by calling the LinkedList APIs.
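A simplified, hypothetical sketch of that put flow, using the records and functions sketched earlier (the actual Cache object keeps the map, the LinkedList, and the policy as its own fields; the standalone function and its names here are purely illustrative):

    function putEntry(map<Node> entries, LinkedList list, AbstractEvictionPolicy policy,
                      string key, any value, int expTime) {
        CacheEntry entry = {key: key, data: value, expTime: expTime};
        Node node = {value: entry};
        entries[key] = node;     // 1. store the Node in the map against the string key
        policy.put(list, node);  // 2. let the eviction policy update the LinkedList
    }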

Since the LruEvictionPolicy object is an implementation of the LRU eviction algorithm, the LinkedList will be updated as follows.

  • When there is a new Node to be inserted, it will be inserted at the head of the list and the chain will get updated accordingly. If the Node to be inserted is already in the list, it will be moved to the head of the list and the chain will get updated accordingly.
  • When there is a Node to be retrieved, it will be moved to the head of the list and the chain will get updated accordingly.
  • When there is a Node to be deleted, it will simply be deleted from the list and the chain will get updated accordingly.

This guarantees that the most recently used (MRU) Node is always at the head of the list and the least recently used (LRU) Node is always at the tail. When eviction happens, nodes are deleted from the tail without iterating the map.
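A minimal sketch of an LRU policy built on the LinkedList functions shown earlier (only the methods relevant to the behavior above are shown, and only for newly inserted nodes; the shipped LruEvictionPolicy implements the full AbstractEvictionPolicy API and may differ in detail):

    public type LruEvictionPolicySketch object {

        // Retrieving a node promotes it to the head (most recently used position).
        public function get(LinkedList list, Node node) {
            remove(list, node);
            addFirst(list, node);
        }

        // Inserting a new node places it at the head of the list.
        public function put(LinkedList list, Node node) {
            addFirst(list, node);
        }

        // Eviction always takes the node at the tail (least recently used).
        public function evict(LinkedList list) returns Node? {
            return removeLast(list);
        }
    };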

Likewise, for each cache operation, the map data structure is updated by the Cache object and the LinkedList data structure is updated by the LruEvictionPolicy object.

Cache Eviction

There are 2 mandatory scenarios plus 1 optional scenario in which a cache entry gets removed from the cache, keeping the cache entries fresh. Two independent factors, the eviction policy and the freshness time of the cache entry, govern these 3 scenarios.

  1. When using the get API, if the returned cache entry has expired, it gets removed.
  2. When using the put API, if the cache size has reached its capacity, a number of entries gets removed based on the ‘eviction policy’ and the ‘eviction factor’ (see the sketch after this list).
    ** This is where the benefit of the LinkedList comes into the picture. Since the LinkedList is updated with each cache operation, we know that the tail of the list holds the least recently used entry. So, we calculate the number of entries to be evicted with the equation
    number of entries to be evicted = eviction factor × capacity of the map
    and start removing Nodes from the tail of the LinkedList. While removing a Node, we can get the key of the cache entry; using that, we can remove the CacheEntry from the map data structure as well. That is the only reason why we keep the key of the cache entry in the CacheEntry record.
  3. If the cleanupIntervalInSeconds optional property is configured, the timer task will remove the expired cache entries at the configured interval.
    ** For this scenario, we must iterate the map data structure and check for expired cache entries. The main benefit of using the cleanupIntervalInSeconds optional property is that the developer can optimize memory usage at the cost of some additional CPU usage, and vice versa. The default behavior is the CPU-optimized mode, which does not use the timer task.
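A hypothetical sketch of scenario 2, again using the records and functions sketched earlier (the function and its parameters are illustrative, not the module internals):

    function evictEntries(map<Node> entries, LinkedList list, int capacity, float evictionFactor) {
        // number of entries to be evicted = eviction factor × capacity of the map
        int evictionCount = <int>(<float>capacity * evictionFactor);
        foreach int i in 0 ..< evictionCount {
            Node? node = removeLast(list); // the least recently used entry is at the tail
            if (node is Node) {
                CacheEntry entry = <CacheEntry>node.value;
                if (entries.hasKey(entry.key)) {
                    // The key stored in the CacheEntry lets us drop the map entry
                    // without iterating the map.
                    _ = entries.remove(entry.key);
                }
            } else {
                break; // the list is already empty
            }
        }
    }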

Release Notes

This implementation was introduced in Ballerina Cache v2.0.0, which replaces all the existing usages of Ballerina Cache v1.0.0. It was released with Ballerina v1.2.0.

With Ballerina Cache v2.0.0, we were able to gain an average 3x performance improvement compared to v1.0.0 and resolve all the drawbacks of v1.0.0 discussed above.

Samples

Cache Initialization

Here are some examples of cache initializations with different use cases.

  • A basic example of a cache with a capacity of 100, which uses LRU as the eviction policy and 0.25 as the eviction factor, is as follows:
  • A basic example of a cache with a capacity of 1000, an eviction factor of 0.2, a default entry freshness time of 1 hour, and a cleanup timer configured with a 5-second interval is as follows:
  • An advanced example of a cache which uses a custom eviction policy along with the default capacity, eviction factor, max age, and cleanup interval is as follows:
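Sketches of the three initializations described above, assuming the v2.0.0 constructor takes a CacheConfig record (check the module docs for the exact constructor form; MyEvictionPolicy in the last example stands for a user-defined implementation of cache:AbstractEvictionPolicy and is not part of the module):

    import ballerina/cache;

    // 1. Capacity 100, eviction factor 0.25, and the (default) LRU eviction policy.
    cache:Cache basicCache = new({capacity: 100, evictionFactor: 0.25});

    // 2. Capacity 1000, eviction factor 0.2, a default entry freshness time of
    //    1 hour, and a cleanup timer running every 5 seconds.
    cache:Cache timedCache = new({
        capacity: 1000,
        evictionFactor: 0.2,
        defaultMaxAgeInSeconds: 3600,
        cleanupIntervalInSeconds: 5
    });

    // 3. A custom eviction policy plugged in, keeping the defaults for the
    //    remaining configurations.
    cache:Cache customCache = new({evictionPolicy: new MyEvictionPolicy()});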

Cache Usages

Here are some examples of using the cache with different use cases.

  • The simplest way of using the initialized cache, without handling errors (shown first in the sketch below).
  • The recommended way of using the initialized cache, with error handling (shown second in the sketch below).
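A sketch covering both usages (the put and get return types follow the v2.0.0 API; the cache module also defines its own error type, but the generic error type is used here to keep the sample short):

    import ballerina/cache;
    import ballerina/io;

    public function main() {
        cache:Cache myCache = new({capacity: 10});

        // The simplest way: let any cache error cause a panic.
        checkpanic myCache.put("name", "Ballerina");
        any name = checkpanic myCache.get("name");
        io:println(name);

        // The recommended way: handle errors explicitly.
        error? result = myCache.put("version", "1.2.0");
        if (result is error) {
            io:println("Error occurred while inserting the entry");
        }
        any|error cachedValue = myCache.get("version");
        if (cachedValue is error) {
            io:println("Error occurred while retrieving the entry");
        } else {
            io:println(cachedValue);
        }
    }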

Custom Implementation of Ballerina Cache

The default implementation of the ballerina/cache module provides in-memory cache storage based on a map data structure and the LRU eviction policy. So, it is straightforward to use if your requirements match that. Otherwise, you must provide your own implementation of AbstractCache, AbstractEvictionPolicy, or both.

When doing custom implementations, there are a few things to consider. Let’s name our custom implementation of AbstractCache CustomCache and our custom implementation of AbstractEvictionPolicy CustomEvictionPolicy. The methods implemented in CustomCache should update the persistence storage as required. They should also call the respective API of the CustomEvictionPolicy, as explained earlier for the default implementation. For example, the get API of CustomCache should call the get API of CustomEvictionPolicy. This is the convention to follow to keep the LinkedList data structure consistent.

As an example, let’s consider CustomCache and CustomEvictionPolicy with only the put API, for simplicity.
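A hypothetical sketch showing only the put path, reusing the Node, LinkedList, and addFirst function sketched earlier (the field names and the ordering chosen by CustomEvictionPolicy are illustrative):

    public type CustomEvictionPolicy object {
        // Maintains the LinkedList according to the custom algorithm; here a
        // new node is simply placed at the head, purely as an illustration.
        public function put(LinkedList list, Node node) {
            addFirst(list, node);
        }
    };

    public type CustomCache object {
        private map<Node> storage;
        private LinkedList list;
        private CustomEvictionPolicy policy;

        public function __init(CustomEvictionPolicy policy) {
            self.storage = {};
            self.list = {head: (), tail: ()};
            self.policy = policy;
        }

        // 1. Update the persistence storage as required, then
        // 2. call the respective API of the eviction policy so the
        //    LinkedList stays consistent.
        public function put(string key, any value) {
            Node node = {value: value};
            self.storage[key] = node;
            self.policy.put(self.list, node);
        }
    };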

Similarly, the get API of CustomCache does whatever operation is needed to retrieve the Node from the persistence mechanism and should call the get API of the CustomEvictionPolicy, which is plugged in when initializing the CustomCache.

The get API of CustomEvictionPolicy should update the LinkedList, which is passed in as a parameter, according to the eviction algorithm.

Happy coding with Ballerina!

The design was influenced by https://github.com/chethiya/ballerina-cache by Chethiya Abeysinghe

The GitHub pull request of the implementation:

Related Discussions @ Ballerina Dev Google Group

References
