Redesigning of Ballerina Cache
The motivation behind redesigning the Ballerina cache and implementing it from scratch
Overview
Caching is a concept used in programming to improve performance.
Cache is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere.
~ Source: Wikipedia
Ballerina has had a cache standard library v1.0 (ballerina/cache) from its initial stages, but it had a few drawbacks.
- The ballerina/cache module only supported an in-memory storage mechanism, in which cache entries are stored in a map-based data structure as key-value pairs. There was no way to extend the persistence mechanism to a different in-memory storage or to a separate storage mechanism such as a file or a database.
- The cache module supported only the LRU eviction policy. It was an internal implementation which the user could not configure, so the LRU eviction algorithm had to be used for every caching requirement even when it was not the most suitable or optimized eviction policy for the use case.
- The cache cleanup process ran at a pre-defined time interval which the user was neither aware of nor able to configure. The user could only configure the time duration for which a cache entry stays valid without being accessed; all entries not accessed within that window were removed by a timer whose interval was configured internally.
- There were several concurrency and performance issues which could not be resolved with that design and implementation. Please refer to GitHub issues #19187, #19557, #19487, and #20614.
These drawbacks had existed for some time, and none of them could be fixed because of the existing design. That was the main motivation behind redesigning the Ballerina cache from scratch.
Redesign of Ballerina Cache v2.0
Designing a perfect cache is quite a tricky task, since it needs to be optimized and customized for different use cases. Considering the average case, we focused on a few factors.
- Persistence mechanism
- Eviction policy
- Performance and concurrency
Keeping those factors and the drawbacks explained above in mind, the redesigned Ballerina cache standard library has two abstract objects, AbstractCache and AbstractEvictionPolicy, and a LinkedList data structure, which cater to the above factors in order.
NOTE: All the Ballerina code in this article is tested with and compatible with Ballerina version 1.2.0.
Abstract Cache
The AbstractCache object has the common APIs as follows. There can be "custom implementations" with different in-memory storages or data storages such as files or databases, which are structurally equivalent to the abstract object. This provides the extensibility of the persistence storage mechanism.
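A minimal sketch of what such an abstract object looks like is given below. The method names and signatures are indicative of the shipped module but may differ slightly; Error refers to the cache module's error type.
public type AbstractCache abstract object {
    // Adds the given key-value pair to the cache.
    public function put(string key, any value, int maxAgeInSeconds = -1) returns Error?;
    // Returns the cached value associated with the given key.
    public function get(string key) returns any|Error;
    // Discards the cached value associated with the given key.
    public function invalidate(string key) returns Error?;
    // Discards all the cached values.
    public function invalidateAll() returns Error?;
    // Checks whether the given key has an associated cache entry.
    public function hasKey(string key) returns boolean;
    // Returns all the keys of the cache.
    public function keys() returns string[];
    // Returns the current size of the cache.
    public function size() returns int;
    // Returns the capacity of the cache.
    public function capacity() returns int;
};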
Abstract Eviction Policy
The AbstractEvictionPolicy object has the common APIs as follows. There can be "custom implementations" with different eviction algorithms, which are structurally equivalent to the abstract object. Such a custom implementation must maintain the LinkedList data structure according to its eviction algorithm. This provides the extensibility of the eviction policy.
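A minimal sketch of this abstract object is given below; again, the method names and signatures are indicative and may differ slightly from the shipped module. Each method receives the LinkedList so that the implementation can reorder it according to its eviction algorithm.
public type AbstractEvictionPolicy abstract object {
    // Updates the linked list when an entry is retrieved from the cache.
    public function get(LinkedList linkedList, Node node);
    // Updates the linked list when an entry is added to the cache.
    public function put(LinkedList linkedList, Node node);
    // Updates the linked list when an entry is removed from the cache.
    public function remove(LinkedList linkedList, Node node);
    // Updates the linked list when an existing entry is replaced.
    public function replace(LinkedList linkedList, Node newNode, Node oldNode);
    // Resets the linked list when the cache is cleared.
    public function clear(LinkedList linkedList);
};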
Linked List
This is a Ballerina implementation of the linked list data structure. It has the following records and APIs.
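A minimal sketch of the linked list record and its APIs is given below, assuming the Node record shown later in this article; the function names are indicative and may differ slightly from the shipped module.
public type LinkedList record {|
    Node? head = ();
    Node? tail = ();
|};

// The list APIs operate on this record, along the lines of:
//   addFirst(LinkedList list, Node node)  - links a node at the head
//   addLast(LinkedList list, Node node)   - links a node at the tail
//   remove(LinkedList list, Node node)    - unlinks a node from the list
//   removeLast(LinkedList list)           - unlinks and returns the tail node
//   clear(LinkedList list)                - resets the head and tail to nil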
The New Design
The new design of the cache standard library comes with all 3 of the above components. There should be an implementation of AbstractCache, which is responsible for the persistent storage of cache entries, such as an in-memory cache, a file cache, or a database cache. Also, there should be an implementation of AbstractEvictionPolicy, which is responsible for evicting cache entries according to an eviction algorithm such as LRU, MRU, or FIFO. Finally, there is the LinkedList data structure, which maintains the order of the cache entries for eviction purposes. This will be explained later.
Implementation of Ballerina Cache v2.0
The following diagram demonstrates the default implementation of the new design. Here, the implementation of AbstractCache is called the Cache object, which is a map-based in-memory storage. The implementation of AbstractEvictionPolicy is called the LruEvictionPolicy object, which is an implementation of the LRU eviction algorithm.
The implementation starts with the Cache object initialization, passing the following parameters as the cache configuration.
- capacity — Maximum number of entries allowed in the cache.
- evictionPolicy — The policy which defines the cache eviction algorithm.
- evictionFactor — The factor by which entries are evicted once the cache is full.
- defaultMaxAgeInSeconds — Freshness time of all the cache entries in seconds. This value can be overwritten by the maxAgeInSeconds property when inserting an entry into the cache. -1 means the entries are valid forever.
- cleanupIntervalInSeconds — The interval of the timer task which cleans up the cache entries. This is an optional parameter.
For a better user experience, the above configurations are initialized with default values as follows:
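A sketch of the configuration record with its defaults is given below, based on the values described in this article; the record shape mirrors the shipped module but may differ slightly.
public type CacheConfig record {|
    int capacity = 100;
    AbstractEvictionPolicy evictionPolicy = new LruEvictionPolicy();
    float evictionFactor = 0.25;
    int defaultMaxAgeInSeconds = -1;
    int cleanupIntervalInSeconds?;
|};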
Once this is passed to the Cache object initialization, the following tasks get executed in order.
- Validate the parameters.
- Initialize the LinkedList with nil values for the head and the tail.
- Start the timer task which cleans up the cache entries, if the cleanupIntervalInSeconds optional parameter is configured.
Now, the Cache object is successfully initialized and ready to use. Then, the following tasks get executed whenever there is a Cache API call such as get, put, etc.
The Node record, which comes from the LinkedList data structure, acts as the container of the data and of the references to the adjacent data containers.
public type Node record {|
    any value;
    Node? prev = ();
    Node? next = ();
|};
The CacheEntry record is used as the value of the Node, because in the default implementation a data value has a freshness time, which is governed by the expTime field. It also keeps the key of the data, which is used when invalidating a cache entry. This is explained under the topic "Cache Eviction".
type CacheEntry record {|
    string key;
    any data;
    int expTime;
|};
After assigning a CacheEntry as the value of the Node, an example of the storage structure which gets stored in memory looks like below.
Node node = {
    value: {
        key: "key-1",
        data: "value-1",
        expTime: 1234567890
    }
};
This will be inserted into the map data structure against the provided string key. The key of the map entry is a string and the value of the map entry is a Node of the LinkedList.
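As a hypothetical illustration (the field name entries is assumed, not the module's actual internal name), the in-memory storage behaves like this:
// The map holding the cache entries, keyed by the user-provided string key.
map<Node> entries = {};
// entries["key-1"] = node;  // the Node shown above becomes the map value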
Also, the same storage structure is passed to the respective API of the LruEvictionPolicy to maintain the LinkedList data structure. As an example, the put API of Cache will create the Node, insert it into the map data structure, and call the put API of the LruEvictionPolicy with references to the LinkedList and the Node created. Then the put API of the LruEvictionPolicy will handle updating the LinkedList and the Node with its adjacent Node(s) by calling the LinkedList APIs.
Since the LruEvictionPolicy object is an implementation of the LRU eviction algorithm, the LinkedList is updated as follows.
- When there is a new Node to be inserted, it is inserted at the head of the list and the chain gets updated accordingly. If the Node to be inserted is already in the list, the Node is moved to the head of the list and the chain gets updated accordingly.
- When there is a Node to be retrieved, it is moved to the head of the list and the chain gets updated accordingly.
- When there is a Node to be deleted, it is simply deleted from the list and the chain gets updated accordingly.
This guarantees that the most recently used (MRU) Node is always at the head of the list and the least recently used (LRU) Node is always at the tail of the list. When eviction happens, nodes are deleted from the tail without iterating the map, as sketched below.
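A minimal sketch of this head/tail maintenance is given below, assuming the Node record shown earlier and the LinkedList record sketched above. The helper names addToHead, unlink, and moveToHead are illustrative, not the module's internals.
// Links a node at the head of the list, making it the most recently used entry.
function addToHead(LinkedList list, Node node) {
    node.prev = ();
    node.next = list.head;
    Node? head = list.head;
    if (head is Node) {
        head.prev = node;
    } else {
        // The list was empty, so the new node is also the tail.
        list.tail = node;
    }
    list.head = node;
}

// Unlinks an existing node from wherever it is in the list.
function unlink(LinkedList list, Node node) {
    Node? prev = node.prev;
    Node? next = node.next;
    if (prev is Node) {
        prev.next = next;
    } else {
        list.head = next;
    }
    if (next is Node) {
        next.prev = prev;
    } else {
        list.tail = prev;
    }
    node.prev = ();
    node.next = ();
}

// On a get, or a put of an existing entry, the node is moved back to the head,
// so the tail always holds the least recently used entry.
function moveToHead(LinkedList list, Node node) {
    unlink(list, node);
    addToHead(list, node);
}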
Likewise, for each cache operation, the map data structure is updated by the Cache object and the LinkedList data structure is updated by the LruEvictionPolicy object.
Cache Eviction
There are 2 mandatory scenarios and 1 optional scenario in which a cache entry gets removed from the cache, keeping the cache entries fresh. Two independent factors, the eviction policy and the freshness time of the cache entry, govern these 3 scenarios.
- When using the get API, if the returned cache entry has expired, it gets removed.
- When using the put API, if the cache has reached its capacity, a number of entries get removed based on the 'eviction policy' and the 'eviction factor'.
** This is where the benefit of the LinkedList comes into the picture. Since the LinkedList is updated with each cache operation, we know that the tail of the list holds the least recently used entry. So, we calculate the number of items to be evicted with the equation no. of items to be evicted = eviction factor * capacity of the map (e.g., with a capacity of 100 and an eviction factor of 0.25, 25 entries are evicted) and start removing Nodes from the tail of the LinkedList. While removing a Node we can get the key of the cache entry, and using that we can remove the entry from the map data structure as well. That is the only reason why we keep the key of the cache entry in the CacheEntry record. A rough sketch of this step follows the list.
- If the cleanupIntervalInSeconds optional property is configured, the timer task removes the expired cache entries based on the configured interval.
** For this scenario, we must iterate the map data structure and check for expired cache entries. The main benefit of using the cleanupIntervalInSeconds optional property is that the developer can optimize memory usage at the cost of some additional CPU usage, and vice versa. The default behavior is the CPU-optimized mode, which does not use the timer task.
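A rough sketch of the capacity-based eviction step described above, reusing the Node, CacheEntry, and LinkedList records from earlier and the unlink helper sketched above (the function names and the map parameter are assumptions, not the module's internals):
// Unlinks and returns the tail node, i.e. the least recently used entry.
function removeTail(LinkedList list) returns Node? {
    Node? tail = list.tail;
    if (tail is Node) {
        unlink(list, tail);
        return tail;
    }
    return ();
}

// Evicts `eviction factor * capacity` entries, e.g. 0.25 * 100 = 25 entries.
function evictEntries(map<Node> storage, LinkedList list, int capacity, float evictionFactor) {
    int evictionCount = <int>(<float>capacity * evictionFactor);
    foreach int i in 0 ..< evictionCount {
        Node? node = removeTail(list);
        if (node is Node) {
            // The key kept inside CacheEntry lets us clean up the map without
            // iterating it.
            CacheEntry entry = <CacheEntry>node.value;
            _ = storage.remove(entry.key);
        } else {
            // The list is already empty; nothing more to evict.
            break;
        }
    }
}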
Release Notes
This implementation is introduced in Ballerina Cache v2.0.0, and it replaces all the existing usages of Ballerina Cache v1.0.0. It was released with Ballerina v1.2.0.
With Ballerina Cache v2.0.0 we were able to gain an average 3x performance improvement compared to v1.0.0 and resolve all the drawbacks (discussed above) which were there in v1.0.0.
Samples
Cache Initialization
Here are some examples of cache initializations with different use cases.
- A basic example of a cache with a capacity of 100, which uses LRU as the eviction policy and an eviction factor of 0.25, is as follows:
cache:Cache cache = new;
- A basic example of a cache with a capacity of 1000, an eviction factor of 0.2, a default cache entry freshness time of 1 hour, and a cleanup timer configured with a 5-second interval is as follows:
cache:Cache cache = new({
    capacity: 1000,
    evictionFactor: 0.2,
    defaultMaxAgeInSeconds: 3600,
    cleanupIntervalInSeconds: 5
});
- An advanced example of a cache with a capacity of 1000, which uses a custom eviction policy along with the default eviction factor, max age, and cleanup interval, is as follows:
cache:Cache cache = new({
    capacity: 1000,
    evictionPolicy: new CustomEvictionPolicy()
});
Cache Usages
Here are some examples of using the initialized cache; both styles are sketched below.
- The simplest way of using the initialized cache, without handling errors.
- The recommended way of using the initialized cache, with error handling.
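A minimal sketch of both styles, assuming string values and the put/get signatures described earlier (cache:Error here denotes the module's error type):
import ballerina/cache;
import ballerina/io;

public function main() {
    cache:Cache cache = new;

    // 1. The simplest way, without handling errors: checkpanic panics
    //    immediately if the cache operation returns an error.
    checkpanic cache.put("key1", "value1");
    string value1 = <string>checkpanic cache.get("key1");
    io:println(value1);

    // 2. The recommended way, handling the possible errors explicitly.
    cache:Error? result = cache.put("key2", "value2");
    if (result is cache:Error) {
        io:println("Failed to insert the entry: ", result);
        return;
    }
    any|cache:Error value2 = cache.get("key2");
    if (value2 is cache:Error) {
        io:println("Failed to retrieve the entry: ", value2);
    } else {
        io:println(<string>value2);
    }
}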
Custom Implementation of Ballerina Cache
The default implementation of the ballerina/cache module provides an in-memory cache storage based on a map data structure and the LRU eviction policy. So, it is straightforward to use if your requirements match that. Otherwise, you must write your own custom implementation of AbstractCache, AbstractEvictionPolicy, or both.
When doing custom implementations, there are a few things to consider. Let's name our custom implementation of AbstractCache as CustomCache and our custom implementation of AbstractEvictionPolicy as CustomEvictionPolicy. The methods implemented in CustomCache should update the persistence storage as required. Also, they should call the respective APIs of the CustomEvictionPolicy, as explained before for the default implementation. As an example, the get API of CustomCache should call the get API of CustomEvictionPolicy. This is the convention to follow in order to maintain the LinkedList data structure.
As an example, let's consider CustomCache and CustomEvictionPolicy with only the put API, for simplicity.
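A minimal sketch of such a pair is given below, reduced to the put APIs. The names CustomCache, CustomEvictionPolicy, and the storage and list fields are hypothetical; a complete implementation would provide every method of the abstract objects so that it is structurally equivalent to them, and the LinkedList and Node types are assumed to be the public ones described earlier.
import ballerina/cache;

public type CustomEvictionPolicy object {

    // Updates the LinkedList according to the custom eviction algorithm,
    // e.g. by linking the node at the head or the tail of the list.
    public function put(cache:LinkedList list, cache:Node node) {
        // Custom list maintenance goes here.
    }

    // get, remove, replace, clear, etc. would follow the same pattern.
};

public type CustomCache object {

    private map<cache:Node> storage = {};
    private cache:LinkedList list = { head: (), tail: () };
    private CustomEvictionPolicy evictionPolicy;

    public function __init() {
        self.evictionPolicy = new CustomEvictionPolicy();
    }

    // Stores the entry in the custom persistence mechanism (a simple in-memory
    // map here) and lets the plugged-in eviction policy maintain the LinkedList.
    public function put(string key, any value) returns error? {
        cache:Node node = { value: value };
        self.storage[key] = node;
        self.evictionPolicy.put(self.list, node);
    }

    // get, invalidate, hasKey, keys, size, capacity, etc. would follow the
    // same pattern, each calling the matching CustomEvictionPolicy API.
};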
The get API of CustomCache does whatever operation is needed to retrieve the Node stored in the persistence mechanism and should call the get API of the CustomEvictionPolicy, which is plugged in when initializing the CustomCache.
The get API of CustomEvictionPolicy should update the LinkedList, which is passed as a parameter, according to the eviction algorithm.
Happy coding with Ballerina!
The design was influenced by https://github.com/chethiya/ballerina-cache by Chethiya Abeysinghe
The GitHub pull request of the implementation:
Related Discussions @ Ballerina Dev Google Group
- Redesign Ballerina Cache API
- Are map operations thread-safe while array operations are not
- Cache Implementation for Ballerina Inbound OAuth2 Authentication