Distributed Ehcache on Kubernetes

Upstream Engineering · May 12, 2021

Written by Dimitris Koukis

Introduction

Traditionally, at Upstream we use Ehcache as the Hibernate second-level cache for all our Java-based projects. There are multiple reasons for this, but mainly the simplicity of its configuration (even in a clustered environment) and its performance, as well as the fact that when we made that decision there were not many alternatives to consider. Apart from the second-level cache, we also use Ehcache for application-level caching requirements, taking advantage of its distributed capabilities and using it as a synchronization mechanism between the various application nodes in our cluster. Together with the application, we also ship a set of endpoints to inspect and reset the cache (for troubleshooting and investigation tasks).

Our configuration is relatively simple: we define a series of cache regions for the Hibernate second-level cache, plus extra regions for any ad-hoc caching requirements. We use the RMI-based factories to enable cache replication across the cluster, meaning all Ehcache nodes in the cluster communicate with each other over RMI. Nodes in the cluster are called peers, and communication between them happens through a CacheManagerPeerListener. Ehcache figures out which nodes are part of the cluster via a CacheManagerPeerProvider. Here is an example configuration:

Sample ehcache configuration
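A minimal sketch of such a configuration, using the Ehcache 2.x RMI replication factories with multicast peer discovery; the host names, ports, cache names and tuning values below are illustrative, not our actual settings:

```xml
<ehcache>

    <!-- Dynamic peer discovery over multicast: the part that breaks without multicast support. -->
    <cacheManagerPeerProviderFactory
        class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
        properties="peerDiscovery=automatic, multicastGroupAddress=230.0.0.1, multicastGroupPort=4446, timeToLive=1"/>

    <!-- RMI listener through which this node receives replication messages from its peers. -->
    <cacheManagerPeerListenerFactory
        class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
        properties="port=40001, socketTimeoutMillis=2000"/>

    <!-- Example region (a Hibernate entity cache) replicated over RMI. -->
    <cache name="com.example.SomeEntity"
           maxEntriesLocalHeap="10000"
           timeToLiveSeconds="3600">
        <cacheEventListenerFactory
            class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"
            properties="replicateAsynchronously=true"/>
    </cache>

</ehcache>
```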

Kubernetes Era

When we moved to Kubernetes as our production environment, we faced a serious problem: Ehcache relies on network multicast to dynamically discover the application nodes in the cluster. Enabling multicast was not an option in our Kubernetes installation, because we use Calico for networking, which currently does not support multicast (it is on the roadmap but not yet implemented). We therefore had to find a different approach, performing node discovery at the application level rather than relying on the network.

The alternative was to move to a different cache provider with native support for Kubernetes environments at the time (e.g. Hazelcast). Products developed within the company at that time followed that approach. For existing products, however, that would mean changing the Hibernate configuration and modifying deployment scripts, and, even worse, changing the application itself wherever Ehcache was used as an ad-hoc distributed cache. Therefore, for our existing products we decided to tackle the peer discovery problem without relying on multicast.

Solution overview

The idea was to create a custom Ehcache CacheManagerPeerProvider. As mentioned above, we use RMI for cache replication, meaning peers communicate with each other over RMI, so what we needed was a custom implementation of RMICacheManagerPeerProvider. Ehcache provides two implementations out of the box: ManualRMICacheManagerPeerProvider, where the list of peers is provided statically (as configuration), and MulticastRMICacheManagerPeerProvider, where the list of peers is updated dynamically using multicast. We created another implementation that discovers peers dynamically through a different channel than the network. Since all nodes share the same application database, our thought was that during initialization each node would store its network address in a database table, which would be read periodically so that we always have a complete and up-to-date list of the application nodes. Our custom peer provider would then discover the cluster nodes using this table.

Implementation

The implementation consists of two parts. The first part is not tied to Ehcache at all. Its main class is ClusterNodeManager, which is responsible for keeping track of the available application nodes in the cluster and for notifying any interested components of changes in the cluster state, using the Observer pattern. For this reason we created the ClusterNodeListener interface, which must be implemented by observer components. Keeping the list of nodes up to date is achieved by periodically reading and writing to the database using two schedulers. The first one writes the state of the current node to the database. Apart from updating its own state, it also removes any stale records, based on the record timestamp. This way, if a node fails to unregister itself while shutting down, it will be removed from the cluster after a while due to inactivity.

The second scheduler reads the database and, in case the list of nodes has changed since the last read (a new node has been added or a node has been removed), updates the internal state of ClusterNodeManager (the list of available nodes in the cluster). It then asynchronously notifies all observers of that change, making them aware of the updated node list.

ClusterNodeManager.class
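A simplified sketch of the idea rather than the actual class: it assumes a hypothetical ClusterNodeDao abstraction over the node table, a "host:port" address format for nodes, and illustrative intervals and staleness thresholds.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Implemented by components that want to be notified of cluster membership changes. */
interface ClusterNodeListener {
    void onClusterNodeChange(Set<String> nodeAddresses);
}

/** Hypothetical abstraction over the shared database table that holds node addresses. */
interface ClusterNodeDao {
    void upsertNode(String nodeAddress);            // insert/refresh this node's record
    void removeNode(String nodeAddress);            // explicit unregistration on clean shutdown
    void removeNodesIdleLongerThan(long millis);    // drop records that stopped being refreshed
    Set<String> readNodes();                        // read all currently registered nodes
}

public class ClusterNodeManager {

    private final ClusterNodeDao dao;
    private final String currentNodeAddress;        // e.g. "host:port" of this node's RMI listener
    private final List<ClusterNodeListener> listeners = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService schedulers = Executors.newScheduledThreadPool(2);
    private final ExecutorService notifier = Executors.newSingleThreadExecutor();

    private volatile Set<String> knownNodes = ConcurrentHashMap.newKeySet();

    public ClusterNodeManager(ClusterNodeDao dao, String currentNodeAddress) {
        this.dao = dao;
        this.currentNodeAddress = currentNodeAddress;
    }

    public void addListener(ClusterNodeListener listener) {
        listeners.add(listener);
    }

    public void start() {
        // Scheduler 1: register/refresh our own record and evict records of inactive nodes,
        // so that nodes which died without unregistering eventually drop out of the cluster.
        schedulers.scheduleAtFixedRate(() -> {
            dao.upsertNode(currentNodeAddress);
            dao.removeNodesIdleLongerThan(TimeUnit.MINUTES.toMillis(2));
        }, 0, 30, TimeUnit.SECONDS);

        // Scheduler 2: read the table and, if membership changed since the last read,
        // update the internal state and notify observers asynchronously.
        schedulers.scheduleAtFixedRate(() -> {
            Set<String> latest = dao.readNodes();
            if (!latest.equals(knownNodes)) {
                knownNodes = latest;
                notifier.execute(() -> listeners.forEach(l -> l.onClusterNodeChange(latest)));
            }
        }, 10, 30, TimeUnit.SECONDS);
    }

    public void shutdown() {
        schedulers.shutdownNow();
        notifier.shutdownNow();
        dao.removeNode(currentNodeAddress);   // unregister this node on a clean shutdown
    }
}
```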

The second part of the implementation is the Ehcache extension. To make Ehcache aware of cluster updates, we created a custom peer provider by extending Ehcache's RMI peer provider base class (RMICacheManagerPeerProvider), just like the existing Manual and Multicast implementations do. Our implementation is called DatabaseRMICacheManagerPeerProvider, since peer discovery is based on the node list we get from ClusterNodeManager. More specifically, DatabaseRMICacheManagerPeerProvider acts as an observer of ClusterNodeManager (it implements the ClusterNodeListener interface). Upon a change in the available nodes, its onClusterNodeChange() method is called, which updates the internal peerUrls list of the peer provider: for each available node, we call registerPeer() for each cache associated with this peer provider, adding the new URL to peerUrls. We implemented the abstract registerPeer() method to add the node to the peerUrls list, similarly to the existing Manual and Multicast implementations, and the abstract listRemoteCachePeers() method to return the available peers based on that data. We also perform a liveness check on remote peers before adding or returning them (using the existing lookupRemoteCachePeer() method).

DatabaseRMICacheManagerPeerProvider.class
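Again a rough sketch rather than the exact class: it follows the shape of Ehcache 2.x's ManualRMICacheManagerPeerProvider, reusing the peerUrls map and lookupRemoteCachePeer() from the RMICacheManagerPeerProvider base class; the "host:port" address format and the way RMI URLs are built per cache are assumptions.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Set;

import net.sf.ehcache.CacheException;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.distribution.CachePeer;
import net.sf.ehcache.distribution.RMICacheManagerPeerProvider;

public class DatabaseRMICacheManagerPeerProvider extends RMICacheManagerPeerProvider
        implements ClusterNodeListener {

    public DatabaseRMICacheManagerPeerProvider(CacheManager cacheManager) {
        super(cacheManager);
    }

    /** Called by ClusterNodeManager whenever the cluster membership changes. */
    @Override
    public synchronized void onClusterNodeChange(Set<String> nodeAddresses) {
        peerUrls.clear();
        for (String address : nodeAddresses) {          // address is assumed to be "host:port"
            // (skipping the current node's own address is omitted for brevity)
            for (String cacheName : cacheManager.getCacheNames()) {
                registerPeer("//" + address + "/" + cacheName);   // one RMI URL per replicated cache
            }
        }
    }

    @Override
    public void registerPeer(String rmiUrl) {
        // Same convention as the built-in providers: key is the RMI URL, value a registration timestamp.
        peerUrls.put(rmiUrl, new Date());
    }

    @Override
    public List listRemoteCachePeers(Ehcache cache) throws CacheException {
        List<CachePeer> remotePeers = new ArrayList<>();
        for (Object key : peerUrls.keySet()) {
            String rmiUrl = (String) key;
            // Only URLs that target the requested cache are relevant.
            if (!rmiUrl.endsWith("/" + cache.getName())) {
                continue;
            }
            try {
                // Liveness check: a failed RMI lookup means the peer is skipped for this round.
                remotePeers.add(lookupRemoteCachePeer(rmiUrl));
            } catch (Exception e) {
                // Peer not reachable; it will be retried on the next call.
            }
        }
        return remotePeers;
    }

    @Override
    public void init() {
        // Nothing to do here; peers are registered via onClusterNodeChange().
    }

    @Override
    public long getTimeForClusterToForm() {
        return 0;
    }

    protected boolean stale(Date date) {
        // Staleness is handled by the database clean-up, so entries are never considered stale here.
        return false;
    }
}
```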

To put the two pieces together, upon application initialization we identify all configured peer providers from the various Ehcache regions and register them with ClusterNodeManager so that they receive cluster state updates.

ClusterNodeListenersInitializer.class
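A sketch of that wiring, assuming getCacheManagerPeerProvider("RMI") (the Ehcache 2.x accessor for the provider registered under the RMI scheme) and that the application hands over its CacheManagers:

```java
import java.util.List;

import net.sf.ehcache.CacheManager;
import net.sf.ehcache.distribution.CacheManagerPeerProvider;

public class ClusterNodeListenersInitializer {

    private final ClusterNodeManager clusterNodeManager;
    private final List<CacheManager> cacheManagers;   // all CacheManagers used by the application

    public ClusterNodeListenersInitializer(ClusterNodeManager clusterNodeManager,
                                           List<CacheManager> cacheManagers) {
        this.clusterNodeManager = clusterNodeManager;
        this.cacheManagers = cacheManagers;
    }

    /** Called once on application start-up, after the Ehcache CacheManagers have been created. */
    public void init() {
        for (CacheManager cacheManager : cacheManagers) {
            // The provider registered under the RMI scheme is our custom implementation
            // whenever the configuration points cacheManagerPeerProviderFactory at our factory.
            CacheManagerPeerProvider provider = cacheManager.getCacheManagerPeerProvider("RMI");
            if (provider instanceof DatabaseRMICacheManagerPeerProvider) {
                clusterNodeManager.addListener((DatabaseRMICacheManagerPeerProvider) provider);
            }
        }
        clusterNodeManager.start();
    }
}
```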

Finally, we have a peer provider factory that creates the new custom peer provider. This factory is the one we reference in the cacheManagerPeerProviderFactory element of the Ehcache configuration, replacing the out-of-the-box Ehcache peer provider factory we used before.

DatabaseRMICacheManagerPeerProviderFactory.class
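A sketch of the factory; it simply plugs the custom provider into Ehcache's cacheManagerPeerProviderFactory extension point:

```java
import java.util.Properties;

import net.sf.ehcache.CacheManager;
import net.sf.ehcache.distribution.CacheManagerPeerProvider;
import net.sf.ehcache.distribution.CacheManagerPeerProviderFactory;

public class DatabaseRMICacheManagerPeerProviderFactory extends CacheManagerPeerProviderFactory {

    /** Invoked by Ehcache when it parses the cacheManagerPeerProviderFactory element. */
    @Override
    public CacheManagerPeerProvider createCachePeerProvider(CacheManager cacheManager,
                                                            Properties properties) {
        return new DatabaseRMICacheManagerPeerProvider(cacheManager);
    }
}
```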

There are no further configuration changes required, making the upgrade process rather easy.

Sample Ehcache configuration using the custom peer provider
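Assuming the factory lives in a hypothetical com.upstream.cache package, the relevant part of the configuration could look like this, with the listener and per-cache replicator settings left as they were:

```xml
<ehcache>

    <!-- Peer discovery now goes through the database-backed provider instead of multicast. -->
    <cacheManagerPeerProviderFactory
        class="com.upstream.cache.DatabaseRMICacheManagerPeerProviderFactory"/>

    <!-- The RMI listener and the per-cache replicators remain unchanged. -->
    <cacheManagerPeerListenerFactory
        class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
        properties="port=40001, socketTimeoutMillis=2000"/>

</ehcache>
```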

Integration with applications

We bundled our solution as an independent library containing the functionality of the new peer provider and its dependencies. In the applications that use the library, we use the Spring framework to inject the required resources into it: the datasource used to access the database table, as well as other configuration such as the node idle time and the read/write intervals. This keeps our solution database-independent, and we were able to use it against various implementations (Postgres, Cassandra and Redis), depending on the database available to each module.
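For illustration, the Spring wiring might look roughly like the following; the bean names, the cluster.node.url property and the way each module supplies its ClusterNodeDao implementation are assumptions:

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ClusterCacheConfig {

    // Hypothetical property: the "host:port" address under which this node's RMI listener is reachable.
    @Value("${cluster.node.url}")
    private String currentNodeUrl;

    @Bean(destroyMethod = "shutdown")
    public ClusterNodeManager clusterNodeManager(ClusterNodeDao clusterNodeDao) {
        // clusterNodeDao is whatever implementation the module provides (JDBC/Postgres,
        // Cassandra or Redis backed), wired as a separate bean together with its
        // datasource or client and the scheduler intervals.
        return new ClusterNodeManager(clusterNodeDao, currentNodeUrl);
    }
}
```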

Roll out

The solution worked like a charm in all the products that used it. We didn’t notice any performance problems or data loss, despite the fact that the environment was not only high-traffic but also dynamic: nodes were often added and removed based on traffic, and rolling deployments were frequent.

Data integrity bug

We came across a major issue about two years after the first roll-out. Our applications have multiple installations, each with its own configuration and data. A typical installation includes at least two nodes for availability reasons and scales up based on incoming traffic.

While investigating a bug, we found out that the cache contents of an application node did not match that application’s configuration; instead, they contained configuration entries belonging to a different installation. Our investigation concluded that this was an edge case: an application node had undergone a forced shutdown without being able to run the clean-up method that removes the node from the node list in the database. Before another node could remove this entry due to inactivity, a new node of a different installation was deployed and was assigned the same network address as the shut-down node. As a result, this node started receiving cache updates of a different installation. To fix it, we set the Ehcache RMI listener port (i.e. the port of RMICacheManagerPeerListener) to a value that is unique per application installation, and also per application version, so that if we ever hit the same situation again, the Ehcache nodes will only communicate with peers of the same installation and version, as the others operate on a different port.
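In configuration terms, the fix amounts to templating the RMI listener port so that every installation (and version) resolves it to a different value; the placeholder below is illustrative of however the deployment tooling injects it:

```xml
<!-- Each installation/version resolves ${ehcache.rmi.port} to a different value,
     so peers from other installations simply cannot connect to this listener. -->
<cacheManagerPeerListenerFactory
    class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
    properties="port=${ehcache.rmi.port}, socketTimeoutMillis=2000"/>
```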

Further improvements

Since we have introduced Consul as our service discovery mechanism in Kubernetes, we are working on replacing the functionality that identifies the cluster nodes. Instead of reading and writing IP addresses to a database, we use Consul services to discover the nodes belonging to a specific installation and use that list to populate the peerUrls of the peer provider. Upon any update to the node list, we update the cache peers using the same mechanism described above.
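As a rough sketch of that direction, assuming each installation registers its nodes in Consul under its own service name, the node list could be read from Consul's health endpoint like this (the crude string matching stands in for proper JSON parsing):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsulClusterNodeSource {

    private static final Pattern ADDRESS = Pattern.compile("\"Address\"\\s*:\\s*\"([^\"]+)\"");

    private final HttpClient http = HttpClient.newHttpClient();
    private final String consulUrl;     // e.g. http://localhost:8500
    private final String serviceName;   // per-installation service name (assumption)
    private final int rmiPort;          // the installation-specific Ehcache RMI port

    public ConsulClusterNodeSource(String consulUrl, String serviceName, int rmiPort) {
        this.consulUrl = consulUrl;
        this.serviceName = serviceName;
        this.rmiPort = rmiPort;
    }

    /** Returns "host:port" entries for all healthy instances of the service. */
    public Set<String> readNodes() throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create(consulUrl + "/v1/health/service/" + serviceName + "?passing=true"))
                .GET()
                .build();
        String body = http.send(request, HttpResponse.BodyHandlers.ofString()).body();

        Set<String> nodes = new LinkedHashSet<>();
        Matcher matcher = ADDRESS.matcher(body);
        while (matcher.find()) {
            nodes.add(matcher.group(1) + ":" + rmiPort);
        }
        return nodes;
    }
}
```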

One more improvement would be upgrading the Ehcache version. Our solution is based on version 2.x, and in newer versions (3.x) clustering support has a totally different implementation. Most probably, the upgrade would require a major rewrite to use the new functionality.

Acknowledgments

This article is based on work done by Nikos Kotzalas and Giannis Mitropoulos.

