Usage of Hazelcast in WSO2 Identity Server

Sajith Ekanayaka
Mar 26, 2022

Logos of WSO2 Identity Server and Hazelcast, wso2.com hazelcast.com

WSO2 Identity Server (WSO2 IS) is an API-driven, open source Identity and Access Management (IAM) product designed to help developers build effective CIAM solutions quickly.

WSO2 IS scales well: its largest deployment manages over 100 million identities (according to public domain sources). To support deployments of that size, the product has to be built with performance in mind. In addition to horizontal scaling, WSO2 IS uses multiple cache layers to improve performance.

Why caching?

By definition, caching is the process of storing data in a temporary location for a specific period of time. It can improve the performance of almost any type of application.

Like most applications, WSO2 IS uses one or more databases to store information such as user data, configuration parameters, and session information. Reading data directly from the database every time reduces application performance because of the network latency between the database and the application servers, and because databases tend to slow down as the data grows.

What is Caching, www.ezzylearning.net

Caching helps application developers here by reducing the number of interactions with the database. It has advantages and disadvantages that you need to evaluate when deciding on a caching strategy.

Advantages

  1. The load on the underlying database or LDAP is reduced as data is served from already fetched data in memory.
  2. Improved performance due to the reduced number of database calls for repetitive data fetching.

Disadvantages

  1. Coherency problems may occur: when one node or an external system updates the database, the change is not immediately reflected in the data cached on other nodes.
  2. Data in memory can become stale yet still be served, e.g., a cached entry is served after its corresponding record in the database has been deleted.

WSO2 IS uses its own Java caching implementation, with multiple cache layers that store data related to different flows. The TTL and capacity of each cache layer can be configured.

Caching Data Access Strategies: Cache-Aside devaraj-durairaj.medium.com
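
To make the cache-aside idea with a TTL and a capacity concrete, here is a minimal Java sketch. It is not the WSO2 caching implementation: the class name CacheAsideSketch, the loader function standing in for the database or LDAP read, and the least-recently-used eviction policy are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical cache-aside helper; an illustration, not the WSO2 caching API. */
class CacheAsideSketch<K, V> {

    /** A cached value together with the time at which it stops being served. */
    private static final class Timed<T> {
        final T value;
        final long expiresAtMillis;

        Timed(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clock;     // injectable so that expiry can be tested
    private final Function<K, V> loader;  // stands in for the database/LDAP read
    private final Map<K, Timed<V>> entries;
    private int loads = 0;

    CacheAsideSketch(int capacity, long ttlMillis, LongSupplier clock, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
        // Access-ordered map that drops the least recently used entry beyond capacity.
        this.entries = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > capacity;
            }
        };
    }

    synchronized V get(K key) {
        long now = clock.getAsLong();
        Timed<V> cached = entries.get(key);
        if (cached != null && cached.expiresAtMillis > now) {
            return cached.value;              // hit: no backing-store call
        }
        V loaded = loader.apply(key);         // miss or expired: read the backing store
        loads++;
        entries.put(key, new Timed<>(loaded, now + ttlMillis));
        return loaded;
    }

    synchronized void invalidate(K key) {
        entries.remove(key);
    }

    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        CacheAsideSketch<String, String> cache = new CacheAsideSketch<>(
                100, 60_000L, System::currentTimeMillis, id -> "user-record-for-" + id);
        cache.get("alice"); // miss: loaded from the backing store
        cache.get("alice"); // hit: served from memory
        System.out.println("backing-store reads: " + cache.loadCount());
    }
}
```

A shorter TTL narrows the window in which stale data can be served, at the cost of more reads from the backing store. A smaller capacity bounds memory use but lowers the hit rate.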

What is the problem with caching and horizontal scaling?

Applications are scaled horizontally when performance and reliability are critical. But this is where caching introduces a problem: cache coherence.

For example, assume two WSO2 IS nodes are deployed. Node A issues an access token, and then a token revocation request arrives at node B. Node A is not aware of this revocation event, so its cache still treats the previously issued access token as valid.

As a result, the old access token remains valid according to node A, even though node B has already revoked it, until the access token cache entry in node A expires.
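
The scenario can be reproduced with a small Java sketch. Everything in it is hypothetical: the Node class, the shared set standing in for the database, and the token string are illustrative names and not part of WSO2 IS. It only shows what happens when each node keeps a local cache and nothing tells the other node about a change.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; none of these names are WSO2 IS classes. */
class StaleTokenSketch {

    /** A node with a purely local token cache and no cluster messaging. */
    static final class Node {
        private final Set<String> activeTokensInDb;  // shared store, stands in for the database
        private final Map<String, Boolean> localTokenCache = new ConcurrentHashMap<>();

        Node(Set<String> activeTokensInDb) {
            this.activeTokensInDb = activeTokensInDb;
        }

        String issue(String token) {
            activeTokensInDb.add(token);
            localTokenCache.put(token, Boolean.TRUE);
            return token;
        }

        void revoke(String token) {
            activeTokensInDb.remove(token);
            localTokenCache.remove(token);  // only this node's cache is cleared
        }

        boolean isValid(String token) {
            // The database is consulted only on a local cache miss.
            return localTokenCache.computeIfAbsent(token, activeTokensInDb::contains);
        }
    }

    public static void main(String[] args) {
        Set<String> db = ConcurrentHashMap.newKeySet();
        Node nodeA = new Node(db);
        Node nodeB = new Node(db);

        String token = nodeA.issue("token-123");
        nodeB.revoke(token);

        // Node B reads the database and rejects the token; node A answers from its stale cache.
        System.out.println("node B says valid: " + nodeB.isValid(token));
        System.out.println("node A says valid: " + nodeA.isValid(token));
    }
}
```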

How was the coherence problem solved?

The legacy distributed caching methodology, blog.afkham.org (Now deprecated)

History: In 2013, WSO2 released Carbon Framework v4.2, which embedded Hazelcast. Hazelcast is an in-memory data grid used for clustering and distributed shared memory. When Hazelcast is used as a clustering implementation, data is evenly distributed among the nodes in a cluster.

Later, however, it was decided to avoid distributed shared memory, keep caches local to each node in the cluster, and send messages to cluster members to notify them of cache invalidations when required. This was due to many practical issues with configuring and running distributed caching properly: where the network is not tightly controlled, distributed caching fails in unexpected ways.

There are multiple clustered deployment cache scenarios and the recommended approach is “All caches are local with distributed cache invalidation”.

Local Caching: Enabled
Distributed Caching: N/A
Hazelcast Clustering: Enabled
Distributed Invalidation: Enabled

You can expand the topic “Caching in WSO2 Identity Server” in this document to read about other cache scenarios and their disadvantages.

With the above approach, WSO2 IS employs Hazelcast as the primary method of implementing cluster messages while using distributed caching in a simple setup. Cache invalidation uses Hazelcast messaging to distribute the invalidation message over the cluster and invalidate the caches properly.

A look into WSO2 implementation

The membership schemes are used to keep track of the members in a cluster. The interface HazelcastMembershipScheme is implemented by multiple membership schemes such as WKABasedMembershipScheme, AWSECSBasedMembershipScheme, etc. Then, HazelcastCarbonClusterImpl keeps a list of cluster members with the help of the configured membership scheme.

The core caching implementation of WSO2 allows event listeners to be registered for cache-related events and invokes them when any cache object is updated. These listeners implement the CacheEntryListener interface and are registered via OSGi.

ClusterCacheInvalidationRequestSender is one of those listeners. It publishes ClusterCacheInvalidationRequest objects to a Hazelcast topic, each carrying information about which specific cache object needs to be invalidated.

On the other nodes of the cluster, the messages on the topic are read and the necessary cache objects are invalidated by invoking the execute() method of each ClusterCacheInvalidationRequest object received from the topic.
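
The flow described above (a local change, a message on a topic, and execute() on the receiving nodes) can be sketched in plain Java. This is an in-process stand-in under stated assumptions, not the WSO2 or Hazelcast code: Topic, InvalidationRequest, and Node are hypothetical names, and the Topic class simply calls its subscribers directly, where the real deployment would deliver messages over the cluster through a Hazelcast topic.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical illustration only; an in-process stand-in, not the WSO2 or Hazelcast API. */
class InvalidationTopicSketch {

    /** Message naming the cache entry that every other node should drop. */
    static final class InvalidationRequest {
        final String senderId;
        final String cacheName;
        final String key;

        InvalidationRequest(String senderId, String cacheName, String key) {
            this.senderId = senderId;
            this.cacheName = cacheName;
            this.key = key;
        }

        /** Applies the invalidation to the receiving node's local cache. */
        void execute(Node receiver) {
            receiver.removeLocally(cacheName, key);
        }
    }

    /** Stand-in for a cluster-wide topic: every subscriber receives every message. */
    static final class Topic {
        private final List<Consumer<InvalidationRequest>> subscribers = new CopyOnWriteArrayList<>();

        void subscribe(Consumer<InvalidationRequest> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(InvalidationRequest request) {
            subscribers.forEach(subscriber -> subscriber.accept(request));
        }
    }

    /** A node whose caches are local and kept coherent through invalidation messages. */
    static final class Node {
        private final String id;
        private final Topic topic;
        private final Map<String, Map<String, String>> caches = new ConcurrentHashMap<>();

        Node(String id, Topic topic) {
            this.id = id;
            this.topic = topic;
            // Ignore messages this node sent itself; apply everyone else's.
            topic.subscribe(request -> {
                if (!request.senderId.equals(this.id)) {
                    request.execute(this);
                }
            });
        }

        void putLocal(String cacheName, String key, String value) {
            caches.computeIfAbsent(cacheName, name -> new ConcurrentHashMap<>()).put(key, value);
        }

        String get(String cacheName, String key) {
            Map<String, String> cache = caches.get(cacheName);
            return cache == null ? null : cache.get(key);
        }

        /** Removes the entry locally, then asks the rest of the cluster to do the same. */
        void remove(String cacheName, String key) {
            removeLocally(cacheName, key);
            topic.publish(new InvalidationRequest(id, cacheName, key));
        }

        void removeLocally(String cacheName, String key) {
            Map<String, String> cache = caches.get(cacheName);
            if (cache != null) {
                cache.remove(key);
            }
        }
    }

    public static void main(String[] args) {
        Topic topic = new Topic();
        Node nodeA = new Node("A", topic);
        Node nodeB = new Node("B", topic);

        nodeA.putLocal("tokens", "token-123", "ACTIVE");
        nodeB.remove("tokens", "token-123");  // revocation handled on node B

        System.out.println("node A still caches the token: " + (nodeA.get("tokens", "token-123") != null));
    }
}
```

Only the cache name and key travel in the message, not the data itself, so each node reloads the entry from the database on its next read.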

You can enable debug logs for the following packages to observe what is going on when troubleshooting caching-related issues. The last one requires the Hazelcast logging type to be set to log4j as described here.

  • org.wso2.carbon.caching.impl
  • org.wso2.carbon.core.clustering.hazelcast
  • com.hazelcast

Why is Hazelcast clustering a must in WSO2 IS?

If there are multiple WSO2 IS nodes in the deployment, the cache coherence problem is inevitable. One could think of disabling all the cache layers, but that would be a huge sacrifice of performance.

What Could Go Wrong? whatcouldgowrongpodcast.com

Revisit the scenario under “What is the problem with caching and horizontal scaling?” above for a sample of what could go wrong if Hazelcast clustering is not configured in a multi-node WSO2 IS deployment. It is not only about access tokens: there are multiple cache layers, and any of them could lead to unexpected behavior.

How to configure Hazelcast clustering in WSO2 IS?

This document provides information on configuring Hazelcast clustering in recent WSO2 IS versions (≥ 5.9.0). If you are using v5.8.0 or an older version, the documentation can be found here and here.

A high-level component diagram of a WSO2 IS two-node cluster, is.docs.wso2.com

Additionally, this blog provides a step-by-step guide to setting up a cluster with the WKA membership scheme on a local machine.

Best practices with Hazelcast clustering in WSO2 IS

  1. Select a suitable membership scheme for your deployment.
  2. Configure the shutdown hook and logging type as described in the document.
  3. The following properties can also be added under the [hazelcast] section of deployment.toml for better reliability of the cluster. They make sure the cluster is not affected when one of the nodes goes offline suddenly.

'hazelcast.heartbeat.interval.seconds' = "1"
'hazelcast.master.confirmation.interval.seconds' = "5"
'hazelcast.max.no.heartbeat.seconds' = "20"
'hazelcast.max.no.master.confirmation.seconds' = "30"

These property values can be adjusted as per your preference and more information can be found in the Hazelcast documentation.

If you are using an older WSO2 IS version (< 5.9.0), which does not have deployment.toml, create a new file called hazelcast.properties inside the <IS_HOME>/repository/conf directory and add the following lines.

hazelcast.shutdownhook.enabled=false
hazelcast.heartbeat.interval.seconds=1
hazelcast.master.confirmation.interval.seconds=5
hazelcast.max.no.heartbeat.seconds=20
hazelcast.max.no.master.confirmation.seconds=30
