Effective and highly available cache: Integrating Spring Boot with Redis

McKinsey Digital
McKinsey Digital Insights
7 min read · Apr 4, 2023

By Mohamad Assaf — Senior Analyst, Build by McKinsey; Brian Leke — Senior Expert and Associate Partner, Build by McKinsey; and Sallah Kokaina — Senior Expert and Associate Partner, Build by McKinsey

Offering fast, highly available solutions is becoming the norm in software engineering. With growing demand on web and mobile channels, developers can leverage open-source solutions to shape production-ready setups.

In our own practice, we have found it valuable to adopt a ‘shift-left’ approach that puts performance concerns at the core of development activities. This is especially important where performance directly affects the end-user experience and where the footprint of digital initiatives keeps growing.

In this article, we will explain how to run a Redis cache on a high-availability container cluster using OpenShift and consume it from a sample microservice written with Spring Boot. Above all, we will help you avoid the data-inconsistency pitfalls that caching can introduce.

Caching can shield system users from slow responses of downstream components for data that is largely static, but it presents the following challenges:

  • Time-to-live: An object is cached for a specific period, usually based on how often the data is refreshed or changed. This requires correct configuration to ensure that data stays available for the appropriate period before refreshing.
  • Performance: Caching is used to speed up response times, and using the right caching practices is key to achieving the best performance.
  • Availability: The cache may serve operational tasks that need a quick response, so ensuring the availability of our cache prevents any disturbance to our application.

It’s important to bear these issues in mind when integrating a Redis cache with a Spring Boot application.

Figure 1 illustrates the integration: the backend service calls third-party APIs and uses Redis to cache the responses.

To kick-start a Spring Boot application, we can use Spring Initializr: https://start.spring.io/

Integration with Redis

We recommend using a Spring Boot starter to integrate with Redis, by adding the spring-boot-starter-data-redis dependency to your project’s pom.xml file, or to your build script if you are using Gradle as a build tool.

Maven

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

Gradle

implementation 'org.springframework.boot:spring-boot-starter-data-redis'

(With the Spring Boot Gradle plugin applied, the starter version is managed for you and can be omitted.)

Specify the connection details in the application.properties file or the application.yml file using the following properties:

spring.redis.host = <host_url_for_redis>

spring.redis.port = <port_for_redis_instance>

spring.redis.password = <password_for_default_redis_account>

Spring Boot will then auto-configure the connection. To test the above configuration, you can use the official Docker image for Redis:

docker run -p 6379:6379 -d redis:6.0 redis-server --requirepass "testPassword"

Then add the following configuration in the properties file:

spring.redis.host = localhost

spring.redis.port = 6379

spring.redis.password = testPassword

Most articles on configuring Redis with Spring Boot suggest enabling the cache abstraction with @EnableCaching on the main class and annotating methods with @Cacheable. While this configures caching automatically, we recommend implementing caching manually, through a dedicated caching service. This gives us the most control over the details of our cache, such as the TTL and the Redis data structure we use (Redis strings, Redis hashes, etc.). Below is an example of our service:

import java.util.concurrent.TimeUnit;

import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;

import lombok.RequiredArgsConstructor;

@Component
@RequiredArgsConstructor
public class RedisService {

    private final RedisTemplate<Object, Object> redisTemplate;

    public <V> void set(String cacheName, String key, V value, long timeout, TimeUnit timeUnit) {
        redisTemplate.opsForValue().set(cacheName + key, value);
        redisTemplate.expire(cacheName + key, timeout, timeUnit);
    }

    @SuppressWarnings("unchecked")
    public <V> V get(String cacheName, String key) {
        return (V) redisTemplate.opsForValue().get(cacheName + key);
    }

    public Boolean hasKey(String cacheName, String key) {
        return redisTemplate.hasKey(cacheName + key);
    }
}
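To show where this service fits, here is a minimal sketch of the cache-aside flow it enables: check the cache first, fall back to the downstream call on a miss, then populate the cache for the next reader. This is an illustration, not project code — a ConcurrentHashMap stands in for Redis so the flow can be followed without a running instance, and the downstream call is a hypothetical lambda.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache-aside flow; in the real application RedisService.get/set
// would replace the in-memory map.
public class CacheAsideSketch {
    // Stand-in for Redis.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private int downstreamCalls = 0;

    public String getCustomerDetails(String customerId, Function<String, String> downstream) {
        String key = "customerDetails" + customerId;  // cacheName + key, as in RedisService
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                            // cache hit: no downstream call
        }
        String fresh = downstream.apply(customerId);  // cache miss: call the slow API
        downstreamCalls++;
        cache.put(key, fresh);                        // populate for subsequent readers
        return fresh;
    }

    public int getDownstreamCalls() {
        return downstreamCalls;
    }
}
```

With this shape, only the first request for a customer pays the downstream latency; every later request within the TTL is served from the cache.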

For performance gains over the default Java serializer, and to make the cached values easy to inspect and debug in Redis, we configure a custom serializer/deserializer: a plain string serializer for keys and Jackson for values. We can then configure our RedisTemplate bean as follows:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;

import lombok.extern.slf4j.Slf4j;

@Slf4j
@Configuration
@Profile("!test")
public class RedisConfig {

    @Bean
    public RedisTemplate<Object, Object> redisTemplate(RedisConnectionFactory connectionFactory) {
        RedisTemplate<Object, Object> redisTemplate = new RedisTemplate<>();
        try {
            redisTemplate.setKeySerializer(new StringRedisSerializer());
            redisTemplate.setDefaultSerializer(new GenericJackson2JsonRedisSerializer());
            redisTemplate.setConnectionFactory(connectionFactory);
        } catch (Exception e) {
            log.error("Error configuring RedisTemplate", e);
        }
        return redisTemplate;
    }
}

Time-to-live

Redis allows TTL to be configured per key. If we wish to expire a specific object individually, based on its refresh time, we can use Redis strings, which are simple key-value pairs. If we use Redis hashes instead, the TTL applies to the whole hash: when it expires, all fields inside the hash are deleted together.

Take a look at the function set in our RedisService:

public <V> void set(String cacheName, String key, V value, long timeout, TimeUnit timeUnit) {
    redisTemplate.opsForValue().set(cacheName + key, value);
    redisTemplate.expire(cacheName + key, timeout, timeUnit);
}

It takes timeout (a long) and timeUnit (a java.util.concurrent.TimeUnit, such as TimeUnit.HOURS or TimeUnit.SECONDS, giving the unit of the timeout) as parameters. When we insert a new entry, RedisTemplate stores it as a key-value pair, where the Redis key is the cache name concatenated with the entry key and the value is the object serialized with Jackson. We then set the expiry duration for that entry.
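The key composition and TTL conversion above can be illustrated without a running Redis. The helper below is a hypothetical sketch (it does not exist in the project) showing how the Redis key is built and how TimeUnit turns the 18-hour TTL from the later example into seconds.

```java
import java.util.concurrent.TimeUnit;

// Illustrative helper mirroring what RedisService.set does with its arguments.
public class TtlSketch {

    // The Redis key is simply the cache name concatenated with the entry key.
    public static String redisKey(String cacheName, String key) {
        return cacheName + key;          // e.g. "customerDetails" + "100" -> "customerDetails100"
    }

    // TimeUnit converts the timeout into a concrete unit, e.g. for logging or EXPIRE.
    public static long ttlSeconds(long timeout, TimeUnit unit) {
        return unit.toSeconds(timeout);  // 18 hours -> 64,800 seconds
    }
}
```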

Performance

Referring to the Redis memory-optimization guide (https://redis.io/docs/management/optimization/memory-optimization/), a Redis hash with a small number of fields is stored more compactly, and served faster, than the same data spread across several Redis strings. We can demonstrate this with the practical example below.

Consider case 1, a Redis hash called Object:

Object: {
    1: test1,
    2: test2,
    3: test3
}

And case 2, where we have 3 Redis strings with keys Object1, Object2, Object3:

Object1: test1
Object2: test2
Object3: test3

Using case 1 is faster than case 2.

The reason is that Redis stores small hashes in a compact, array-like encoding, which uses memory and the CPU cache more effectively. So grouping our entries into small hashes leverages this Redis capability and speeds up our response time. Here’s an example of how we’ve implemented this technique.

Context:

We had 30,000 customers whose information was cached in Redis. The TTL was 18 hours.

Straightforward caching strategy:

For each customer, we inserted one Redis string (a key-value pair), leaving us with 30,000 Redis strings.

Example for a customer id = 100:

customerDetails100: {CustomerDetailsObject}

Leveraging the capabilities of Redis:
In this example, a set of customers was mapped to a specific application user; on average, every user had 30 customers. Every time a user logged in, all of their customers’ details were fetched from the cache. We had to structure our cache better to improve overall performance, which we achieved using Redis hashes:

For a user with id = A01, holding client IDs 100 to 129:

customerDetailsA01: {
    100: {CustomerDetailsObject}
    101: {CustomerDetailsObject}
    102: {CustomerDetailsObject}
    ...
}

Overall, the set of 30,000 clients was divided into 1,000 sets of 30 clients each, and since the whole set is populated when the user logs in, we could configure the TTL at the level of the Redis hash.
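The grouping above can be sketched with plain Java collections. This is an illustrative stand-in, not project code: it uses synthetic integer IDs and splits 30,000 customers into per-user buckets of 30, one bucket per Redis hash.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Sketch of the bucketing strategy: each bucket corresponds to one
// customerDetails<userId> Redis hash holding ~30 customer entries.
public class HashBucketingSketch {

    public static Map<Integer, List<Integer>> bucketByUser(int customers, int perUser) {
        return IntStream.range(0, customers)
                .boxed()
                // id / perUser plays the role of the owning user's id
                .collect(Collectors.groupingBy(id -> id / perUser, TreeMap::new, Collectors.toList()));
    }
}
```

For 30,000 customers and 30 customers per user, this yields 1,000 buckets, matching the 1,000 Redis hashes described above.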

Availability

Redis can become a critical component of our application, both for speedup and for operations such as invalidating JWT tokens or responding to brute-force logins.

In that case, if Redis goes down it can make the whole application unavailable, so we have to ensure the availability of Redis itself. The two links below explain how:

Redis provides a setup for high availability, using Redis Sentinel: https://redis.io/docs/management/sentinel/

For a way to deploy Redis to Kubernetes: https://www.containiq.com/post/deploy-redis-cluster-on-kubernetes

We used OpenShift to host our application. In this context, we can deploy Redis as a StatefulSet with two replicas. On startup, we can configure one of the instances to be a replica of the other. We can deploy Sentinel using the following:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sentinel
spec:
  serviceName: sentinel
  replicas: 1
  selector:
    matchLabels:
      app: sentinel
  template:
    metadata:
      labels:
        app: sentinel
    spec:
      initContainers:
        - name: config
          image: 10.34.174.71:8082/redis:7.0.5-alpine
          resources:
            limits:
              cpu: "200m"
              memory: "256Mi"
            requests:
              cpu: "100m"
              memory: "128Mi"
          command: [ "sh", "-c" ]
          args:
            - |
              REDIS_PASSWORD=<password>
              nodes=redis-0.redis,redis-1.redis,redis-2.redis
              for i in ${nodes//,/ }
              do
                echo "finding master at $i"
                ROLE=$(redis-cli --no-auth-warning --raw -h $i -a $REDIS_PASSWORD info replication | awk '{print $1}' | grep role: | cut -d ":" -f2)
                if [ "$ROLE" = "master" ]; then
                  echo "found master at $i"
                  MASTER=$i
                  break
                else
                  echo "no master found at $i"
                  MASTER=
                fi
              done
              echo "sentinel monitor mymaster $MASTER 6379 1" >> /tmp/master
              echo "port 5000
              sentinel resolve-hostnames yes
              sentinel announce-hostnames yes
              $(cat /tmp/master)
              sentinel down-after-milliseconds mymaster 5000
              sentinel failover-timeout mymaster 60000
              sentinel parallel-syncs mymaster 1
              sentinel auth-pass mymaster $REDIS_PASSWORD
              " > /etc/redis/sentinel.conf
              cat /etc/redis/sentinel.conf
          volumeMounts:
            - name: redis-config
              mountPath: /etc/redis/
      containers:
        - name: sentinel
          image: 10.34.174.71:8082/redis:7.0.5-alpine
          resources:
            limits:
              cpu: "200m"
              memory: "256Mi"
            requests:
              cpu: "100m"
              memory: "128Mi"
          command: ["redis-sentinel"]
          args: ["/etc/redis/sentinel.conf"]
          ports:
            - containerPort: 5000
              name: sentinel
          volumeMounts:
            - name: redis-config
              mountPath: /etc/redis/
            - name: rdata
              mountPath: /data
      volumes:
        - name: redis-config
          emptyDir: {}
      imagePullSecrets:
        - name: regsecret
  volumeClaimTemplates:
    - metadata:
        name: rdata
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "standard"
        resources:
          requests:
            storage: 1Gi

The init container scans the Redis instances to find the current master and writes it into sentinel.conf; the Sentinel container then monitors that master, automatically discovers its replicas, and promotes one of them to master whenever the master becomes unavailable.
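On the application side, Spring Boot can connect through Sentinel instead of a fixed Redis host. A sketch of the relevant properties, assuming Spring Boot 2.x: the master name matches the mymaster entry in the sentinel.conf above, while the node address is an assumption based on the sentinel service name and port 5000 from this manifest — adjust both to your environment.

```properties
spring.redis.sentinel.master = mymaster
spring.redis.sentinel.nodes = sentinel-0.sentinel:5000
spring.redis.password = <password_for_default_redis_account>
```

With these set, spring.redis.host and spring.redis.port are no longer needed; the client asks Sentinel for the current master and follows failovers automatically.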

Conclusion

Configuring effective caching can be a challenging task, and when not done properly it can lead to bugs such as stale data and reduced performance. We have provided a way to approach caching that satisfies business requirements regarding frequency of refresh, and availability to prevent system outages. We have also demonstrated how to use the intrinsic capabilities of Redis to improve performance through the structure of cached data. Follow the advice laid out in this article to speed up your implementation and application responsiveness. Please refer to the full source code and README at https://github.com/mohamadassaf96/spring-redis-poc
