Spring Boot Scaling Strategies: Cache

Aleh Zhloba · Published in Level Up Coding · Dec 5, 2023


Scaling out server applications is crucial for maintaining availability and managing traffic spikes. But it isn't as simple as just adding more instances.

This is the first article in a series on horizontal scaling. It uncovers blockers that can hinder your backend app’s scalability and explores effective caching techniques in the context of Spring Boot.

Starting point

Assume we already have a Spring Boot app: it is deployed as a single instance and uses PostgreSQL as its database. Evolving requirements demand better resilience and the ability to handle increased traffic. To tackle these challenges, we plan to run the application as multiple instances on a platform like Kubernetes.

A sound strategy, but before proceeding, we must answer one essential question.

Is your application ready for horizontal scaling?

The answer is NO, if an application instance maintains local state that is not synchronized with other instances.

To understand why, let’s consider this real-life example. Stephen and Randy, teammates, awaited an important meeting with their manager. Randy soon received a text from the boss, notifying him that the meeting had been canceled. He quickly grabbed his jacket and went to the pub. Stephen, meanwhile, stayed at his desk, waiting for the meeting to start because no one had told him the news.


As you may guess, Randy and Stephen represent application instances here. Initially, they had the same state (the meeting is scheduled to happen). But as soon as Randy received the message, his knowledge of the meeting changed, while Stephen’s stayed the same.

This situation can occur in various ways within your application. For example, a user might upload a photo to one instance and then hit an error when trying to retrieve it from another.

When instance states are inconsistent, your application behaves unpredictably and its functionality breaks down. Inconsistency forces you to grapple with elusive, hard-to-reproduce bugs in a multi-node environment.

So, what can we do to address this challenge?

Know your local state

Local state is our enemy when it comes to server application scaling. First, we need a strategy to handle it.

There are two approaches:

  1. get rid of state at the application level and move it to a shared external service
  2. implement a state synchronization mechanism

To grasp the pros and cons of each option, it’s best to examine them in the context of specific application parts.

Cache

In this article, we will focus on the cache, as it is the most common habitat for local state.

A cache is a temporary storage that holds frequently used data. It helps boost performance by reducing the need to fetch the same data from the original source, like a database or external API.

Single-instance applications usually use a local cache as the most efficient option, but it can’t be used as-is in a multi-instance deployment because of the inconsistency problem described above.

In Spring Boot, caching is typically handled through the CacheManager abstraction or the Hibernate second-level (L2) cache. We will focus on the former, since the underlying principles are similar for both.
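To set the stage, here is a minimal sketch of how a service might use the cache abstraction; the Venue types and repository below are hypothetical, purely for illustration:

import org.springframework.cache.annotation.CacheEvict
import org.springframework.cache.annotation.Cacheable
import org.springframework.stereotype.Service

// Hypothetical domain types, for illustration only
data class Venue(val id: Long, val name: String)

interface VenueRepository {
    fun find(id: Long): Venue?
    fun save(venue: Venue): Venue
}

@Service
class VenueService(private val repository: VenueRepository) {

    // The result is stored in the "venues" cache, keyed by id;
    // subsequent calls with the same id skip the repository
    @Cacheable("venues")
    fun getVenue(id: Long): Venue? = repository.find(id)

    // Evict the stale entry whenever the venue changes
    @CacheEvict("venues", key = "#venue.id")
    fun updateVenue(venue: Venue): Venue = repository.save(venue)
}

Everything below applies to caches managed this way, regardless of which provider sits behind the CacheManager.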

Approach 1: External cache

To start, let’s remove any local cache providers we have and replace them with shared external storage.

All application instances will access the same data by storing it in one place. It doesn’t matter which instance handles a request. This approach replicates the behavior of a single-instance deployment, eliminating inconsistencies arising from local caches.

An external cache diagram.

Our first choice for a shared external cache is Redis: a proven, universal in-memory storage solution that is well suited for both small and large applications. But there are other options to consider.

Add the dependency:

implementation("org.springframework.boot:spring-boot-starter-data-redis")

Spring Boot will automatically configure the Redis implementation of CacheManager with default settings. To customise the configuration, you need two more beans:

import org.springframework.boot.autoconfigure.cache.RedisCacheManagerBuilderCustomizer
import org.springframework.cache.annotation.EnableCaching
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.data.redis.cache.RedisCacheConfiguration
import java.time.Duration

@Configuration
@EnableCaching // enables Spring's caching infrastructure
class CacheConfig {

    // default configuration applied to every cache: 60-minute TTL
    @Bean
    fun redisCacheConfiguration(): RedisCacheConfiguration =
        RedisCacheConfiguration.defaultCacheConfig()
            .entryTtl(Duration.ofMinutes(60))

    // per-cache overrides: shorter TTLs for "venues" and "customers"
    @Bean
    fun redisCacheManagerBuilderCustomizer(): RedisCacheManagerBuilderCustomizer =
        RedisCacheManagerBuilderCustomizer { builder ->
            builder
                .withCacheConfiguration(
                    "venues",
                    RedisCacheConfiguration.defaultCacheConfig()
                        .entryTtl(Duration.ofMinutes(30))
                )
                .withCacheConfiguration(
                    "customers",
                    RedisCacheConfiguration.defaultCacheConfig()
                        .entryTtl(Duration.ofMinutes(5))
                )
        }
}

With that, all that’s left is setting up your Redis deployment and configuring the connection properties in the application.yml file.
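For example, assuming Redis runs at redis:6379 (hostname and port are placeholders for your own deployment), the properties for Spring Boot 3.x would look like this; older versions use the spring.redis prefix instead:

spring:
  data:
    redis:
      host: redis
      port: 6379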

Let’s now explore the second approach and compare their outcomes.

Approach 2: Synchronized local caches

Can we keep the speed of a local cache and still scale the app horizontally?

To keep our application caches in sync, we need a way to send messages to all running instances of the application. This can be done using a message bus, which allows us to broadcast notifications to all connected nodes.

Synchronized local caches diagram.

There are many different message bus systems available. Spring Boot works well with AMQP brokers, Redis, Pulsar, Kafka, a bunch of cloud messaging solutions, and many more. Each of them has its own characteristics and best use cases, but we won’t delve into them in this article.

Because our application uses PostgreSQL as its database system, we can use its built-in notification system as a basic message bus. The examples below use my own library built on top of the PostgreSQL LISTEN/NOTIFY feature. It gives us real-time messaging without adding extra complexity to the infrastructure.

Now, let’s return to the main goal and create our own implementation of the Spring Cache interface. It delegates all the work to the local cache and, additionally, tells other instances of the app when cache entries must be removed. This is achieved by broadcasting CacheNotification messages (a sketch of the type follows the list below) of two types:

  1. Evict: evict the cached value for a specific key
  2. Clear: clear the cache, removing all of its values
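The CacheNotification type itself is not shown by the library, so here is a minimal sketch of what it might look like: an assumed sealed hierarchy of JSON-friendly data classes.

// Assumed sketch of the notification payload: a sealed hierarchy,
// so listeners can exhaustively handle both message types
sealed class CacheNotification {
    // every notification targets one named cache
    abstract val cacheName: String

    // Evict: remove the cached value for a specific key
    data class Evict(
        override val cacheName: String,
        val key: Any,
    ) : CacheNotification()

    // Clear: remove all values from the cache
    data class Clear(
        override val cacheName: String,
    ) : CacheNotification()
}

With the messages defined, here is the Cache implementation: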
class SimpleDistributedCache(
    private val underlying: Cache,
    private val messagingTemplate: PostgresMessagingTemplate,
) : Cache by underlying {

    companion object {
        const val CACHE_CHANNEL = "cache"
    }

    override fun evict(key: Any) =
        underlying.evict(key).also {
            // broadcast a cache evict message after the underlying local cache eviction
            messagingTemplate.convertAndSend(
                CACHE_CHANNEL,
                CacheNotification.Evict(
                    cacheName = name,
                    key = key
                )
            )
        }

    override fun clear() =
        underlying.clear().also {
            // broadcast a cache clear message after the underlying local cache is cleared
            messagingTemplate.convertAndSend(
                CACHE_CHANNEL,
                CacheNotification.Clear(cacheName = name)
            )
        }
}

We also need to write our own CacheManager:

class SimpleDistributedCacheManager(
    private val underlying: CacheManager,
    private val messagingTemplate: PostgresMessagingTemplate
) : CacheManager by underlying {

    private val logger = KotlinLogging.logger {}

    // wrap every underlying cache so that evictions are broadcast
    override fun getCache(name: String): Cache? =
        underlying.getCache(name)?.let { cache ->
            SimpleDistributedCache(cache, messagingTemplate)
        }

    // receive notifications from other instances and apply them locally
    @PostgresMessageListener(value = [CACHE_CHANNEL], skipLocal = true)
    fun handleNotification(notification: CacheNotification) {
        try {
            underlying.getCache(notification.cacheName)?.let { cache ->
                when (notification) {
                    is CacheNotification.Clear -> cache.clear()
                    is CacheNotification.Evict -> cache.evict(notification.key)
                }
            }
        } catch (e: Exception) {
            logger.error(e) { "Error during processing distributed cache notification: $notification" }
        }
    }
}

The most important part of the code above is the handleNotification method. This method receives updates from other instances of the application and applies them to the underlying caches. The @PostgresMessageListener annotation makes the method a message listener for the channel. The skipLocal = true attribute ensures that messages from the same instance are ignored.

If you pick another messaging solution, you must use the corresponding annotations, such as @RabbitListener or @SqsListener for RabbitMQ and AWS SQS respectively.

Make sure to preserve the cache key’s data type when broadcasting eviction messages, or instances won’t evict their entries. For example, JSON (a commonly used message serialization format) doesn’t distinguish between Int and Long values. To tackle this issue, use strings for cache keys or customize message serialization.
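For instance, one hypothetical way to force string keys is via SpEL; the Customer types below are placeholders:

import org.springframework.cache.annotation.Cacheable
import org.springframework.stereotype.Service

// Hypothetical domain types, for illustration only
data class Customer(val id: Long, val name: String)
interface CustomerRepository { fun find(id: Long): Customer? }

@Service
class CustomerService(private val repository: CustomerRepository) {

    // key = "#id.toString()" forces a String cache key; a Long key could
    // be deserialized from JSON as an Int on another instance and would
    // then never match the locally cached entry
    @Cacheable(cacheNames = ["customers"], key = "#id.toString()")
    fun getCustomer(id: Long): Customer? = repository.find(id)
}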

Finally, let’s instantiate our custom CacheManager using Caffeine as the underlying local cache. Add the Caffeine dependency:

implementation("com.github.ben-manes.caffeine:caffeine:3.1.8")

Then declare the bean (note that the underlying CacheManager is the first constructor argument):

@Bean
fun cacheManager(messagingTemplate: PostgresMessagingTemplate): CacheManager =
    SimpleDistributedCacheManager(
        CaffeineCacheManager().apply {
            // local entries expire 10 minutes after being written
            setCaffeine(Caffeine.newBuilder().expireAfterWrite(10, TimeUnit.MINUTES))
        },
        messagingTemplate
    )

And that’s it: we have created a simple cache synchronization mechanism that combines the benefits of local caching with horizontal scaling. It’s pretty cool, right?

Unfortunately, we must take into account some limitations:

  1. There is always a delay before every instance receives a synchronization message, so clients can still see stale data for a short time.
  2. The memory available to the local cache is limited by the instance/container running your application.
  3. If you use a messaging system with an ‘at-most-once’ delivery guarantee (like PostgreSQL or Redis pub/sub), be aware that messages can be lost due to network issues, so some instances may not evict cache entries as expected.

So it’s time to summarise everything we have learned so far.

What strategy to choose?

A shared external cache is our default choice for horizontally scaled server applications. It provides strong consistency of cached data across instances, is highly scalable, and reduces the load on the “source-of-truth” database more effectively than local caches. It’s worth noting that modern in-memory storage solutions like Redis or Hazelcast offer more than just caching. They can greatly enhance your system’s capabilities.

But if you have significantly more reads than writes and are fine with occasional data inconsistency, use synchronized local caches. The local cache is generally faster than the shared one because it doesn’t need network communication. This means users of your app will experience lower latency.

Tips

  • Consider a hierarchical, multi-level caching solution that combines the two approaches (see the sketch after this list). It improves cache performance and reduces database load.
  • Review how ConcurrentHashMap is used in your application, as it is often employed as a form of local cache.
  • In some cases, load balancers with sticky sessions can improve local cache efficiency by reducing cache misses and masking temporary inconsistencies.
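To illustrate the first tip, here is a minimal sketch (an assumption, not a production-ready implementation) of a two-level Spring Cache that reads through a fast local cache backed by a shared remote one:

import org.springframework.cache.Cache

// Minimal two-level cache sketch: reads hit the fast local cache first
// and fall back to the shared remote cache; writes and evictions go to both
class TwoLevelCache(
    private val local: Cache,
    private val remote: Cache,
) : Cache by remote {

    override fun get(key: Any): Cache.ValueWrapper? =
        local.get(key) ?: remote.get(key)?.also { wrapper ->
            // populate the local cache on a remote hit
            local.put(key, wrapper.get())
        }

    override fun put(key: Any, value: Any?) {
        local.put(key, value)
        remote.put(key, value)
    }

    override fun evict(key: Any) {
        local.evict(key)
        remote.evict(key)
    }

    override fun clear() {
        local.clear()
        remote.clear()
    }
}

In a real setup, the local level would still need the invalidation messages from Approach 2; otherwise local copies could go stale.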

Conclusion

To scale horizontally, it’s crucial to know what application data can stay local and what must be shared.

Throughout this article, I have shown general strategies for dealing with instance-local state. We have also seen how to implement an efficient caching layer, enabling you to scale out your Spring Boot application while keeping the performance characteristics you need.

In upcoming articles, we will consider how to manage WebSockets and scheduled tasks for multi-instance backend applications.
