The Essential Guide to Caching in Microservices Architecture

Mehmed Ali Çalışkan
Published in Hexaworks-Papers
Oct 27, 2023 · 14 min read

Caching emerges as a critical architectural element in web applications with high traffic volumes. Positioned primarily between the database and business modules, caching layers are crucial for performance. However, if not designed correctly, they can degrade an application’s performance. In microservices-based application architectures, not only client requests but also inter-service data traffic must be considered. Therefore, it’s essential for microservices to be supported with the right caching layers.

Understanding Caching Imperatives: Navigating Data Access Overheads

In our digital age, the rapid and efficient retrieval of information remains paramount. As applications grow in complexity and user bases expand, the time taken to access data can become a significant bottleneck, negatively impacting the user experience and system efficiency. Enter caching — a technique employed to store frequently accessed data in ‘near’ storage locations, thereby reducing the time required to fetch it. At the core of this approach are the challenges associated with accessing data, which can be broadly classified into Local and Remote Access Costs. By comprehending these costs and their implications, we can discern the inherent need for caching and its pivotal role in optimizing system performance.

Local Access Costs (LAC)

One of the primary costs encountered when accessing any data is the time taken to query and retrieve results between the application code and the database server. This cost depends on multiple criteria, such as the size of the data, its relational complexity in the database, the richness of the database indexes, and the technological capabilities of the database engine. Engineering approaches that address these criteria are the means to reduce this cost without resorting to caching. More often, however, we mitigate the time spent inside the database by caching the data we obtain from it in other environments. By doing so, we extract the relevant subset from a larger dataset, consolidate relational data gathered from different tables, and make the result accessible in a physically faster environment than the database. LAC is fundamentally a bigger issue in monolithic applications. In microservices, as data size shrinks and relationships decrease, this cost diminishes proportionally. However, microservices introduce other associated costs.

Remote Access Costs (RAC)

One of the prominent challenges in a microservice-based architecture is merging data spread across different services when it must be processed in relation. For example, an invoicing service that produces a bill for a customer’s purchase might need data from the product service to retrieve the names of the products on the invoice. Developing solutions for such scenarios is among the general responsibilities of microservice architectural design. The network query costs encountered when accessing data from another service are referred to as Remote Access Costs. In any scenario, RAC can encompass some degree of LAC: a remote service might have reduced its LAC by implementing its own caching, but the client service still bears an additional network transfer cost when accessing that data. RAC depends largely on network technology, and it can be reduced by using communication protocols such as HTTP/2 or gRPC instead of plain HTTP/1.1. However, architectural solutions based on caching should also be developed to lower RAC. To cope with RAC, independently of improvements in data access media, the following two methods are recommended for reducing communication and network transfer costs.

Data Replication

If the need for data between microservices is very intense and the frequency of data access is very high, one microservice can replicate the data of another microservice in its own database. In doing so, we eliminate the RAC (Remote Access Cost). For data replication, we can use native replication tools of databases or various third-party software developed for replication. Alternatively, by listening to message broker events directly, we can keep an up-to-date copy of the remote data in our own database through our code.
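
As a rough sketch of the event-driven variant, the consuming service can subscribe to the owning service’s change events and upsert a local copy into its own database. The queue name, event shape, and the localDb helper below are illustrative assumptions; amqplib (RabbitMQ) is used here only as an example broker client.

```javascript
// Hypothetical replication consumer: the invoicing service keeps a local
// copy of the product service's data in its own database.
import amqp from 'amqplib'; // RabbitMQ client, used here only as an example broker

export async function startProductReplication(localDb) {
  const conn = await amqp.connect('amqp://localhost');
  const channel = await conn.createChannel();
  const queue = 'invoicing.product-events'; // assumed queue name

  await channel.assertQueue(queue, { durable: true });

  channel.consume(queue, async (msg) => {
    if (!msg) return;
    const event = JSON.parse(msg.content.toString()); // assumed shape: { type, product }

    // Upsert or delete the local replica depending on the event type.
    if (event.type === 'product.deleted') {
      await localDb.products.delete(event.product.id);
    } else {
      await localDb.products.upsert(event.product);
    }

    channel.ack(msg); // acknowledge only after the local copy is updated
  });
}
```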

Middleware Caching

If the data managed by one microservice is needed by many other microservices, data replication becomes costly in terms of storage space. In such cases, we can keep the data available in a central location. This approach may seem monolithic, but middleware caching and sharing a common database are very different concepts. Middleware caching tools expose the data to other services solely for reading, promising high read performance: they are read-focused technologies that don’t store relational data, which allows them to scale easily, potentially serving fast responses from dozens of clustered nodes if necessary. Middleware caching significantly reduces RAC (Remote Access Cost) and almost eliminates LAC (Local Access Cost).

Physical Tools for Caching

Application Memory

Your software module can build various native caching structures, as permitted by the programming language, and use them as a caching layer between the module and the database. Typically, objects created by commonly used ORMs (Object Relational Mappers) can be retained instead of being destroyed and used in place of a second database access. Instead of ORM objects, you can also use your own objects, arrays, or linked lists. While application memory isn’t the most durable caching layer (it is per-process and lost on restart), it can be suitable for minor needs. For example, when the application starts, you can query the table of country phone codes once and keep those codes in memory as an array of objects. Memory is an expensive and limited resource, so good planning is required before deciding what to retain in it. Consider the size of the data and its access frequency: dividing access frequency by data size yields a coefficient that indicates how worthwhile it is to keep the data in memory.
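
A minimal sketch of the country-code example might look like this; the db.query helper and table name are assumptions for illustration.

```javascript
// Simple in-process cache for a small, rarely changing dataset.
const countryCodeCache = new Map();

async function loadCountryCodes(db) {
  // Hit the database once at startup; the table name is illustrative.
  const rows = await db.query('SELECT iso_code, phone_code FROM country_phone_codes');
  for (const row of rows) {
    countryCodeCache.set(row.iso_code, row.phone_code);
  }
}

function getPhoneCode(isoCode) {
  // Subsequent lookups never touch the database.
  return countryCodeCache.get(isoCode);
}
```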

Disk Storage

Your software module might want to save integrated data, obtained from the database or from various files on its disk, back to disk for future reuse. Disk access isn’t as fast as memory access, but database access is itself a form of disk access. Therefore, a result produced by many complex queries and operations can be saved back to disk (or even to a separate table in the database) to save the cost of those queries and operations in the future. In earlier days, developers would save HTML pages, built by merging database data with design templates, to disk and reuse them for a reasonable period. While this technique is less common nowadays, it is still very useful when there’s a justified reason. For instance, you can store the PDF reports you produce on disk and allow them to be downloaded again as long as the data they contain remains current. Of course, various procedures, from simple to complex, might be needed to ensure data freshness; we’ll discuss these in the strategies section.
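
Here is a minimal sketch of the PDF-report idea using Node’s file system API; the cache directory, the freshness rule (comparing the file’s modification time with the last data change), and the generatePdf function are illustrative assumptions.

```javascript
import fs from 'node:fs/promises';
import path from 'node:path';

const REPORT_DIR = '/var/cache/reports'; // assumed cache location

// Return a cached PDF if it is newer than the last change to the underlying
// data; otherwise regenerate it, write it to disk, and serve the fresh copy.
async function getInvoiceReport(invoiceId, lastDataChangeAt, generatePdf) {
  const filePath = path.join(REPORT_DIR, `invoice-${invoiceId}.pdf`);
  try {
    const stat = await fs.stat(filePath);
    if (stat.mtime > lastDataChangeAt) {
      return fs.readFile(filePath); // cache hit: the report is still current
    }
  } catch {
    // No cached file yet: fall through to regeneration.
  }
  const pdfBuffer = await generatePdf(invoiceId); // expensive queries and rendering
  await fs.writeFile(filePath, pdfBuffer);
  return pdfBuffer;
}
```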

Another use case for the disk is tiering: when your server has disks of various speeds, you can group cache data by access frequency. Write the most frequently accessed data to memory, medium-frequency data to fast disks, and low-frequency data to slow disks. Everything is about strategic thinking.

Databases

Serving as the “source of truth” that holds the most recent data, databases can also be used to store cache data. Instead of gaining efficiency in database access time, we sometimes save processed data back to the same database to save on query and processing time; when the time comes, we use this processed data instead of the raw data. For instance, document-based NoSQL databases can be designed as caching layers that store combined data queried from the intricate tables of relational databases. We can even use the relational database itself to store data that has been processed further, across multiple levels. Of course, a crucial point before using processed data over raw data is to ensure that it remains synchronized and up-to-date with the raw data.
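
As a rough sketch of keeping processed data in the same relational database, a periodic job can rebuild a summary table that reads then hit instead of the raw aggregation; the table names, the Postgres-flavored SQL, and the thin db.query helper are assumptions.

```javascript
// Rebuild a precomputed summary table so that read paths can skip the
// expensive aggregation over the raw tables.
async function refreshCustomerOrderSummary(db) {
  await db.query(`
    INSERT INTO customer_order_summary (customer_id, order_count, total_amount, refreshed_at)
    SELECT o.customer_id, COUNT(*), SUM(o.amount), NOW()
    FROM orders o
    GROUP BY o.customer_id
    ON CONFLICT (customer_id) DO UPDATE
      SET order_count  = EXCLUDED.order_count,
          total_amount = EXCLUDED.total_amount,
          refreshed_at = EXCLUDED.refreshed_at
  `);
}

// Reads now use the processed data instead of re-running the aggregation.
async function getCustomerSummary(db, customerId) {
  const rows = await db.query(
    'SELECT * FROM customer_order_summary WHERE customer_id = $1',
    [customerId]
  );
  return rows[0];
}
```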

Redis

Redis, an acronym for Remote Dictionary Server, stands as one of the most renowned in-memory data structure stores, primarily utilized as a database, cache, and message broker. The pivotal advantage of Redis lies in its ability to offer ultra-fast read and write operations on in-memory datasets. This gives it an edge for applications requiring rapid access to data, making it a popular choice for caching. Redis supports diverse data structures such as strings, hashes, lists, sets, sorted sets, bitmaps, hyperloglogs, and geospatial indexes. Furthermore, its ability to replicate data across multiple instances and its support for persistent storage make it a versatile tool that can cater to both transient caching and more persistent storage needs.
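
A minimal read-through sketch with the node-redis v4 client; the key scheme, the five-minute expiry, and the loadFromDb fallback are illustrative choices, not a prescription.

```javascript
import { createClient } from 'redis'; // node-redis v4

const redis = createClient({ url: 'redis://localhost:6379' });
await redis.connect(); // ESM top-level await

// Read-through cache for a single product entity.
async function getProduct(productId, loadFromDb) {
  const key = `product:${productId}`; // illustrative key scheme
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const product = await loadFromDb(productId); // fall back to the source of truth
  await redis.set(key, JSON.stringify(product), { EX: 300 }); // expire after 5 minutes
  return product;
}
```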

Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine built on top of the Apache Lucene library. While it’s commonly associated with log and event data analytics, it is also an incredibly efficient tool for caching aggregated or computed data. One of Elasticsearch’s standout features is its capability to conduct real-time searches over vast amounts of data. This makes it a suitable choice for caching data that needs to be accessed in intricate and varied patterns, not just key-value lookups. It is also particularly effective for data that benefits from full-text search capabilities or requires real-time analytics. The underlying structure, which stores data in indexed documents, can be updated or appended to, ensuring cached data remains relevant. Its distributed nature provides scalability, ensuring consistent performance even as data grows.

Caching Types

Entity Caching

Entity caching centers around the preservation of individual entities in their most raw and unprocessed form, just as they are represented in their native database. As the name suggests, this cache type focuses on standalone entities like products, users, or categories. Often, the primary purpose of entity caching is to retain frequently accessed entities directly in memory, facilitating quicker retrieval. When stored in a middleware tool like Redis, these entities become accessible to all microservices, ensuring unified and expedited data access across the system.

Contrary to other caching types, entity caches are devoid of aggregated or transformed data, holding only the pure, unaltered information. Such a cache is particularly valuable when the genuine data of an entity suffices, without the need for any relational aggregation. The predominant rationale for housing entity caches in memory revolves around cost-efficiency. Given that the cached data is generated without incurring relational aggregation expenses, relegating it to a disk-based system wouldn’t offer a significant advantage over simply fetching it from the original database.

Typical candidates for entity caching include categories, system users (provided their numbers aren’t overwhelmingly large), brands, countries, and cities. As these examples suggest, entity caching is most beneficial when the data set isn’t too expansive (to avoid memory constraints), but the information is essential for multiple services and demands frequent access.

Accessing Entity Caches

Predominantly, entity caches are accessed using a unique identifier, often referred to as the “ID key.” This straightforward mechanism ensures that data retrieval is efficient and exact, typically fetching a single entity per request. However, the capabilities of entity caching extend beyond just individual data retrieval. By implementing additional indices within the memory structure, it’s possible to retrieve multiple entities that match a specific index value. For instance, while the primary cache might be structured to provide a product based on its unique product ID, supplementary indices could enable retrieval of all products under a particular category or brand. Such an approach enhances the versatility of entity caching, allowing for more varied and dynamic data access patterns without compromising on efficiency.
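
The following sketch shows one way to maintain such a secondary index with Redis sets alongside the ID-keyed entities; the key names are assumptions.

```javascript
// Maintain a secondary index alongside the primary ID-keyed entries so the
// cache can answer "all products in category X" without touching the database.
async function cacheProduct(redis, product) {
  await redis.set(`product:${product.id}`, JSON.stringify(product));
  await redis.sAdd(`category:${product.categoryId}:products`, String(product.id));
}

async function getProductsByCategory(redis, categoryId) {
  const ids = await redis.sMembers(`category:${categoryId}:products`);
  if (ids.length === 0) return [];
  const values = await redis.mGet(ids.map((id) => `product:${id}`));
  return values.filter(Boolean).map((value) => JSON.parse(value));
}
```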

Aggregation Caching

Aggregation caching focuses on enhancing single entities by complementing them with joined and aggregated data from other sources. This is especially useful when an entity requires information from other microservices’ domains. When such an entity is created or accessed, aggregation processes pull additional data from various services, consolidating all the information into a comprehensive, aggregated document that is then cached. However, one challenge with aggregation caching is ensuring that changes in any of the data sources are accurately reflected in the cached aggregates. To effectively manage these dynamic changes, tools like ElasticSearch are commonly employed. ElasticSearch not only serves as an adept tool for aggregation caching due to its robust update mechanisms, but it also offers powerful search functionalities, enabling the filtering of aggregated documents based on specific property values. Within a microservices architecture, gateway or backend-for-frontend (BFF) services can directly access these aggregation caches from ElasticSearch, ensuring efficient data retrieval without burdening the original microservices.
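
A rough sketch of building and caching such an aggregated document, assuming the @elastic/elasticsearch v8 client and hypothetical productService and customerService clients.

```javascript
import { Client } from '@elastic/elasticsearch'; // assuming the v8 client

const es = new Client({ node: 'http://localhost:9200' });

// Build an aggregated invoice document by pulling data from other services,
// then cache it as a single Elasticsearch document. The productService and
// customerService clients are illustrative.
async function cacheInvoiceAggregate(invoice, { productService, customerService }) {
  const [customer, products] = await Promise.all([
    customerService.getCustomer(invoice.customerId),
    productService.getProducts(invoice.lines.map((line) => line.productId)),
  ]);

  const document = {
    invoiceId: invoice.id,
    customerName: customer.name,
    lines: invoice.lines.map((line) => ({
      productId: line.productId,
      productName: products.find((p) => p.id === line.productId)?.name,
      quantity: line.quantity,
    })),
    total: invoice.total,
  };

  await es.index({ index: 'invoice-aggregates', id: String(invoice.id), document });
}
```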

Aggregation Caching with ElasticSearch

Aggregation caching, when employed with ElasticSearch, provides the advantage of scalability, efficiently handling vast amounts of data. Given ElasticSearch’s distributed architecture, it is adept at scaling out, making it an ideal tool for managing large-scale aggregation caches. This inherent scalability ensures that even with high traffic and voluminous data, cache retrieval remains swift and seamless. Furthermore, the flexibility of ElasticSearch allows for easy adjustments in accordance with growing data demands. Thus, when utilizing ElasticSearch for aggregation caching, concerns over data size and traffic spikes diminish, ensuring consistent and high-performing cache access regardless of data volume or request load.

Query Caching

Unlike Entity or Aggregation caching, which primarily centers around individual entities, Query Caching is designed to handle results stemming from specific database queries, often involving multiple entities and potentially spanning numerous tables. This caching approach harks back to the era of monolithic architectures where complex, multi-relational queries were prevalent. In a microservice environment, there may still be instances where a service manages intertwined, relational data, and caching the entirety of such a query’s result proves beneficial. This is especially true when the overhead of repeatedly fetching relational data is significant. Tools like Redis or in-application memory can be employed to store these query cache results, ensuring rapid retrieval upon subsequent client requests. However, it’s worth noting that a well-implemented aggregation cache in ElasticSearch can effectively render many traditional query caches redundant. This is due to ElasticSearch’s ability to quickly generate dynamic query results with its robust search capabilities, often sidestepping the need for a separate database query altogether. Utilizing query caching becomes especially effective when one considers the time-dependent and data change invalidation strategies that we will discuss in subsequent sections.
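
A minimal query-cache sketch: key the cached result by a hash of the SQL text and its parameters, and expire it after a short TTL. The redis client and the db.query helper are assumed to exist.

```javascript
import crypto from 'node:crypto';

// Cache the full result of a specific query, keyed by a hash of the SQL text
// and its parameters. `redis` is a connected node-redis client and `db.query`
// is an assumed database helper.
async function cachedQuery(redis, db, sql, params, ttlSeconds = 60) {
  const hash = crypto
    .createHash('sha256')
    .update(sql + JSON.stringify(params))
    .digest('hex');
  const key = `querycache:${hash}`;

  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit);

  const rows = await db.query(sql, params); // run the expensive query once
  await redis.set(key, JSON.stringify(rows), { EX: ttlSeconds });
  return rows;
}
```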

Caching Strategies

Architectural caching design is a strategic approach. Now, let’s explore various caching strategies and determine where each will be most effective.

Time-Dependent Invalidation

This method, commonly applied to query caches, establishes a predefined duration for each cache entry. After a cache item reaches its expiration time, it’s deemed invalid and removed. The next request for that data results in a cache miss, causing the system to fetch fresh data from the primary data source. This strategy ensures that the cached data doesn’t grow stale beyond a certain age, but it also means that data might be refreshed more frequently than necessary, even if it hasn’t changed.

Consider the immense volume of requests for a site like amazon.com, where the homepage might be accessed a staggering 1 million times in a single second. In this scenario, you could efficiently regenerate the homepage once every second and serve the same cached page to all 1 million requests within that timeframe. Of course, the content displayed on the page could potentially change during that one-second window, but in many contexts, this minute discrepancy is inconsequential. After all, this isn’t real-time data like that of a cryptocurrency exchange, and any changes will be reflected in the next update. By expanding this caching window, let’s say, to every 10 seconds, you effectively reduce the potential 10 million database queries down to just one.
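
An in-process sketch of that time-window idea; the render function and the one-second window are illustrative, and note that concurrent requests arriving exactly at expiry may still trigger more than one regeneration.

```javascript
// Serve the same rendered result to every request inside a short time window,
// regenerating it at most roughly once per window.
function createTimeWindowCache(renderFn, windowMs = 1000) {
  let cachedValue = null;
  let renderedAt = 0;

  return async function get() {
    const now = Date.now();
    if (cachedValue === null || now - renderedAt >= windowMs) {
      cachedValue = await renderFn(); // e.g., build the homepage HTML
      renderedAt = now;
    }
    return cachedValue;
  };
}

// Usage sketch: all requests within the same second share one render.
// const getHomepage = createTimeWindowCache(renderHomepage, 1000);
// app.get('/', async (req, res) => res.send(await getHomepage()));
```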

Invalidation on Data Change

This strategy revolves around the idea that the cache should be invalidated or refreshed as soon as the underlying data it represents changes. This is a reactive approach where you monitor the primary data source for any modifications and act upon them. It ensures that the cache is always in sync with the primary data source, making it highly reliable. This method is particularly relevant when consistency between cached and stored data is of paramount importance. This strategy is also commonly used in query caching, especially when the cached data content is intricate. However, it’s essential to efficiently detect changes to prevent excessive or unnecessary cache invalidations, especially when only a subset of the data changes might be relevant to the cached data.

When dealing with frequent data modifications, especially in vast and dynamic datasets like products in an e-commerce platform, it’s crucial yet challenging to determine whether a specific change affects our cached query result. One could use deterministic algorithms, examining the exact nature and implications of the data change with respect to the cache contents; however, this can be time-consuming and computationally expensive. An alternative is to adopt a probabilistic approach. By using probabilistic data structures or algorithms, we can quickly estimate if a change is likely to impact the cache, at the cost of occasional false positives. While this method can significantly speed up the decision-making process, it might result in unnecessary cache invalidations due to its inherent inaccuracies. Thus, developers must strike a balance between decision accuracy and efficiency, considering the specific needs and tolerances of their application.

To grasp the nuances of this balance, consider the following scenario: Imagine executing a query on the products table, joining it with the category table using a complex where clause. In a purely probabilistic change management approach, one might invalidate the cache for any alterations to either the products or category tables. However, a more deterministic strategy would involve rerunning the same specific query, appending the ID of the modified record to the where clause to determine if that record falls within the query’s scope. But, the time cost of this deterministic method can be nearly as substantial as executing the original query anew. A best practice, then, is to tag the cache with specific field clusters derived from the where clause and subsequently compare these clusters with the changed data attributes. Take, for example, a query executed in the product service, filtered by a particular userId value. This userId and its associated filter value can act as a cluster marker for the cache. With this marker in place, one can discern that a new product associated with a different userId will not impact the current cache.
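
A simplified in-memory sketch of this tagging approach; the tag format and data structures are illustrative, not a specific library’s API.

```javascript
// Tag each cached query result with the field/value pairs from its WHERE
// clause, then invalidate only the entries whose tags match a changed record.
const cache = new Map();    // cacheKey -> cached result
const tagIndex = new Map(); // "field=value" -> Set of cacheKeys

function putWithTags(cacheKey, result, tags) {
  cache.set(cacheKey, result);
  for (const [field, value] of Object.entries(tags)) {
    const tag = `${field}=${value}`;
    if (!tagIndex.has(tag)) tagIndex.set(tag, new Set());
    tagIndex.get(tag).add(cacheKey);
  }
}

// Called from a data-change event, e.g. a new product with { userId: 42 }.
function invalidateByChange(changedRecord) {
  for (const [field, value] of Object.entries(changedRecord)) {
    const tag = `${field}=${value}`;
    const keys = tagIndex.get(tag);
    if (!keys) continue; // changes outside the tagged clusters leave the cache alone
    for (const key of keys) cache.delete(key);
    tagIndex.delete(tag);
  }
}
```

With this structure, a query cached under the tag userId=42 survives the insertion of a product belonging to a different userId, because no tag matches and the cache is left untouched.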

Partial Update Cache on Data Change

Instead of invalidating the entire cache when data changes, this strategy focuses on updating only the portions of the cache that have been affected. When a data modification is detected, only the relevant sections of the cache are refreshed, while the rest remains untouched. This can be more efficient than invalidating the entire cache, especially for large datasets. The challenge lies in effectively determining and managing the portions of the cache that need updating.

Partial updating is predominantly used in aggregation caching. While integrating it with a query cache may necessitate complex algorithms to pinpoint the modified segments of a query result, aggregation caching has explicitly defined entity properties. Hence, when any data in the database changes, all related properties in the aggregations can be updated via an update script. This approach is a recognized best practice in Elasticsearch. For instance, consider a situation where products have been aggregated based on their brand name and brandId, or category name and categoryId. If the name of a brand or category gets altered in the database, instead of meticulously selecting which caches to update, one can swiftly deploy an update script across all documents within an Elasticsearch index. A simple directive such as “update the ‘categoryName’ property wherever the ‘categoryId’ matches this specific id” can seamlessly ensure the data remains consistent.
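
A sketch of that directive, assuming the @elastic/elasticsearch v8 client and an illustrative index name.

```javascript
import { Client } from '@elastic/elasticsearch'; // assuming the v8 client

const es = new Client({ node: 'http://localhost:9200' });

// When a category is renamed in the source database, patch every cached
// aggregation document that references it instead of rebuilding the caches.
async function propagateCategoryRename(categoryId, newName) {
  await es.updateByQuery({
    index: 'product-aggregates',     // illustrative index name
    query: { term: { categoryId } }, // every document for this category
    script: {
      source: 'ctx._source.categoryName = params.newName',
      params: { newName },
    },
  });
}
```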

The Cache Chronicle: A Recap and Reflection

As we traverse the intricate landscape of caching in modern architectures, it becomes abundantly clear that the act of storing and retrieving data is both an art and a science. From the foundational discussions on data access costs to the nuances of entity, aggregation, and query caching, we’ve journeyed through the multifaceted dimensions of optimizing data retrieval. Each caching strategy, be it based in memory, disk storage, databases, Redis, or ElasticSearch, has its unique strengths tailored to specific scenarios and challenges. While the appeal of swift data access and reduced latency is universal, the path to achieving it requires a blend of technological understanding and architectural foresight.

In the age of microservices, the data landscape is ever-evolving. The crux lies not just in knowing the tools at our disposal but in discerning when and how to deploy them for maximum impact. As with any tale of technology, the cache chronicle is ongoing, but with the knowledge acquired, we’re better equipped to navigate its twists and turns.

Appendix: Dive into the Code

While discussing caching strategies is enlightening, showing the real-world implementation further solidifies understanding. Explore the code behind the strategies:

- Entity Caching: Explore the EntityCacher class in JavaScript.

- Aggregation Caching: Delve into the ElasticIndexer class in JavaScript.

- Query Caching: Understand the intricacies with the QueryCacher and QueryCacheInvalidator classes in JavaScript.

Gitlab Link: https://gitlab.com/malicaliskan/cachetools
