Cache Me if You Can: A Look at Common Caching Strategies, and How CQRS Can Remove the Need for One in the First Place
Caching data is a useful pattern for any application that needs to serve high traffic and finds itself with latency requirements that its chosen persistence solution cannot meet.
While simple at first, creating and maintaining a cache of your data has aspects often overlooked by those leveraging this pattern. In this article I will cover some of the challenges with caching, typical solutions used, and the notion of using Command Query Responsibility Segregation (CQRS) as a better strategy.
It’s (Almost) all About Latency
Low latency requests are a standard non-functional requirement, especially for e-commerce applications, as there is an established understanding that the business loses potential sales for every X milliseconds your application takes to serve data to your customers.
The path to addressing high latency, specifically for the retrieval of information, will most likely come in the form of adding a cache — a copy of your data — that can be retrieved significantly faster than if we attempted to do the same from the origin. Figure 1 illustrates this process.
Caching is widely used, from inside the microprocessor, to the network infrastructure used to serve the web pages you see and the applications that we run. While caching sometimes has additional advantages and objectives, such as providing reliability and redundancy, in this article I will focus primarily on the latency aspect.
Before we look at it further, let’s discuss if we really need a cache.
When is Caching Beneficial?
It may sound odd to bring up in an article about caching, but before deciding to add a cache, make sure your application actually needs one. Customer-facing applications are usually more latency-sensitive than server-to-server ones: a slow response to a customer has more impact than a slow server-to-server call that happens without user interaction.
Traditionally, the need for caching stems from your response being the result of a query that executes multiple joins behind the scenes. These joins are expensive, leveraging resources that can’t be used for other requests, taking time before delivering the response back to the client, and effectively limiting how many requests you can serve at the same time. Having a quickly accessible pre-joined result becomes enticing or even essential in conditions where your system is under high load.
With the development of NoSQL solutions such as CosmosDB and DynamoDB, it is possible to obtain low latency responses at scale, which may negate the necessity of adding more complexity to create and maintain a caching infrastructure. How those technologies achieve low latency is beyond the scope of this article, but suffice it to say it does not come without trade-offs, so make sure to read the fine print before making the switch.
Prior to implementing a cache in the development of your application, I recommend assessing the non-functional requirements and the access patterns you will be using. Your analysis may reveal that you do not need a cache at all. For example, if your application is not sending information back to someone who is browsing your e-commerce website, it may not matter if it takes 10 or 100 milliseconds to process a request.
If you decide to have a cache, there are different strategies to create one. While the expected end result is the same, each comes with its own pros and cons. Let’s dive deeper into the mainstream ones.
Common Caching Strategies
The first and simplest strategy is known as read-through. In this approach, your application will first attempt to read from the cache. If the requested information is not found, it will fetch it from the origin, add it to the cache, and then return it to the client.
A pseudo-code of a read-through cache implementation can be seen below:
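A minimal read-through sketch in Python; the `DictCache` and `Origin` classes are illustrative stand-ins for a real cache (e.g. Redis) and a real database, and only the control flow mirrors the strategy:

```python
class DictCache:
    """Stand-in for a real cache such as Redis."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def set(self, key, value):
        self.data[key] = value

class Origin:
    """Pretend database; counts fetches so cache hits are visible."""
    def __init__(self, rows):
        self.rows = rows
        self.fetches = 0
    def fetch(self, key):
        self.fetches += 1
        return self.rows[key]

def read_through(key, cache, origin):
    value = cache.get(key)
    if value is not None:        # best case: found in the cache
        return value
    value = origin.fetch(key)    # worst case: go to the origin...
    cache.set(key, value)        # ...and populate the cache
    return value
```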
In the best-case scenario, the information resides in the cache and its contents are returned as is. In the worst case, the information is not present in the cache, and the origin must be accessed to retrieve the contents and add them to the cache prior to returning them to the client.
Since there is a built-in fallback mechanism, it is not necessary for all data to fit in the cache. You may also have an eviction strategy that removes entries that are too old and possibly outdated, or that are infrequently accessed, in order to save space.
A second strategy, write-through, populates the cache automatically as part of each write to the origin.
The read process only reads from the cache.
A pseudo-code of a write-through cache implementation can be seen below:
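A minimal write-through sketch in Python; plain dicts stand in for the real cache and origin stores:

```python
cache = {}
origin = {}

def write_through(key, value):
    origin[key] = value    # write to the source of truth...
    cache[key] = value     # ...and keep the cache in sync
    # In a real system, a failure between these two writes leaves the
    # stores out of sync and must be handled explicitly.

def read(key):
    return cache.get(key)  # the read path touches only the cache
```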
This strategy will keep the cache in sync with the origin and has the upside of eliminating stale data (a problem explained in the next section), but it comes with two additional costs: your cache needs to fit the entire data set, and your writes become slower and more complex, as they need to write to both persistence solutions and handle partial failures, i.e. when writing to the origin succeeds but writing to the cache fails.
The final strategy I will review is write-behind, which flips the source of truth (SOT) to the cache.
A pseudo-code of the implementation can be seen below:
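A minimal write-behind sketch in Python; a standard-library queue stands in for the asynchronous sync mechanism, and plain dicts for the stores:

```python
import queue

cache = {}
origin = {}
pending = queue.Queue()        # stands in for the async sync channel

def write_behind(key, value):
    cache[key] = value         # the cache is temporarily the SOT
    pending.put((key, value))  # schedule the write toward the origin

def sync_to_origin():
    """The additional sync process: drain pending writes to the origin."""
    while not pending.empty():
        key, value = pending.get()
        origin[key] = value
```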
As with write-through, the cache must be able to hold the entire data set, but here the cache temporarily acts as the SOT, and the data eventually makes its way to the origin. Because the information is first written to the cache, it is always up-to-date and retrieval will always return the information with low(er) latency.
Unfortunately, there are two complexities with this solution: the cache must be resilient to make sure it does not lose any information prior to it making its way to the origin, and there is an additional sync process to be developed and maintained.
Why Adding a Cache is a Non-Trivial Task
On the surface, adding a cache is simple. Take the read-through strategy as an example: you already have the ability to retrieve the data from the origin, so you simply wrap that call with one that saves the result in the cache.
In reality, you will encounter several details that make even the most straightforward approach error-prone. Let’s look at the often overlooked pitfalls.
Stale Data
For an application to use caching, it must already accept that it will serve potentially outdated information. But how do we determine when the freshness of the data is good enough?
The simplest approach is to establish an expiration time, known as Time to Live (TTL): once it elapses, the data found in the cache is ignored. That’s great, but what value should you pick? 5 seconds? 5 minutes?
If the value is too large, the information in the origin may have changed substantially and you’ll be sending back outdated information to clients. If it is too small, you may negate the benefits of the cache altogether: when requests for a particular entry arrive less frequently than the TTL, each entry expires before it is ever reused. In the end, your application context should dictate the TTL you select.
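The expiry check itself is small; a sketch in Python, where each cache entry is assumed to carry the timestamp at which it was written:

```python
import time

def get_if_fresh(entry, ttl_seconds, now=None):
    """Return the cached value only if it is younger than the TTL."""
    value, written_at = entry
    now = time.time() if now is None else now
    if now - written_at <= ttl_seconds:
        return value
    return None  # expired: the caller must fall back to the origin
```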
Cache Stampede
As we saw when we looked at the different caching strategies, there are situations where the data we are trying to retrieve is not yet in the cache or has been deemed unfit for use. In those cases, a request to the origin of the data is required.
Nothing out of the ordinary, but imagine if the data you do not have in the cache is popular, which could be the case for a “hot” product just released and in high demand. You could have hundreds of requests for the entry arriving almost at the same time, all of them not finding the entry in the cache and triggering an expensive request to the origin.
Depending on the volume of these requests, they can cause overhead on the origin while trying to serve the same operation over and over again.
There are established ways to deal with a cache stampede, from locking the access so only one request will actually trigger the access to the origin, to preemptively making sure an item is found in the cache prior to enabling any traffic to it.
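A minimal sketch of the locking approach in Python, assuming an in-process cache; a distributed cache would need a distributed lock instead of `threading.Lock`:

```python
import threading

def make_guarded_loader(fetch_from_origin):
    """Wrap an origin fetch so concurrent misses trigger it once per key."""
    cache = {}
    locks = {}
    guard = threading.Lock()

    def get(key):
        if key in cache:
            return cache[key]
        with guard:                       # one lock object per key
            lock = locks.setdefault(key, threading.Lock())
        with lock:
            if key not in cache:          # re-check after acquiring the
                cache[key] = fetch_from_origin(key)  # lock: another request
        return cache[key]                 # may have populated the cache

    return get
```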
All of these methods add complexity to the solution, which is rarely factored into the estimation or cost of maintenance.
Why CQRS can be your Cache
Command Query Responsibility Segregation is a pattern that has been around for a while. In a nutshell, it acknowledges that the needs for interaction differ between those that request some information and those that mutate the state of a given system.
A query is an action that does not mutate state, while a command does. Additionally, although not mandatory, CQRS implementations often leverage different persistence solutions (or different uses of the same solution) between the query and command sides.
One reason is that the write side is where the complexity lies in ensuring the business rules are respected. The read side, on the other hand, may require just a subset of the entity and no logic to guard changes.
The write side will emit messages — events — representing the state changes that resulted from a command. You will use those messages to create and maintain the read side.
Because the read side is under no obligation to use the same technology as the write side, you can select one that addresses the latency requirements. Figure 6 illustrates one example where it may not be possible to satisfy all access patterns using the same persistence technology. In this case, you can use the same events to build different models.
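The whole flow can be sketched in a few lines of Python; the event shape and names (`PriceChanged`, `change_price`, and so on) are invented for illustration, and a list and dict stand in for the message broker and the low-latency read store:

```python
events = []      # stands in for a message broker
read_model = {}  # stands in for the low-latency read store

def change_price(product_id, new_price):
    """Command side: enforce business rules, then emit an event."""
    if new_price < 0:
        raise ValueError("price must be non-negative")
    events.append({"type": "PriceChanged", "id": product_id, "price": new_price})

def project(event):
    """Projection: keep the read side eventually consistent."""
    if event["type"] == "PriceChanged":
        dto = read_model.setdefault(event["id"], {})
        dto["price"] = event["price"]

def get_price(product_id):
    """Query side: a plain DTO lookup, no business logic."""
    return read_model.get(product_id, {}).get("price")
```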
So let’s recap the properties commonly found in a cache:
- Cached data is eventually consistent with the origin
- Provides a pre-computed result
- Satisfies a read-only pattern with low(er) latency requirements
With CQRS we end up with a solution that matches all the above, with a couple of advantages:
- Contrary to most cache implementations, CQRS is not an afterthought but rather planned from the beginning
- The read side can be simple, as there is no need for repositories or for manipulating entities; simple data transfer objects (DTOs) can represent the data.
Conclusion
Caching is a ubiquitous pattern used in application development. The typical cycle is to develop your application, deploy it, and after some time find out that you have to add a cache because your application can’t cope with demand.
On the surface, it may look like a trivial task, but managing TTLs and handling stampedes are two of the most overlooked complexities that you should factor in when deciding to add a cache.
Alternatively, if you are already building on an event-driven architecture (EDA), a potential alternative to adding a cache is to leverage the CQRS pattern, as it makes use of your existing events and infrastructure to deliver the desired outcome from the get-go.
Finally, remember to challenge whether your application really benefits from a specific low-latency operation. Faster is often better, but the cost to achieve it may be steep. No matter what solution you choose, make sure to distinguish between the mere desire to have caching and a truly advantageous use case where the effort required yields significant improvements in your application.
Editorial reviews by Catherine Heim & Nicole Tempas.