What does “Cache is King” mean? And why do I think it’s important for a company that wants to scale?
“Cash is king” is an expression sometimes used in analyzing businesses or investment portfolios. It may refer to the importance of cash flow in the overall fiscal health of a business.
Yes, as you might have noticed, Cache is King is just a pun on that phrase, meant to highlight the importance of caching systems within any tech company and to promote their usage.
In this article I’ll explain why I think using cache is so important and show the results of an analysis I’ve conducted with a real problem we solved at Yoox Net-a-Porter Group through caching.
What is the advantage of using Cache?
To make it simple, a cache is nothing more than a place where you store the result of an operation, so that your application doesn't have to process the same request over and over when the result is unlikely to have changed.
We can imagine a cache being like a dam (cover picture), preventing all the requests from reaching and flooding our application, and only allowing the important ones through.
As depicted in this image on the left, our applications probably wouldn’t survive a high number of requests, or we would end up spending a fortune on servers.
As you can see from the diagram above, in the first scenario when any client asks our application what the result is of an operation, we calculate the result and send it back. In the second scenario we instead calculate it in the first instance, then we store the result for that operation in a Cache service which will return that result for a period of time instead of calling our origin again.
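The second scenario is essentially memoization: compute once, store, and answer subsequent identical requests from the store. As a minimal sketch in Python (the `expensive_operation` name and its body are purely illustrative stand-ins for a real DB query or service call):

```python
import functools

@functools.lru_cache(maxsize=1024)
def expensive_operation(key):
    # Stand-in for a slow DB query or downstream service call.
    return key * 2

expensive_operation(21)  # first call: computed by the application
expensive_operation(21)  # second call: answered from the cache
```

Note that `lru_cache` evicts by recency rather than by TTL; real caching layers usually add an expiry so stale results eventually get refreshed.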
Caching is such a vast topic because it can be applied to almost every scenario, and there are many valid techniques to accomplish it.
It all depends on the problem we’re trying to solve, but the advantages can always be summarised with the same keywords: performance, scalability, security and cost-saving.
Some Architecture and Patterns: caching strategies
TL;DR: if you already know about caching strategies, you can skip this and jump directly to the next topic on the agenda, the real case scenario at YNAP.
1. Embedded Cache (in app)
With this pattern a cache layer is introduced within the application, so that each app node processes a given internal operation only once during a defined timeframe (the TTL, or time to live): it saves the result in local storage and checks whether the operation has already been processed before repeating it.
# Check the cache
record = cache.get(data)
if record is None:
    # Run a DB query or call a service
    record = service.request(...)
    # Populate the cache so subsequent calls skip the service
    cache.set(data, record)
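The snippet above assumes a `cache` object that honours a TTL. A minimal sketch of such an in-process cache (the class name and API are my own, just to make the pattern concrete):

```python
import time

class EmbeddedCache:
    """A minimal in-process cache with a TTL, sketching the embedded pattern."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = EmbeddedCache(ttl_seconds=60)
if cache.get("menu") is None:
    cache.set("menu", {"categories": 240})  # populate after the "origin" call
```

Because the store is a plain dict inside the process, it is fast to access but, as the cons below note, it disappears on every restart and is duplicated on every node.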
Pros:
- Only one request to an external service or local operation per app node.
- Easy to implement and maintain.
- Fastest way to access a cache layer, as it sits within the app.
Cons:
- Not centralised: one request per application node is not optimal.
- Not very scalable in a microservice environment.
- Gets deleted every time the application node restarts.
- Changes are tied to the deploy of the app.
- Only useful to store app-internal context data.
2. Distributed Cache (in app)
Distributed Cache solves the problem of scalability and centralisation, as just one of the many applications needs to generate the response for a certain operation, and that response will then be used by every node of the same application. The advantage of this pattern is that every new app will already have a “warm” cache to work with.
Pros:
- Centralised: even with many requests sent per application node, only the first will be processed, no matter how many nodes there are; the benefit grows linearly with the number of nodes.
- Every new node will benefit from a warm cache.
- Not part of the application, so it can be deployed and maintained separately.
- Applications will use less memory and resources.
Cons:
- Every request still hits the applications, which must then run a check on the cache layer.
- Not useful to store data that should only live in the context of one app node.
- Introduces a single point of failure.
- Slower to access than the embedded approach.
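The pros above can be sketched in a few lines. Here the shared store is faked with a plain dict (in production it would be something like Redis or Memcached); the `AppNode` class and the menu payload are illustrative assumptions, just to show that only the first node pays the origin cost:

```python
class AppNode:
    """One application node; all nodes share the same external cache."""

    def __init__(self, name, shared_cache):
        self.name = name
        self.cache = shared_cache
        self.origin_calls = 0  # how many times this node hit the origin

    def get_menu(self):
        menu = self.cache.get("menu")
        if menu is None:
            self.origin_calls += 1      # only the first node pays this cost
            menu = {"categories": 240}  # stand-in for the expensive call
            self.cache["menu"] = menu   # every other node sees a warm cache
        return menu

shared = {}  # stand-in for a Redis/Memcached cluster
node_a, node_b = AppNode("a", shared), AppNode("b", shared)
node_a.get_menu()  # computes the result and populates the shared cache
node_b.get_menu()  # served from the shared cache, no origin call
```

Swapping the dict for a real Redis client would keep the same shape while adding the network hop that makes this pattern slower than the embedded one.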
3. Proxy/Reverse-proxy Cache (outside app)
In this pattern the cache sits in front of the application nodes, so once a request to one of the application nodes is processed, there’s no need to send any other request through.
This centralisation drastically reduces the number of application nodes needed, and it's the best way to cache big requests that would otherwise consume a lot of resources.
Pros:
- Centralised: only one request will hit one of the application nodes; after that, for the whole TTL duration, no other node will have to process it.
- Programming-language agnostic.
- Great with highly scalable applications, like containers.
- Decoupled from the application.
- Can be shared between applications or even across the whole company.
Cons:
- Introduces a single point of failure, so redundancy and availability need to be maintained here.
- Another piece of software to maintain.
- Complex to get a good cache-invalidation strategy right.
4. Interesting Caching techniques worth mentioning
- Graceful invalidation
Graceful cache invalidation is a smart way of invalidating the cache that can perfectly fit the Reverse Proxy caching pattern.
It consists of saving the cached value as usual; then, once the TTL has expired, for a subsequent grace period the cache layer answers new requests with the stale content and fetches the fresh value from the origin asynchronously. In this way clients never experience the origin's request-processing time, and therefore never see a slow request again.
This can even be used to shield an application from too many requests, or to avoid the need for complex client behaviour in response to an unexpected origin response.
As pictured in the sequence diagram below: while the value is within the TTL timeframe, the request gets its response directly from the cache layer. During the grace period it still gets a response from the cache layer with the currently stored value, but it also triggers a request to the origin, which then replaces the value stored in the cache layer and resets TTL + grace.
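The TTL + grace behaviour can be sketched as follows. This is a simplification under two stated assumptions: the refresh is synchronous here (Varnish performs it asynchronously, so the client never waits on the origin), and the `GracefulCache` name and API are mine, not Varnish's:

```python
import time

class GracefulCache:
    """Sketch of TTL + grace ("stale-while-revalidate") behaviour."""

    def __init__(self, ttl, grace, fetch, clock=time.monotonic):
        self.ttl, self.grace = ttl, grace
        self.fetch = fetch    # callable that hits the origin
        self.clock = clock    # injectable clock, handy for testing
        self.value = None
        self.stored_at = None

    def get(self):
        now = self.clock()
        if self.value is None or now > self.stored_at + self.ttl + self.grace:
            # Cold cache, or even the grace period has expired:
            # the client has to wait for the origin.
            self.value, self.stored_at = self.fetch(), now
            return self.value
        if now > self.stored_at + self.ttl:
            # Inside the grace window: answer with the stale value and
            # refresh from the origin, resetting TTL + grace.
            stale = self.value
            self.value, self.stored_at = self.fetch(), now
            return stale
        # Within the TTL: plain cache hit.
        return self.value
```

With `ttl=10` and `grace=60`, a request arriving 15 seconds after the value was stored still gets the stale value immediately, while the refresh resets the clock for everyone after it.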
- Predictive cache
This is the kind of pattern worth mentioning to show what you can achieve when the caching layer allows you to develop smart caching algorithms.
If we take the product detail page of an e-commerce site, for example, usually we don't want to cache that page, as the stock changes on every order: we need it to be as fresh as possible, and that usually causes far too many requests to our indexes and DBs. However, if we build a service that returns the stock level of a certain product, and stock changes on average every 5 minutes, we can easily calculate:
# Check the stock of a product and our stock reduction service
stockAvailable = product.stock()
stockUnitsConsumption = stock_reduction(product)

# If the last 5 minutes' average reduction, times 2 (to be extra safe),
# won't theoretically consume the whole stock, we can cache the product
# page and avoid one request for each new customer.
if (stockAvailable - (2 * stockUnitsConsumption)) > 0:
    cacheable = True
else:
    cacheable = False
Of course this is just an example; as we can imagine, a product page contains considerably more than just the stock. But if our cache is also well fragmented, so that the stock is a single entity of a product, and we use an aggregation layer to combine all the cache blocks, this approach lets developers create a lot of interesting and creative ways to increase performance.
Improving YNAP CMS cache
Now I'm going to describe how we used one very popular open-source product (Varnish) to drastically improve our CMS API performance, by up to 210 times (and no, it's not clickbait, we seriously achieved that).
First, let's briefly describe how our CMS works and why a new cache layer was needed; I'll use TheOutnet.com as an example.
Other than the wonderful homepage, our CMS also serves the menu of the whole website, which in itself doesn't seem complicated. It becomes complicated, however, when the business wants to provide a great customer experience and asks tech to check, for every single category in the menu, whether it contains products, and further, whether those products have stock, to avoid a customer ending up browsing an empty category, which would inevitably lead to a very unhappy customer.
The problem starts in that menu response, where we have to check stock levels for all 240+ categories that are part of a live menu.
As you can imagine it’s quite an intensive operation, as it needs to run for every permutation of country, language and catalog.
This wasn't the only complexity, of course: we also had to keep in mind that, within our architecture, we used scalable applications with 6+ nodes (depending on the traffic) to deliver that content, so we were effectively caching that information at least 6+ times, once on every single delivery unit.
This is how our first approach looked:
The first problem had nothing to do with caching: it was the 240+ requests made to our commerce platform. We solved it by creating a tailor-made API on the commerce side that returned the aggregated category list and stock in just one call, but that call was now quite a big operation, and the payload of a single request was also very large.
This led to an average menu call taking 835ms, and even worse, at the 95th percentile we had an average of 4547ms. So this solution wasn't really production-ready, even though we had reduced the number of requests to the commerce platform by 260 times.
As this was the best result achievable from the application layer, we decided we had to change our CMS architecture to support it, guaranteeing the same accuracy and freshness of data while keeping the change invisible to any client. So we reworked our architecture.
First we redirected every request from clients to the CMS through a Varnish (open-source) cluster sitting in our internal infrastructure; then we also decided that every request the CMS sent to commerce had to go through the same Varnish layer.
Then we just applied some of the patterns described before in this article:
- Using Varnish as a Distributed Cache we avoided every delivery node storing its own result in isolation, reducing both the memory required and the number of redundant requests made to our Commerce by 6 times (number of nodes).
- Using Varnish as Reverse Proxy Cache, we reduced the amount of requests that needed to hit our application to 1, so no longer having each Application node processing the same request.
- Enabling the Graceful Invalidation available on Varnish. After the very first request post deploy, our clients didn’t have to wait anymore for our app to collect results from Commerce and aggregate them, as Varnish was doing it in the background for them and always providing the latest available value — so no painful origin hits any more.
- In addition, we also used Varnish to prevent our head applications from flooding the CMS with requests that returned unexpected 404 errors, in two ways.
First we cached 404s for 2 min, so that just one request every 2 minutes was forwarded by Varnish to our apps.
Second, we checked whether, during the grace period, a request that previously had a 200 response code suddenly had a 404. In that scenario, for the duration of the grace period, we still returned the stale 200 content while Varnish checked in the background whether the 404 was a temporary symptom of a problem or genuinely expected.
If after the grace period we still got a 404, we returned it; otherwise, if it was unexpected, the grace period gave us time to fix the issue.
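The 404-shielding decision described above can be sketched as a small function. This is a plain-Python simplification of logic that actually lives in Varnish's VCL; the function name and parameters are illustrative assumptions:

```python
def choose_response(cached, origin_status, origin_body, in_grace_period):
    """Pick what to serve; `cached` is a (status, body) tuple or None."""
    if (origin_status == 404 and in_grace_period
            and cached is not None and cached[0] == 200):
        # Possibly a transient problem (index rebuild, deploy, restore):
        # keep serving the last good 200 while the grace period lasts.
        return cached
    # Otherwise trust the origin; a 404 that survives the grace period
    # is considered genuine and is returned to the client.
    return (origin_status, origin_body)

choose_response((200, "menu"), 404, "not found", in_grace_period=True)   # stale 200
choose_response((200, "menu"), 404, "not found", in_grace_period=False)  # genuine 404
```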
It's worth mentioning that such unexpected 404s can occur when managing large SOLR/Elasticsearch indexes, during backup/restore operations or during deploys, where it's always good to have a safety net just in case.
First, here's how centralising the cache with the distributed cache pattern reduced the requests sent from the CMS to our commerce platform; this is a snapshot view in Kibana.
The calls are reduced by a factor of 6, matching the number of application nodes: the new number of calls is the one we actually need regardless of how many nodes run, and we now need far fewer nodes.
Second, we can see below the benefit of having the reverse proxy caching pattern implemented, which resulted in a much higher cache hit ratio compared to the calls sent to the origin.
Third, here’s a picture that demonstrates how huge the response time difference is for a client when applying a Graceful Invalidation strategy on our Varnish layer.
The first 3 green spikes are due to 3 individual performance stress tests on our CMS. Then, enclosed in the red border, there are 3 more tests: the first 2 are almost invisible because, no matter the origin response time, the client always sees a fresh cached version of the content. The third spike in the highlighted area was a stress test plus a restart of the whole cache layer, which deleted all the cached keys; this was carried out to test how it would behave in a failure scenario.
In summary, testing from 800 to 1000 menu requests per second, this is what we achieved.
Average response time:
With old caching strategy — 835ms average
With new caching strategy — 21ms average
We improved by 39.7 times; the average response time was reduced by 97.5%.
95th percentile response time (slowest requests):
With old caching strategy — 4547ms average
With new caching strategy — 21ms average
We improved by 216.5 times; the average response time was reduced by 99.53%.
In addition to the extremely positive results experienced by the head applications, we were also able to significantly reduce the number of application nodes and the server sizes of the remaining nodes, and to massively reduce the number of calls sent to our commerce platform (another saving in nodes and server size), which all translated into a lot of cash (the important kind) saved.
For everyone who wants to improve the performance of their applications, increase the stability of their architecture, or simply reduce the cost of their infrastructure and their footprint on other services, remember: Cache is King … there will always be a caching pattern out there to help you out.