Caching — How to Do It Properly

Caching is easy, no kidding. Anyone can do it after reading a 10-minute tutorial, the same way a 3-year-old can use a pen to draw something (and by something, I literally mean nothing). But knowing how to draw is not the same as drawing something, and definitely not the same as drawing something cool. So it is with caching.

In the previous post, I explained some basic concepts related to caching and the different techniques used at different levels. In this post, I will help you find the answers to three main questions yourself, via a specific web application:

  • Do we really need to cache this data?
  • If so, where?
  • And for how long?

First of all

Let’s take a look at the following PHP code:
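A minimal cache-aside sketch, shown here in Python with hypothetical `cache` and `db` stand-ins for a real cache client and database — the pattern is identical in PHP:

```python
# Hypothetical stand-ins for a real cache client and database.
cache = {}                               # e.g. Redis/Memcached in a real application
db = {42: {"id": 42, "name": "Alice"}}   # e.g. a SQL database

def get_user(user_id):
    """Cache aside: look in the cache first, fall back to the database."""
    key = f"user:{user_id}"
    value = cache.get(key)        # 1. try the cache
    if value is not None:
        return value              # cache hit
    value = db.get(user_id)       # 2. cache miss: query the database
    if value is not None:
        cache[key] = value        # 3. store the result for the next reader
    return value
```

The first call for a given user misses and populates the cache; every subsequent call is served from the cache without touching the database.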

This is the most common caching strategy — cache aside. It is common because it fits many different applications, especially those that issue far more read requests than write requests.

The flow of the cache-aside strategy is visualized as follows:

In fact, you can find that code in quite a few libraries and frameworks, and you can easily clone it into your own code thanks to its simplicity. That is what I did when I was a fresher — I put this code wherever there was a database query, in order to improve system performance. The thing is, when your application is simple and the number of users is small, caching anything, anywhere, at any time seems efficient and effective. I used to think that caching would automatically make the system scalable (or even perfect).

As time went by, I realized that the cache-everything strategy does not really contribute much to system performance. Apart from common issues like full disks and out-of-memory errors (from caching too many things), it can actually add overhead that then requires various techniques to mitigate. That is when I took a step back and thought about when to cache, what to cache, where, and how to do it.

There are not many documents about this issue. Is it because caching is too simple a topic to discuss or to write a book about? Of course, you may find some books published by O’Reilly about caching techniques, e.g. caching HTTP requests via HTTP headers. Those, however, cover techniques; to the best of my knowledge, there is hardly any discussion of the methodology.

Do we really need to cache this data?

This becomes important when the application gets more complicated due to the presence of various data types. Should we cache entire HTML pages, or data after calculation, or data retrieved from the database, or the shared configuration data?

As I mentioned in the previous post, we need to consider the following to decide whether caching is needed:

  • Time or resource consumption: the time, CPU usage, disk IOPS, network bandwidth, file descriptors, etc. required to perform a task. With caching, we expect to reduce the time or resources needed when the task has to be re-executed.
  • The result can be reused multiple times: the cached output should actually be re-used. If it is not reused often, e.g. a 0–50% hit ratio, then caching the output is inefficient or even useless.

One additional thing to consider is:

  • Acceptable level of data discrepancy: are you OK with your data being inconsistent, even when the two points above are satisfied?

Top-down Analysis

The following picture shows the details/insights of a request, viewed top-down: the upper layer is the result/output of the lower layer, so the content of a top layer typically includes that of the layers beneath it. The deeper you go down, the less latency/resource a layer requires and the more reusable its data is. Therefore, our strategy is:

  • Better to cache the top layers of the system: this reduces the request’s processing time, since the final result is obtained immediately.
  • Try to “duplicate” some properties of the bottom layers at the top ones: for example, bring static or shared data up to the top.

For a blog website, it would be:

The deeper you go, the easier the data is to cache, and of course the benefit of caching, in terms of a request’s processing time, also decreases. Depending on the traffic requirements as well as the acceptable threshold of data staleness/integrity, we pick a particular caching strategy:

  • If your blog site gets just a couple of visits, then caching at the site-config or content-data layer is enough.
  • Otherwise, we might consider caching the entire HTML page or its body, accepting some discrepancy in dynamic data such as like counts.

Note that going top-down helps you choose specific layers to cache. Caching multiple layers does no harm, but it leads to data duplication and may be unnecessary, as the data at the top layers already includes that of the bottom layers.
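To make the layer choice concrete, here is a sketch (function names like `get_site_config` and `render_page` are hypothetical): a cache at the top layer short-circuits all the work below it, while a cache at a bottom layer only skips that one layer’s cost.

```python
page_cache = {}      # top layer: entire rendered HTML
config_cache = {}    # bottom layer: shared site configuration

def get_site_config():
    """Bottom layer: cheap, shared, highly reusable data."""
    if "config" not in config_cache:
        config_cache["config"] = {"title": "My Blog"}   # imagine a DB query here
    return config_cache["config"]

def get_content(article_id):
    """Middle layer: per-article data (not cached in this sketch)."""
    return f"content of article {article_id}"           # imagine a DB query here

def render_page(article_id):
    """Top layer: caching here skips config lookup, content query and templating."""
    key = f"page:{article_id}"
    if key not in page_cache:
        cfg = get_site_config()
        page_cache[key] = f"<html><h1>{cfg['title']}</h1>{get_content(article_id)}</html>"
    return page_cache[key]
```

Caching `render_page` serves repeated requests for the same article without any lower-layer work, at the price of serving slightly stale dynamic data until the entry expires.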

Left-right Analysis

Now let’s look at the request from a left-right view, based on generic/specific characteristics, i.e. shared vs. private data:

In this way, the data to be cached for the aforementioned blog site is separated into different parts:

  • Header, footer (config data for the entire page)
  • Today’s top-viewed articles (sidebar)
  • Main content (body)
  • Current user information (shown at top-right corner)
  • Information of users related to current articles (read, like, comment, etc.)

These data components are typically independent and can be cached separately based on their characteristics. The left-right analysis lets you determine which component consumes the most resources, which has the highest latency, and which is reused the most. This helps decide which components need to be cached.

  • If it takes 100 ms to load the main body, which is reused in multiple places => cache it.
  • If it takes 120 ms to load user information but it is not often required, e.g. most users are guests => don’t cache it yet.
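A quick way to gather the numbers this decision needs is to instrument each component. A minimal sketch — the `stats` structure and the component functions are hypothetical:

```python
import time
from collections import defaultdict

stats = defaultdict(lambda: {"calls": 0, "total_ms": 0.0})

def measured(name):
    """Record how often a component runs and how long it takes in total."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            stats[name]["calls"] += 1
            stats[name]["total_ms"] += (time.perf_counter() - start) * 1000
            return result
        return inner
    return wrap

@measured("main_body")
def load_main_body():
    return "<article>...</article>"      # imagine the real body query here

@measured("user_info")
def load_user_info(user_id):
    return {"id": user_id}               # imagine the real user lookup here

# Simulate some traffic:
for _ in range(3):
    load_main_body()
load_user_info(1)
```

After serving real traffic, the components with many calls and a high total cost are the best caching candidates; rarely-called ones are not worth it yet.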

If so, where should we cache data?

In this post, I focus on the locations available for caching data in back-end applications. The two common approaches are i) caching on the server running the code and ii) caching remotely over the network.

On the code server

There are two methods to cache data locally — in memory and in files.

Memory (in-process): caching data in variables of the application process, e.g. arrays or hash maps.

  • Saved as-is, without serialization
  • Fast
  • For applications written in NodeJS, Python, Java, .NET, etc., with a main process and memory shared between requests
  • For small data with low latency and high access frequency. The access latency of this type of data is usually less than 1 ms.

Examples: configuration information from a database or HTTP endpoint, data common to all requests, or information extracted from the request, e.g. user agent or token.
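An in-process cache with a TTL can be as small as a dict. A minimal sketch — note there is no serialization, so values are stored and returned as-is:

```python
import time

# key -> (value, expires_at); held in the process's own memory
_store = {}

def cache_set(key, value, ttl_seconds):
    """Store a value as-is, with an absolute expiry time."""
    _store[key] = (value, time.monotonic() + ttl_seconds)

def cache_get(key):
    """Return the cached value, or None if missing or expired."""
    entry = _store.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() >= expires_at:   # expired: drop it lazily on read
        del _store[key]
        return None
    return value
```

Because lookups are plain dict accesses, latency is well under 1 ms; the trade-off is that each process holds its own copy.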

Local file: saving data into a file on the server, as string/text or binary.

  • Requires serialization
  • Speed depends on local disk, but usually fast
  • For applications written in PHP or other languages without a main process or shared memory
  • For applications running on VMs or bare metal
  • For the early stage of development (when starting to cache), or later for particular purposes
  • For large or native data like HTML, CSS, JS, images, videos

Examples: static asset cache, session data, full HTML pages, template views.
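A minimal file-cache sketch, using JSON serialization (keys are assumed to be filename-safe; a real PHP app would do the equivalent with `serialize`/`unserialize`):

```python
import json
import os
import tempfile

# In a real app this would be a fixed directory on local disk.
CACHE_DIR = tempfile.mkdtemp()

def file_cache_set(key, value):
    """Serialize the value and write it to a local file."""
    path = os.path.join(CACHE_DIR, f"{key}.json")
    with open(path, "w") as f:
        json.dump(value, f)

def file_cache_get(key):
    """Read and deserialize the value, or None on a miss."""
    path = os.path.join(CACHE_DIR, f"{key}.json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)
```

Unlike the in-process version, this cache survives process restarts and is shared by all processes on the same machine — but every read pays disk and deserialization costs.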

Caching data on the server where your code is deployed can be efficient, as it is fast. However, it should only be done for applications running as a few instances or as independent instances. Scalability is bound by a node’s memory, and caching performance, in terms of hit/miss ratio, can degrade in distributed (multi-node) systems.

Remote caching

Remote memory storage: these are third-party solutions that cache data directly in memory, e.g. Redis, Memcached.

  • Comes with basic predefined data types
  • Performance depends on network conditions
  • Suitable for most applications
  • Easy to scale without (or at least without significantly) affecting performance, independent of the number of code servers. Latency is often between 1–10 ms.
  • High hit ratio, as the cache is shared among multiple code servers
  • For small data items, due to RAM limits (and network bottlenecks for large data)
  • For data with low time-to-live (TTL) values

This is actually the most widely adopted caching solution, as it fulfills many requirements with acceptable speed and scalability. But don’t forget one thing: even though it stores data in memory, it suffers network latency.

Which types of data should we cache in remote memory storage? Well, basically any type of data, except HTML pages or large data. For real-time data, I would suggest using in-process memory instead.
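The key difference from in-process caching is that values must cross the network, so they have to be serialized. A sketch of a thin wrapper — the injected client is assumed to expose `get`/`set` in the style of common Redis/Memcached client libraries, and here a trivial in-memory stub stands in for a real connection:

```python
import json

class RemoteCache:
    """Serialize values (JSON here) on the way to a remote store."""

    def __init__(self, client):
        self.client = client

    def set(self, key, value, ttl=60):
        # The network forces a serialization step that in-process caching avoids.
        self.client.set(key, json.dumps(value).encode(), ttl)

    def get(self, key):
        raw = self.client.get(key)
        return None if raw is None else json.loads(raw)

class FakeClient:
    """In-memory stand-in for a real Redis/Memcached connection."""
    def __init__(self):
        self.data = {}
    def set(self, key, value, ttl):
        self.data[key] = value
    def get(self, key):
        return self.data.get(key)

remote = RemoteCache(FakeClient())
```

Every call here also pays a network round trip in production, which is why the 1–10 ms latency budget above matters.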

Remote disk storage: this approach caches data either on disk or on both memory and disk. Examples are key-value databases like Aerospike and Hazelcast.

  • Comes with more features and more supported data types
  • Faster than a normal database, as they are usually optimized for SSD storage
  • Easy to scale in resources or number of nodes while keeping performance at an acceptable level
  • For large or personalized data (recommendation, advertising, tracking, analytics)
  • For applications requiring high availability, resiliency, auto failover, etc.
  • For data with high TTL values

Takeaway

In general, I think whatever we go with can be categorized into the above four storage methods. If your application is small, selecting the cache storage may not be critical — pick whatever is at hand:

  • For interpreted, non-shared-memory languages like PHP, I prefer local files (if running on a single instance) or network memory storage like Redis/Memcached (for multiple instances).
  • For languages like NodeJS, Python, Golang, we can try both in-process memory and network memory storage: in-process memory for small, common (highly requested) data, and Redis/Memcached otherwise.
  • For Java and .NET, I would go with in-process memory, to take advantage of their ability to handle complex data types as well as powerful server hardware.

However, when your application has to handle a great deal of traffic, you need to be careful in selecting where to cache. Some tips I can share:

  • From the data-access point of view: hot-cold data goes with a local cache, evenly-distributed data with a remote cache.

Hot means data that is accessed many times; cold means the opposite (of course). Consider the case where we use one key for configuration (hot) and one key for user data (cold). If we use a distributed cache like a Redis/Memcached cluster, sharding data between nodes results in an unbalanced cache: the node holding the hot item is accessed far more frequently than the nodes holding cold data. So it’s better to cache the hot data locally in memory and have it duplicated across processes.

Evenly-distributed refers to items with roughly equal access frequency, e.g. one key per user. If we cache these locally, the hit ratio drops, as the data has to be duplicated across servers, whereas a distributed remote cache like a Redis/Memcached cluster is more efficient thanks to its sharding and load-balancing capabilities.

  • Amount of data: local if the data is small and its size can be determined in advance, remote otherwise.
  • Performance: local is faster than remote, obviously :)
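The hot-cold tip above can be sketched as a two-tier lookup — hypothetical names, with a plain dict standing in for the remote cluster: hot keys are duplicated into each process, everything else stays remote so sharding keeps the load balanced.

```python
HOT_KEYS = {"site_config"}   # hot keys, assumed identifiable up front

local = {}    # duplicated per process: absorbs the hot traffic
remote = {}   # stands in for a sharded Redis/Memcached cluster

def tiered_get(key):
    if key in HOT_KEYS:
        if key in local:
            return local[key]        # hot path: no network hop at all
        value = remote.get(key)      # warm the local copy from the remote tier
        if value is not None:
            local[key] = value
        return value
    return remote.get(key)           # cold / evenly-distributed keys stay remote

def tiered_set(key, value):
    remote[key] = value              # remote tier is the shared source of truth
    if key in HOT_KEYS:
        local[key] = value
```

With this split, the one hot configuration key never hammers a single shard, while per-user keys spread evenly across the cluster.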

And for how long?

This is the most challenging question. In some cases, tuning the TTL value matters more than deciding whether to cache in memory or in Redis, because the duration for which data is cached directly impacts both data discrepancy and performance.

Let’s look at the two following plots of the relationship between cache TTL and cache performance, in terms of hit ratio and storage cost:

  • Increasing the cache TTL requires more storage.
  • Increasing the cache TTL increases the hit ratio in a non-linear manner: at some point, increasing the TTL further no longer improves the hit ratio as expected.

Another concern is data integrity which, although I don’t know how to illustrate it in a plot, becomes more vulnerable with a high cache TTL.

In practice, there is no way to find an optimal cache TTL without putting some effort into tuning your system. Some tips:

  • Set the cache TTL so the hit ratio is as high as possible, i.e. at least 85%, ideally > 90%.
  • Increasing the cache TTL costs storage. Is it worth paying 3× the memory to raise the hit ratio from 88% to 90%? For me, nah.
  • Set the cache TTL with the freshness of your data in mind. Ask yourself how much data discrepancy you would feel comfortable with.
  • Sometimes the cache is used to throttle queries against your database. In many cases, 5–10 s of caching is more than enough to avoid peak query loads on the database for hot data.

Normally, I set the cache TTL for back-end applications to no longer than one day; 1–30 minutes is usually enough (or even a lot). If the cache needs to live longer than 6 hours, I consider implementing a mechanism to invalidate it.

Note: if you can control cache invalidation, you are safe to increase the cache TTL.
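When writes go through your own code, that control is often simple to get: invalidate the cached copy whenever you write. A sketch with hypothetical `db` and `cache` dicts:

```python
cache = {}                              # stands in for any cache tier
db = {7: {"id": 7, "name": "old"}}      # stands in for the database

def get_article(article_id):
    """Cache aside on the read path."""
    key = f"article:{article_id}"
    if key not in cache:
        cache[key] = db[article_id]
    return cache[key]

def update_article(article_id, fields):
    """Write path: update the source of truth, then drop the stale copy."""
    db[article_id] = {**db[article_id], **fields}
    cache.pop(f"article:{article_id}", None)   # next read repopulates the cache
```

With invalidation-on-write in place, a long TTL only bounds how stale the data can get through paths you don’t control (e.g. direct database edits).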

To sum up the “how long” question: there is no one-size-fits-all answer. Please don’t set the TTL ad hoc (and leave it there); keep monitoring the cache hit/miss ratio as well as the storage size, and tune accordingly.


So far I have tried to give you answers to the main questions about using a cache: what, where and how long. In the next post, I will walk you through some issues that might occur when using a cache, and strategies to resolve (or at least mitigate) their consequences.


I would like to send my big thanks to Quang Minh (a.k.a. Minh Monmen) for permission to translate his original post.




Tuan Nguyen

PhD student, Senior Software Developer