

Caching — Basic concept for next-level developers

Caching plays an essential role in the success of systems large and small, including those with millions of users. It is a technique that every developer should be aware of, whether they work on the backend or the frontend, on web or mobile, or in newer fields such as ML/AI and blockchain. To some extent, every system has components that do caching. It can be as small as the L1 cache of a CPU or as large as a CDN, which is widely used to serve high-traffic websites.

For those who want to take their skills to the next level, a Google search that returns millions of results does little except make them give up sooner. Wherever you are and whatever your level, I hope this series helps you work out which type of caching to study.

Level 0 — Basic concepts

Whatever form it takes, caching follows the same basic principles:

  • Caching means storing the result of a time-consuming action so that it can be retrieved faster and reused. IMHO, this is the best explanation of caching. Without this principle it is easy to get overwhelmed, or even to mistake caching for a specific tool such as Redis or for RAM. Keep the two key ideas in mind, a time-consuming action and a result that is reused, and you will be fine.
  • A cache is typically organized as key-value pairs, because a hash-based key-value structure finds data in O(1) time on average. You probably already know a hashmap in your code, a key-value database (Redis, Memcached), or a file (with its name as the key).
  • The basic cache operations are get, set, and delete, each by key.
  • A value is usually stored with a parameter called TTL (time to live). It specifies how long the value stays valid; after that it is removed, which keeps the cache fresh and frees space.
  • A common issue with caches is stale data, i.e. cached data that no longer matches the original data.
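To make these principles concrete, here is a minimal sketch of a key-value cache with get, set, delete, and a TTL. The class name `TTLCache` and the injectable `clock` parameter are my own choices for illustration; a real cache would also need a size limit and thread safety.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory key-value cache with per-entry expiry (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        # Remember the value together with the moment it stops being valid.
        self._store[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None  # miss: never cached, or already deleted
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it to free space
            return None
        return value

    def delete(self, key: str) -> None:
        self._store.pop(key, None)  # deleting a missing key is not an error
```

Note that an expired entry is only removed when somebody asks for it. That keeps the sketch short, but it also means unused keys stay in memory until they are read again.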

We need to consider the following to decide whether a cache is needed:

  • Time or resource consumption
  • The result can be reused multiple times

Keep these two concerns in mind before making any decision. For example:

  • It takes 500ms to obtain data from the database, but only for a single request from a web page that nobody visits => no need to cache yet.
  • It takes 100ms to build a CSS file, but every user needs to load it => yes, cache it.
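The two examples can be checked with a rough back-of-the-envelope calculation. The function below is only a sketch: the 1ms cost of reading from the cache and the figure of 10,000 users are assumptions of mine, not measurements.

```python
def time_saved_ms(cost_ms: float, requests: int, cached_read_ms: float = 1.0) -> float:
    """Rough total time saved by caching a result that `requests` callers reuse.

    The first request still pays the full cost; every later one pays only
    the (assumed) cost of reading from the cache.
    """
    if requests <= 1:
        return 0.0  # nothing is reused, so the cache saves nothing
    return (requests - 1) * (cost_ms - cached_read_ms)


print(time_saved_ms(500, 1))       # 0.0      (slow, but requested only once)
print(time_saved_ms(100, 10_000))  # 989901.0 (faster, but reused by everyone)
```

The slower action saves nothing because its result is never reused, while the faster one saves about 16 minutes of total work under these assumptions.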

Bigger picture

Typical caching-enabled system

As you can see, there are 4 main layers in a common application: client-side, proxy, application, and database. Each layer has its own caching system, and we have a different degree of monitoring and control over each. Note that there is also a caching layer at the OS/kernel level, which is out of our scope as app developers.

Client-side: this layer lives on end-user devices, e.g. a mobile app or a browser. Here, caching has only one goal: the quality of the user experience. Therefore, decisions about what to cache and for how long should aim to improve the user experience, not the system.

At this layer, we can control the cache to some extent by:

  • Setting HTTP headers to control the caching of requests (non-deterministic)
  • Configuring the HTTP request cache of the client library
  • Implementing a cache with local storage

Note that cache calls at this layer can themselves consume time and resources, and they are hard to troubleshoot.

Proxy: These are the intermediate layers that requests from end-users pass through before they arrive at the application. They can be a CDN, a reverse proxy, a gateway, etc. Caching at this layer is mainly for static content requested by a large number of users.

At this layer, we can manage the cache via

  • Set HTTP header to control caching of requests
  • Configure cache HTTP requests on intermediate servers.
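As an illustration of the header-based control mentioned for both the client and the proxy layer, here is a small sketch that builds a `Cache-Control` response header. The helper name `cache_control` and its parameters are hypothetical; the directives themselves (`public`, `private`, `max-age`, `s-maxage`) are standard HTTP.

```python
from typing import Dict, Optional


def cache_control(max_age: int, shared: bool = True,
                  s_maxage: Optional[int] = None) -> Dict[str, str]:
    """Build a Cache-Control response header.

    max_age  -- seconds a browser may reuse the response
    shared   -- True: proxies/CDNs may store it ("public");
                False: only the end user's browser may ("private")
    s_maxage -- optional separate lifetime for shared caches
    """
    parts = ["public" if shared else "private", f"max-age={max_age}"]
    if shared and s_maxage is not None:
        parts.append(f"s-maxage={s_maxage}")
    return {"Cache-Control": ", ".join(parts)}


# A static asset: browsers keep it 1 minute, the proxy/CDN keeps it 10 minutes.
print(cache_control(60, s_maxage=600))
# A page with personal data: only the user's own browser may cache it.
print(cache_control(60, shared=False))
```

Whether a given browser or proxy actually honors these values is up to that component, which is why control at these layers is only partial.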

The cache at this layer is filled automatically, but changes can take time to propagate, and when issues occur you depend on a third party to resolve them.

Application: Yup, caching inside the application itself. It is for dynamic content requested by a large number of users.

We have full control over caching by

  • Configure existing addons or plugins
  • Implement even from scratch

By full control, I mean we can decide what to cache and where, and therefore we have the knowledge to modify, customize, and troubleshoot it. Caching at this layer is transparent to end-users and less dependent on third parties.

Database: This is for retrieving data.

To us as database users, the database is effectively a black box that is tuned according to the parameters its provider recommends. What you can do includes:

  • Configure query cache
  • Configure memory usage

OS/Kernel: This layer is present on every machine in the system and is used for data processing operations.

Normally, as developers, we have no control over how the OS uses its memory to cache things. Optimizing OS/kernel caching is micro-optimization and, in most cases, not really our concern.

Level 1: Monolith webpage with a single server

You probably started your journey like me, with old-school applications: monolithic, running on a single server, rendering HTML, and so on. Let’s recap some characteristics of this level:

  • Mainly news sites, blogs, and CRUD applications, either self-implemented or deployed on platforms like WordPress or Joomla.
  • HTML and data are rendered on the web server, as opposed to using newer JS frameworks like React or Vue.
  • HTML templates, static JS, and CSS are stored in the DB or handled by a view engine.
  • These are typically existing systems, so it is important to reuse what is already there (i.e. the code) and modify the system as little as possible.
  • The focus is on a single user’s experience rather than on scaling the system.
  • A small amount of data.

Here are the things that slow down the application:

  • Rendering one HTML page can take 10–50 database queries, with a latency of 2–3s.
  • Loading media such as images and videos takes time and leaves the page blank for about 2–10s.

I highlight items that can be optimized in the following figure.

In detail:

1. Enabling query cache in DB

Some relational databases provide a built-in mechanism to cache query results that can be enabled through configuration. MariaDB and MySQL up to 5.7 have a query cache (MySQL removed it in 8.0), while PostgreSQL has no query result cache and instead keeps frequently used data pages in memory. You can google the details for each database with, say, <db-name> cache.

In applications at this level, queries are heavily reused, not only between requests from different users but also between different sections of a single page requested by a single user. The database’s query cache can take care of this.

Another thing to consider with database caching is your data size. If it is small enough to reside entirely in the database’s RAM, then the query cache is likely enough.

For example, to enable the query cache in MariaDB:

MariaDB [(none)]> SET GLOBAL query_cache_type = 1;
Query OK, 0 rows affected (0.00 sec)

2. Caching CSS, JS, and HTML templates in files

Use files as a cache for the parts whose data is rendered from the database.


  • It has no dependencies and can be done with any application.
  • It is fully supported by platforms like WordPress or XenForo to accelerate page load. All you need to do is enable it, which is easy.

3. Caching the full HTML page, or parts of it, in files

WordPress and XenForo both have plugins to do this. Frameworks that use a view engine also support caching the rendered HTML.

A common misunderstanding is to underestimate the importance of file-based caching on the assumption that disk IO is slow (say, compared to Redis). Remember, if Redis runs on its own server, the real comparison is between local disk IO and a network round trip to the Redis server, not between disk IO and memory.

For example, here is how caching full page in Laravel is done:
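Laravel aside, the mechanism itself can be sketched in a few lines of plain Python. This is a minimal sketch under my own assumptions, not how any WordPress, XenForo, or Laravel plugin actually works: the function name `cached_page`, the `render` callback, and the use of the file's modification time as the TTL clock are all illustrative choices.

```python
import hashlib
import os
import tempfile
import time
from typing import Callable


def cached_page(cache_dir: str, url: str, ttl: float,
                render: Callable[[], str]) -> str:
    """Return the HTML for `url`, reusing a cached file younger than `ttl` seconds."""
    # Hash the URL so any path or query string maps to a safe file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(cache_dir, name)

    try:
        if time.time() - os.path.getmtime(path) < ttl:
            with open(path, encoding="utf-8") as f:
                return f.read()  # hit: no rendering, no database queries
    except FileNotFoundError:
        pass  # miss: fall through and render

    html = render()
    # Write to a temp file, then rename, so readers never see a half-written page.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp, path)
    return html
```

Here the file name is the key, the file content is the value, and the file's age plays the role of the TTL, so the only dependency is the local disk.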

4. Serve media using CDN

You upload media files such as videos and images to a CDN (Content Delivery Network). A CDN is basically a caching system distributed by geographic location, so that users in the US or in Vietnam can both access those media files with low latency. Media latency significantly affects the user’s experience with the application and therefore needs to be considered separately from the beginning.

In order to use a CDN for media content, you need to:

  • Upload media files to a separate system, or route them through a different domain. One option is to store them on AWS S3.
  • Configure a CDN, e.g. Cloudflare, to serve the media files

There are many S3-like storage providers to choose from, such as AWS, GCP, and DigitalOcean.

Level 2: Monolith webpage with multiple servers

At this level, you will see the same type of application, but running on multiple servers and under a heavier traffic load. Caching now also has to improve the system’s capacity to respond under load; it is fine if page load time improves only slightly compared to level 1.

Some common characteristics among these applications include:

  • They run on multiple servers, so data discrepancies caused by caching need to be considered.
  • High traffic load requires a focus on caching whole pages to reduce load on the application and database layers.
  • A large amount of data, often in a database that is not well configured.
  • A proxy or load balancer is present.

In detail:

1. Cache data objects and HTML templates using external storage

The database query cache suffers from lock contention and can reduce the performance of the entire database system. So we move caching out of the database and cache data at the application layer. And because the application runs on multiple servers, we care more about data consistency, which we get from a shared caching system. Specifically, we use Redis/Memcached on a separate server instead of cache files stored on each server.

We focus on caching

  • HTML templates (for template-based systems such as Laravel Blade)
  • Data objects, e.g. query results and HTTP responses

Which objects should you cache if your application has many of them? Go back to the 2 concerns mentioned earlier: i) time or resource consumption and ii) whether the result can be reused multiple times.
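Caching a data object in a shared store usually follows the "cache-aside" pattern: look in the cache first, and only on a miss run the slow query and save its result. The sketch below assumes a cache client with `get` and `setex` methods, which are the names redis-py uses; `FakeSharedCache` is a hypothetical in-memory stand-in so that the example runs without a server, and `get_or_load` is my own helper name.

```python
import json
from typing import Any, Callable, Dict, Optional


class FakeSharedCache:
    """In-memory stand-in for a shared cache server (hypothetical, for the demo).

    It mirrors two redis-py method names, get() and setex(), but ignores expiry.
    """

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        self._data[key] = value  # a real server would expire this after ttl_seconds


def get_or_load(cache: Any, key: str, ttl_seconds: int,
                load: Callable[[], Any]) -> Any:
    """Cache-aside: try the shared cache first, fall back to the slow source."""
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)  # hit: every app server sees the same copy
    value = load()  # miss: run the expensive query or HTTP call
    cache.setex(key, ttl_seconds, json.dumps(value))
    return value
```

Because the value is serialized to JSON and stored outside the application process, any server behind the load balancer can reuse a result that another server computed.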


2. Cache opcode using memory

This applies only to interpreted languages like PHP. Enabling the opcode (operation code) cache can considerably improve the performance of applications with a large codebase and reduce the time needed to serve a request.

You can google this option with the keyword `opcache`.

3. Cache static asset using CDN

We use a CDN for static content like CSS, JS, fonts, etc. In order to do this, we need to:

  • Separate CSS, JS from dynamic content into static files
  • Implement the mechanism of invalidating CSS, JS via querystring, file hash, etc. if needed.
  • Configure CDN to serve static files.
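Invalidation via file hash can be sketched as follows: the file name includes a hash of its content, so a changed file gets a new URL and the CDN fetches it as a new object. The helper name `hashed_name` and the 8-character hash length are assumptions for illustration.

```python
import hashlib
import os


def hashed_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension: app.css -> app.<hash>.css"""
    digest = hashlib.sha256(content).hexdigest()[:length]
    stem, ext = os.path.splitext(filename)
    return f"{stem}.{digest}{ext}"


old = hashed_name("app.css", b"body{color:red}")
new = hashed_name("app.css", b"body{color:blue}")
print(old != new)  # True: new content, new URL, no stale copy served
```

With this scheme the CDN can keep each file for a very long time, because a given name never refers to different content.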

4. Cache the entire page at the reverse proxy

We use reverse proxies like Nginx or HAProxy to cache entire HTML pages. The steps are:

  • Determine which parts are the same for all users and which parts differ per user. The dynamic parts can be loaded via AJAX, and the static parts are fully cached.
  • Configure the reverse proxy to cache HTML responses.

This solution stops end-user requests right at the proxy layer and reduces the load on the application. It is an important and efficient method for post-based systems because it both improves user experience and lets the system handle a large amount of traffic.
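The decision a caching proxy makes for each request can be sketched like this. It is only an illustration of the static/dynamic split, not an Nginx or HAProxy configuration; the function name `proxy_cache_key` and the `/ajax/` prefix for the per-user endpoints are hypothetical.

```python
from typing import Optional, Tuple


def proxy_cache_key(method: str, host: str, path_and_query: str,
                    dynamic_prefixes: Tuple[str, ...] = ("/ajax/",)) -> Optional[str]:
    """Return the cache key for a request, or None if it must reach the application."""
    if method.upper() != "GET":
        return None  # writes are never served from the cache
    if path_and_query.startswith(dynamic_prefixes):
        return None  # per-user parts are always fetched from the application
    return f"{host}{path_and_query}"  # same key for every user => one shared copy


print(proxy_cache_key("GET", "example.com", "/posts/1"))   # example.com/posts/1
print(proxy_cache_key("GET", "example.com", "/ajax/me"))   # None
print(proxy_cache_key("POST", "example.com", "/posts/1"))  # None
```

The key deliberately contains nothing about the user, which is what allows one cached page to be served to everybody.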

Example of configuring cache with Nginx


I will stop here so that you have some time to digest it. I will be back with another 2 to 3 levels of caching for implementing a large-scale system.


I would like to send my big thanks to Quang Minh (a.k.a Minh Monmen) for the permission to translate his original post.



Tuan Nguyen


PhD student, Senior Software Developer