Caching — System Design Concept For Beginners

Anu Upadhyay
12 min read · Apr 2, 2023


Facebook, Instagram, Amazon, Flipkart… these are the favorite applications of a lot of people, and they are probably among the most frequently visited websites on your list.

Have you ever noticed that these websites take less time to load than a brand-new website? And have you ever noticed that on a slow internet connection, when you browse a website, the text loads before any high-quality image?

Why does this happen? The answer is Caching.

If you check your Instagram feed on a slow internet connection, you will notice that the images keep loading while the text is already displayed. For any kind of business, these things matter a lot. A good customer/user experience is the most important thing, and you may lose a lot of customers if the experience on your website is poor. Users immediately switch to another website if the current one takes too long to load or display results. Take the example of watching your favorite series on a video streaming application. How would you feel if the video kept buffering all the time? Chances are you wouldn’t stick with that service and would cancel the subscription.

All of the above problems can be solved by delivering the best possible user experience, which in turn improves retention and engagement on your website. And one of the best solutions is caching.

Caching — An Introduction

Let’s say you prepare dinner every day and need some ingredients for it. Will you go to the nearest shop every time you cook? Absolutely not. That is a time-consuming process, so instead of visiting the shop every time, you buy the ingredients once and store them in your refrigerator. That saves a lot of time. This is caching, and your refrigerator works like a cache/local store/temporary store. Cooking time is reduced when the ingredients are already available in your refrigerator.

The same thing happens in a system. Accessing data from primary memory (RAM) is faster than accessing data from secondary memory (disk). A cache acts as a local store for the data, and retrieving data from this local or temporary storage is easier and faster than retrieving it from the database. Think of it as short-term memory: it has limited space, but it is faster and holds the most recently accessed items. So if you need a certain piece of data often, cache it and retrieve it from memory rather than from disk.

Note: You know the benefits of a cache, but that doesn’t mean you should store all your information in cache memory for faster access. You can’t do that, for multiple reasons. One reason is that cache hardware is much more expensive than the hardware backing a normal database. Another is that search time increases if you store tons of data in your cache. So, in short, a cache should hold only the information most relevant to the requests that are likely to come in the future.

Where Can a Cache Be Added?

Caching is used in almost every layer of computing. In hardware, for example, you have several levels of cache memory: the level 1 (L1) CPU cache, then the level 2 (L2) cache, and finally the regular RAM (random access memory). You also have caching in operating systems, such as the caching of kernel extensions or application files, and caching in web browsers to decrease website load times. So caching can be used at almost every layer: hardware, OS, web browsers, and web applications, but caches are most often found nearest to the front end.

How Does Cache Work?

Typically, a web application stores data in a database. When a client requests some data, it is fetched from the database and returned to the user. Reading data from the database requires network calls and I/O operations, which is a time-consuming process. A cache reduces the number of calls to the database and speeds up the performance of the system. Take the example of Twitter: when a tweet goes viral, a huge number of clients request the same tweet. Twitter is a gigantic website with millions of users, and it is inefficient to read data from disk for such a large volume of requests. To reduce the number of calls to the database, we can use a cache, and the tweets can be served much faster.

In a typical web application, we can add an application server cache, an in-memory store like Redis, alongside our application server. The first time a request is made, the application has to call the database to process the query. This is known as a cache miss. Before the result is returned to the user, it is saved in the cache. The second time the user makes the same request, the application checks the cache first to see whether the result for that request is already there. If it is, the result is returned from the in-memory store. This is known as a cache hit. The response time for the second request will be a lot lower than for the first.
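
A minimal sketch of this read path, assuming a hypothetical `redis_client` (any client exposing `get`/`set`) and a hypothetical `query_database` helper for the slow path:

```python
import json

CACHE_TTL_SECONDS = 60  # hypothetical expiry so stale entries eventually age out

def get_tweet(tweet_id, redis_client, query_database):
    """Cache-aside read: try the in-memory store first, fall back to the database."""
    key = f"tweet:{tweet_id}"

    cached = redis_client.get(key)
    if cached is not None:                  # cache hit: no database call at all
        return json.loads(cached)

    tweet = query_database(tweet_id)        # cache miss: the expensive call
    redis_client.set(key, json.dumps(tweet), ex=CACHE_TTL_SECONDS)  # remember it for next time
    return tweet
```

The first call for a given `tweet_id` pays the full database cost (a cache miss); any repeat call within the TTL is served straight from memory (a cache hit).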

Types of Cache

In general, there are four types of cache…

1. Application Server Cache

In the “How Does Cache Work?” section we discussed how an application server cache can be added to a web application. Let’s say a web application has a single server node. A cache can be added in memory alongside the application server. The response to a user’s request will be stored in this cache, and whenever the same request comes again, it will be served from the cache. For a new request, data will be fetched from the disk and then returned, and once it has been served from the disk it will be stored in the same cache for the user’s next request. Placing a cache on the request layer node in this way enables local storage of response data.

Note: When you place your cache in memory, it consumes part of the server’s memory. If the number of results you are working with is really small, keeping the cache in memory is fine.
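
A minimal sketch of such a single-node, in-process cache, with a hypothetical `fetch_from_disk` function standing in for the slow disk/database path:

```python
class AppServerCache:
    """Naive cache kept in the web server's own memory."""

    def __init__(self, fetch_from_disk):
        self._store = {}                 # request key -> cached result
        self._fetch = fetch_from_disk    # slow path: disk or database read

    def get(self, key):
        if key in self._store:           # hit: served from local memory
            return self._store[key]
        value = self._fetch(key)         # miss: go to disk, then remember the result
        self._store[key] = value
        return value
```

Because `_store` lives inside one process, a second server behind a load balancer would have its own separate copy, which is exactly the scaling problem described next.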

The problem arises when you need to scale your system. You add multiple servers to your web application (because one node cannot handle a large volume of requests) and you have a load balancer that sends requests to any node. In this scenario, you’ll end up with a lot of cache misses, because each node is unaware of requests already cached on the other nodes. This is not great, and to overcome it we have two choices: distributed cache and global cache. Let’s discuss them…

2. Distributed Cache

In a distributed cache, each node holds a part of the whole cache space, and a consistent hashing function is used to route each request to the node where the cached entry can be found. Let’s suppose we have 10 nodes in a distributed system and a load balancer routing the requests, then…

  • Each node will hold a small part of the cached data.
  • To identify which node holds which entry, the cache space is divided up using a consistent hashing function, so each request can be routed to the node where the cached entry lives. If a requesting node is looking for a certain piece of data, it can quickly work out where to look within the distributed cache to check whether the data is available (see the sketch after this list).
  • We can easily increase the cache memory by simply adding a new node to the request pool.
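
A minimal sketch of a consistent hash ring that routes each key to a node (the node names and replica count are hypothetical):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps cache keys to nodes; adding or removing a node only moves a fraction of keys."""

    def __init__(self, nodes, replicas=100):
        self._ring = []                  # sorted list of (hash, node) points on the ring
        self._replicas = replicas        # virtual nodes per physical node, to smooth the spread
        for node in nodes:
            self.add_node(node)

    def _hash(self, value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self._replicas):
            self._ring.append((self._hash(f"{node}:{i}"), node))
        self._ring.sort()

    def node_for(self, key):
        idx = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[idx][1]        # first node clockwise from the key's position

# Hypothetical 10-node pool from the example above
ring = ConsistentHashRing([f"cache-node-{i}" for i in range(10)])
print(ring.node_for("tweet:42"))         # always routes this key to the same node
```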

3. Global Cache

As the name suggests, you have a single cache space and all the nodes use this single space. Every request goes to this single cache space. There are two kinds of global cache, contrasted in the sketch after this list:

  • First, when a request is not found in the global cache, it is the responsibility of the cache itself to fetch the missing piece of data from the underlying store (database, disk, etc.).
  • Second, if the request comes in and the cache doesn’t have the data, the requesting node communicates directly with the DB or the server to fetch the requested data.
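
A minimal sketch contrasting the two kinds, with a hypothetical `load_from_store` callable standing in for the underlying database or disk:

```python
class GlobalCache:
    """Single shared cache space; what happens on a miss depends on the mode."""

    def __init__(self, load_from_store, cache_handles_misses=True):
        self._store = {}
        self._load = load_from_store              # hypothetical loader hitting the DB/disk
        self._cache_handles_misses = cache_handles_misses

    def get(self, key):
        if key in self._store:
            return self._store[key]
        if self._cache_handles_misses:            # first kind: the cache fills the gap itself
            value = self._load(key)
            self._store[key] = value
            return value
        return None                               # second kind: the caller queries the DB directly
```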

4. CDN (Content Distribution Network)

A CDN is used when a large amount of static content is served by the website. This can be HTML files, CSS files, JavaScript files, pictures, videos, etc. First, the request asks the CDN for the data; if the CDN has it, the data is returned. If not, the CDN queries the backend servers, serves the response, and caches it locally.
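
A minimal sketch of that pull-through behaviour, with a hypothetical `fetch_from_origin` callable standing in for the backend servers and an arbitrary TTL:

```python
import time

class EdgeCache:
    """Pull-through cache for static assets, roughly how a CDN edge node behaves."""

    def __init__(self, fetch_from_origin, ttl_seconds=3600):
        self._fetch = fetch_from_origin   # hypothetical request to the backend servers
        self._ttl = ttl_seconds
        self._store = {}                  # path -> (content, expiry timestamp)

    def get(self, path):
        entry = self._store.get(path)
        if entry and entry[1] > time.time():      # fresh copy already cached at the edge
            return entry[0]
        content = self._fetch(path)               # miss or expired: go back to the origin
        self._store[path] = (content, time.time() + self._ttl)
        return content
```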

Cache Invalidation

Caching is great, but what about data that is constantly being updated in the database? If data is modified in the DB, its cached copy should be invalidated to avoid inconsistent application behavior. So how do you keep the data in your cache coherent with the data in your source of truth, the database? For that, we need a cache invalidation approach. There are three different cache invalidation schemes. Let’s discuss them one by one…

1. Write Through Cache

As the name suggests, the data is first written to the cache and then written to the database. This way you keep the data consistent between your database and your cache, and every read from the cache sees the most recent write.

The advantage of this approach is that you minimize the risk of data loss because the data is written to both the cache and the database. The downside is higher latency for write operations, because a single update request has to write the data in two places. If your write volume is small this is fine, but for write-heavy workloads this approach is not suitable.

We can use this approach for applications that frequently re-read data once it is persisted in the database. In those applications, the extra write latency is compensated for by lower read latency and strong consistency.
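
A minimal sketch of write-through, assuming a hypothetical `db` object with `read`/`write` methods:

```python
class WriteThroughCache:
    """Every write updates the cache and the database in the same request."""

    def __init__(self, db):
        self._cache = {}
        self._db = db                 # hypothetical database handle with read/write methods

    def write(self, key, value):
        self._cache[key] = value      # update the cache...
        self._db.write(key, value)    # ...and the database before acknowledging the write

    def read(self, key):
        if key in self._cache:
            return self._cache[key]   # reads always see the most recent write
        value = self._db.read(key)
        self._cache[key] = value
        return value
```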

2. Write Around Cache

Similar to write-through, you write to the database, but in this case you don’t update the cache: data is written directly to storage, bypassing the cache. This way you don’t fill the cache with data that may never be re-read, so heavy writes don’t flood the cache as they would with write-through. The downside is that a read request for recently written data results in a cache miss and must be served from the slower backend. So this approach is suitable for applications that don’t frequently re-read the most recently written data.
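
A minimal sketch of write-around, using the same kind of hypothetical `db` handle; the only difference from write-through is that writes skip the cache (here the stale copy is also dropped so the next read re-fetches it):

```python
class WriteAroundCache:
    """Writes bypass the cache entirely; only reads populate it."""

    def __init__(self, db):
        self._cache = {}
        self._db = db                 # hypothetical database handle with read/write methods

    def write(self, key, value):
        self._cache.pop(key, None)    # drop any stale copy so a later read re-fetches it
        self._db.write(key, value)    # data goes straight to storage

    def read(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._db.read(key)    # recently written data misses here and hits the slower backend
        self._cache[key] = value
        return value
```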

3. Write Back Cache

We have discussed that the write-through cache is not suitable for write-heavy systems because of its higher write latency. For these kinds of systems we can use the write-back approach: the data is written to the cache alone, and the write is acknowledged immediately. Once the data is updated in the cache, it is marked as modified (dirty), meaning it still needs to be written to the DB. Later, an async job runs at regular intervals, reads the modified data from the cache, and updates the database with the corresponding values.

The problem with this approach is that until the scheduled database update runs, the system is at risk of data loss. Say you updated the data in the cache, but the cache node fails before the modified data has been written to the DB. Since the database is the source of truth, if you then read the data from the database you won’t get an accurate result.
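
A minimal sketch of the dirty-tracking idea, with a `flush` method standing in for the periodic async job and the same kind of hypothetical `db` handle:

```python
class WriteBackCache:
    """Writes go to the cache only; dirty entries are synced to the database later."""

    def __init__(self, db):
        self._cache = {}
        self._dirty = set()           # keys modified in the cache but not yet written to the DB
        self._db = db                 # hypothetical database handle with a write method

    def write(self, key, value):
        self._cache[key] = value      # fast path: memory only
        self._dirty.add(key)          # mark as modified

    def flush(self):
        """Meant to run at regular intervals from an async job."""
        for key in list(self._dirty):
            self._db.write(key, self._cache[key])
            self._dirty.discard(key)
```

Anything still sitting in `_dirty` when the cache node fails is lost, which is the data-loss risk described above.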

Eviction Policy

We have discussed so many caching concepts… now you might have a question in your mind: when do we need to load an entry into the cache, and which data do we need to remove from the cache?

The cache in your system can fill up at any point in time. So we need some algorithm or strategy to remove data from the cache and make room for data that is more likely to be accessed in the future. To make this decision we use a cache eviction policy. Let’s discuss some cache eviction policies one by one…

1. LRU (Least Recently Used)

LRU is the most popular policy for several reasons: it is simple, it has good runtime performance, and it has a decent hit rate on common workloads. As the name suggests, this policy evicts the least recently used item first. When the cache becomes full, the least recently used entry is removed and the latest entry is added to the cache.

Whenever you add an entry to the cache, keep it at the top and remove the bottom-most entries, which are the least recently used. The entries at the top were used perhaps seconds ago; as you move down the list they were used minutes ago, hours ago, days ago, and the last entry (the least recently used) is the one you remove.

Consider the example of a social media site: a celebrity makes a post or a comment and everyone wants to pull up that content. You keep that post near the top of the cache, and it stays there for as long as it remains popular. When the post cools down and people stop viewing it, it keeps getting pushed toward the end of the cache, until eventually it is removed completely. We can implement LRU using a doubly linked list and a hash map that holds a reference to each node in the list.
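
A minimal sketch of LRU using Python’s `OrderedDict`, which is backed internally by exactly that hash map plus doubly linked list combination:

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = OrderedDict()          # least recently used entry sits at the front

    def get(self, key):
        if key not in self._items:
            return None                      # cache miss
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```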

2. LFU (Least Frequently Used)

This policy counts the frequency of each requested item and discards the least frequently used one from the cache. So here we count the number of times each data item is accessed and keep track of that frequency for every item. When the cache size reaches a given threshold, we remove the entry with the lowest frequency.

In real life, we can take the example of typing text on your phone. Your phone suggests multiple words when you type something in the text box, so instead of typing the whole word you can select one of the suggestions. In this case, your phone keeps track of the frequency of each word you type and maintains a cache of those words. When space is needed, the word with the lowest frequency is discarded from the cache. If there is a tie between multiple words, the least recently used word is removed.
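
A minimal sketch of LFU with that tie-breaking rule (the lowest access count is evicted first, and the least recently used entry loses the tie):

```python
from collections import Counter

class LFUCache:
    """Evicts the entry with the lowest access frequency once capacity is exceeded."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = {}
        self._counts = Counter()     # access frequency per key
        self._last_used = {}         # logical timestamp per key, used to break ties
        self._clock = 0

    def get(self, key):
        if key not in self._items:
            return None
        self._touch(key)
        return self._items[key]

    def put(self, key, value):
        if key not in self._items and len(self._items) >= self._capacity:
            # evict the lowest-frequency key; on a tie, the least recently used one
            victim = min(self._items, key=lambda k: (self._counts[k], self._last_used[k]))
            for table in (self._items, self._counts, self._last_used):
                del table[victim]
        self._items[key] = value
        self._touch(key)

    def _touch(self, key):
        self._counts[key] += 1
        self._clock += 1
        self._last_used[key] = self._clock
```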

3. MRU (Most Recently Used)

This approach removes the most recently used item from the cache, giving preference to older items to stay in the cache. It is suitable in cases where a user is less interested in seeing the latest data or item again. Now you might be thinking that users are usually interested in the latest entries, so where can this be used? Well, you can take the example of the dating app Tinder, where MRU can be used.

Tinder maintains a cache of a user’s potential matches. It should not recommend the same profile again once the user has swiped left or right on it; recommending the same profile over and over would lead to a poor user experience. So Tinder removes from the cache the profiles that were seen most recently, i.e. the profiles that were just swiped left or right.

4. Random Replacement

As the name suggests, we randomly select an item and discard it from the cache to make space whenever necessary.
