Understanding Caching: Eviction, Invalidation, Patterns and Their Use Cases

Surabhi
6 min read · Sep 21, 2023


What is a cache?

In simple words, a cache is a hardware or software component that serves data which is either frequently requested or computationally expensive to retrieve. The cache stores such data and returns it whenever it is requested. Usually, this data is stored as key-value pairs, where the key uniquely identifies an item in the cache.

Caching is used in many common scenarios to speed up retrieval, optimize performance, and enhance user experience. Most systems and applications maintain some sort of cache.

In the Absence of a Cache

A client (e.g. a browser) sends a request to the server to fetch some data, and the server in turn fetches the data from the backing database/datastore. Now suppose the user sends the same request again (for the same data): the entire flow is repeated, including the computation- and network-heavy database read. So, in order to optimize this process, we put a cache in between. The cache is something that sits closer to the application/server than the database. It can be part of the application/server, an in-memory store, or anything with a lower network cost than accessing the database.

After putting a cache in between

The second and subsequent requests for the same data do not go to the database; instead, the data is fetched from the local, low-latency cache and returned.

Cache hit and Cache miss

If a request is sent to the application and the data is present in the cache, the application reads the data from the cache and returns it. This is known as a cache hit.

However, if the data is not present in the cache, the application has to fetch it from the database. This situation is called a cache miss.

The above diagram shows a very basic setup for caching; we will go into the more advanced patterns in detail later in the article.

Eviction and Invalidation: The Volatile Nature of a Cache

Cache Invalidation: Once a value is loaded into the cache for some key, it is quite possible that after some time someone or some process updates that value in the database, so it needs to be updated or invalidated in the cache too.

How do we invalidate the cache? One way is to keep an expiry time on cache items, known as TTL or time to live. How do we determine an appropriate TTL? This is not deterministic; it varies from case to case. To arrive at an approximate TTL, we should consider the trade-off between cache performance and how much stale data the system can accept. A cache would not make sense if it needs to be refreshed too frequently.
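As a concrete illustration, here is a minimal sketch of TTL-based expiry, assuming the redis-py client and a locally running Redis instance; the key name and the 60-second TTL are only illustrative.

```python
# Minimal TTL sketch, assuming redis-py and a local Redis instance.
import redis

r = redis.Redis(host="localhost", port=6379)

# Cache a user profile for 60 seconds; after that Redis drops the key
# automatically and the next read falls back to the database.
r.set("user:42:profile", '{"name": "Alice"}', ex=60)

value = r.get("user:42:profile")  # cached bytes, or None once the TTL expires
```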

Cache invalidation can also be done from application code whenever some data is created or updated in the DB, but this again varies from case to case. A hybrid approach to cache invalidation is also possible, where we use a combination of both techniques.
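For example, the write path in the application can drop the stale cache entry right after the database write succeeds. The sketch below assumes a simple in-memory dict as the cache; `db_update_user` is a hypothetical placeholder for the real database write.

```python
# Sketch of invalidation from application code: after the DB write succeeds,
# the stale cache entry is deleted so the next read repopulates it.
cache = {}

def db_update_user(user_id, profile):
    ...  # hypothetical stand-in for the real database write

def update_user_profile(user_id, profile):
    db_update_user(user_id, profile)      # 1. write the new value to the DB
    cache.pop(f"user:{user_id}", None)    # 2. invalidate the cached copy
```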

Cache Eviction: Caching enhances the efficiency of read-heavy operations, but it comes with a cost. We can cache only a limited number of keys at any point in time. So, whenever a new key needs to be added to a cache that has reached its limit, some other key must be evicted from the cache.

A few strategies for cache eviction:

First In First Out (FIFO): the key that entered the cache first leaves the cache first.

Least Recently Used (LRU): the key that has not been used for the longest time leaves the cache.

Least Frequently Used (LFU): the key with the lowest frequency of usage goes out.

In addition to the above-mentioned strategies, which of course vary from case to case, it is also possible to maintain our own eviction policy (based on a customized algorithm that combines any of these or implements custom rules to suit the use case). For example, there could be an eviction policy that keeps only the most frequently used items, and only those from the recent past, similar to a viral post or hot products on sale.
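To make the eviction idea concrete, here is a minimal LRU cache sketch using Python's OrderedDict; the class name and capacity handling are illustrative, not a production implementation.

```python
# Minimal LRU eviction sketch: the cache is capped at `capacity` entries and
# the least recently used key is evicted first.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used key
```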

Cache Patterns/Strategies

These are some generic strategies by which caches are used in real-world applications:

1. Cache-Aside Pattern: In this placement pattern the cache does not interact with the database directly. The application checks the cache first; on a miss it reads from the database and populates the cache itself.

This pattern supports heavy reads. The application keeps working even if the cache goes down for some reason (although it then fetches everything directly from the DB). However, to keep the DB values and the cache values in sync, some mechanism such as a TTL or application logic has to be in place.
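Here is a hedged sketch of the cache-aside read path described above; the in-memory `cache` dict and `fetch_from_db` are hypothetical placeholders.

```python
# Cache-aside read path: check the cache first, query the database only on a
# miss, then fill the cache for subsequent reads.
cache = {}

def fetch_from_db(key):
    ...  # hypothetical stand-in for a real database query

def get(key):
    value = cache.get(key)
    if value is not None:          # cache hit
        return value
    value = fetch_from_db(key)     # cache miss: go to the database
    cache[key] = value             # populate the cache
    return value
```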

2. Cache Read-Through and Write-Through Patterns: Usually these two patterns are used together. In both, the cache is placed between the application and the database: on a read miss the cache itself loads the data from the database (read-through), and on a write the cache writes the data to the database before acknowledging it (write-through).

Together these patterns can be a great option for read-heavy workloads, e.g. reels, newsfeeds, etc. However, one thing to keep in mind is that the data model for the cache and the database should be similar. Usually third-party libraries/systems are used as the cache in such scenarios. Another disadvantage is that a failure of the caching layer can bring the entire system down. It also adds some latency while writing to the DB, because the cache sits in between. To overcome this issue there is a similar pattern with a small deviation, which we will go through in the next section.
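A small sketch of a combined read-through/write-through layer, where the application talks only to the cache object; `db_read` and `db_write` are hypothetical placeholders for the real database calls.

```python
# Read-through/write-through sketch: the cache layer itself reads from and
# writes to the database, so the application never touches the DB directly.
class ThroughCache:
    def __init__(self, db_read, db_write):
        self.data = {}
        self.db_read = db_read
        self.db_write = db_write

    def get(self, key):
        if key not in self.data:
            self.data[key] = self.db_read(key)   # read-through on a miss
        return self.data[key]

    def put(self, key, value):
        self.db_write(key, value)                # write-through: DB first...
        self.data[key] = value                   # ...then the cache, keeping both in sync
```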

3. Cache Write-Around Pattern: In this pattern the setup remains the same as above, except that writes go directly to the database. This helps minimize the write latency mentioned in the previous case.
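One possible write-around sketch, again with hypothetical placeholders: the write bypasses the cache entirely, and any stale cached copy is dropped so that a later read repopulates it.

```python
# Write-around sketch: writes go straight to the database; the cache is only
# filled by the read path, and any stale copy is evicted on write.
cache = {}

def db_write(key, value):
    ...  # hypothetical stand-in for the real database write

def put(key, value):
    db_write(key, value)      # write directly to the database
    cache.pop(key, None)      # do not cache the write; drop any stale copy
```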

4. Cache Write-Back Pattern: This pattern is suitable for write-heavy workloads where a quick response is needed on the client side. All writes land in the cache and a response is returned to the client immediately. Later, these writes are batched and written back to the database. This is a write-efficient approach, but a failure of the cache can lose any writes that have not yet been flushed.
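A hedged write-back sketch: writes are acknowledged from the cache alone and flushed to the database in a batch later. `db_batch_write` is a hypothetical placeholder, and a real implementation would trigger `flush` on a timer or a size threshold.

```python
# Write-back sketch: writes land in the cache, dirty keys are tracked, and a
# later flush writes them to the database in one batch.
cache = {}
dirty_keys = set()

def db_batch_write(entries):
    ...  # hypothetical stand-in for one batched database write

def put(key, value):
    cache[key] = value        # acknowledge the write from the cache alone
    dirty_keys.add(key)

def flush():
    db_batch_write({k: cache[k] for k in dirty_keys})
    dirty_keys.clear()
```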

Which Caching Pattern is the best one?

The answer to this question is: none. The applicability of these patterns varies from case to case. We should also give some thought to questions such as: which is more important, the cache keeping the system usable when the DB is down, or avoiding a system failure when the cache is down? Can the database and the cache share the same data model, or do they need different models? These are just a few points; there are many more that should be discussed while choosing and designing a cache for an application.

