Caching Strategies in Practice

Rajeev
Jan 15, 2023


Caching becomes an important concern as your user base grows and you want to give a seamless experience to your end users.

In the current competitive market, the responsiveness of your application is very important. As developers, apart from the functional requirements, we should also ask for the non-functional requirements, which help us estimate the scale that will be required on the day of launch.

When an application is slow, there are usually two likely culprits:

  • Slow queries
  • No cache implementation

We will discuss the second point; the first requires a different problem-solving approach, which I will cover in my next blog.

For now, assume you have done everything to optimize the query and performance is still not what you want for the application. This is where caching comes to the rescue.

A cache is hardware or software that stores data and retrieves it faster than fetching it from traditional storage devices.

We have different levels of cache in an application.

Cache Level

Client-Side Cache:

The client-side cache is the most efficient type of caching because it allows the browser to access files without communicating with the web server. We mostly store an application's static assets on the client side so we don't have to fetch them from the server again and again.
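
For example, an HTTP response can tell the browser how long it may reuse an asset without re-requesting it. Here is a minimal sketch using Flask; the route, the asset, and the max-age value are illustrative assumptions, not from the original article:

```python
from flask import Flask, make_response

app = Flask(__name__)

# Hypothetical route serving a small static asset with browser-cache headers.
@app.route("/static/app.css")
def stylesheet():
    response = make_response("body { margin: 0; }")
    response.mimetype = "text/css"
    # Tell the browser it may reuse this file for one day (86400 seconds)
    # without contacting the server again.
    response.headers["Cache-Control"] = "public, max-age=86400"
    return response
```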

Infrastructure-Level Cache (Server-Side Cache):

With an infrastructure-level cache, we cache application data at the infrastructure layer so that requests do not reach the service every time. This is also used to serve static content such as images and promotional videos of the application.

Backend Cache (Server-Side Cache):

With a backend cache, we implement caching at the API level: based on the query parameters, we save the API result so that the next time someone makes a request with the same criteria, we can serve it from the cache without touching a storage device such as a database or NoSQL store.
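
As a sketch, a backend endpoint might key its cache on the query parameters. The endpoint, cache store, and TTL below are illustrative assumptions:

```python
import time

cache = {}        # cache key -> (result, expiry timestamp)
TTL_SECONDS = 60  # assumed freshness window for this endpoint

def get_products(category: str, page: int) -> list:
    key = f"products:{category}:{page}"   # key built from the query params
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                   # cache hit: skip the database
    result = fetch_products_from_db(category, page)
    cache[key] = (result, time.time() + TTL_SECONDS)
    return result

def fetch_products_from_db(category: str, page: int) -> list:
    # Stand-in for a real database query.
    return [f"{category}-item-{i}" for i in range(page * 10, (page + 1) * 10)]
```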

A cache can be implemented in two ways:

  • In-memory
  • Distributed

→ In-memory cache:

We typically use an in-memory cache for a monolith application where only one instance of the service is running. It is not the right choice for a microservice architecture or for applications that run on multiple instances, because each instance has its own space for cached data, so you will mostly see cache misses.
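
In Python, an in-memory cache can be as simple as functools.lru_cache; note that the cache lives inside a single process, which is exactly why it breaks down across multiple instances:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)  # entries live in this process's memory only
def get_user_profile(user_id: int) -> dict:
    # Stand-in for an expensive database lookup.
    return {"id": user_id, "name": f"user-{user_id}"}

get_user_profile(42)  # miss: computed, then cached in this instance
get_user_profile(42)  # hit: served from this instance's memory
# A second service instance has its own empty cache, so the same call
# there would miss again.
```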

→ Distributed caching:

Distributed caching is an important aspect of cloud-based applications; it helps you scale your application incrementally. A distributed cache solves the multi-instance problem because all instances of a service communicate with a common cache.
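
With a distributed cache such as Redis, every instance talks to the same store. A sketch using the redis-py client; the host, key format, and TTL are assumptions:

```python
import json
import redis

r = redis.Redis(host="cache.internal", port=6379, decode_responses=True)

def get_profile(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)            # every instance sees the same keys
    if cached is not None:
        return json.loads(cached)
    profile = {"id": user_id, "name": f"user-{user_id}"}  # stand-in DB fetch
    r.set(key, json.dumps(profile), ex=300)  # shared entry, 5-minute TTL
    return profile
```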

There are two important dimensions to implementing a cache:

  • How to cache the data (check and validate how often the data changes).
  • How to define the eviction policy for the cached data.

There are two types of data that we usually cache:

  • Transactional Data
  • Master Data

Here, transactional data is data that changes frequently, while master data is configuration-like data that does not change often.

Implementing a cache on master data is easy, since it is unlikely to change on a daily basis. For this kind of data you can define a time to live (TTL), and the cache will be refreshed based on that TTL.
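
A minimal TTL sketch for master data, using the cachetools library; the one-hour TTL is an assumption, so pick an interval that matches how often your configuration actually changes:

```python
from cachetools import TTLCache, cached

# Master data rarely changes, so a long TTL is safe; expired entries
# are reloaded transparently on the next access.
@cached(cache=TTLCache(maxsize=256, ttl=3600))
def get_country_list() -> list:
    # Stand-in for a query against a rarely-changing master table.
    return ["IN", "US", "DE"]
```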

Implementing a cache on transactional data is trickier. Here you have to think about how the data is fetched: data might be created every second, but it might not be created for every user every second.

Implementing a cache on transactional data requires thinking through the following aspects:

  1. How often a user creates transactional data.
  2. How often a user accesses the data in a day.
  3. What kind of user base will access the data.

The diagram below shows the data and its relevance to different types of users.

In the diagram, each user creates data, shown as an inner circle per user. Each user sees only their own data, but there is another user, the admin, who has access to all of it.

So in a system where you paginate data for the UI, you also need to fetch data based on the user type: if the admin wants to see the data, we show all the records in the system; if a specific user wants to see the data, we show only the records that belong to that user.

Here we have three challenges:

  1. If a user is not creating transactions, their reads should always be served from the cache.
  2. If a specific user creates a transaction in the system, only that user's cache should be refreshed, not everyone's.
  3. If any user creates a transaction, the admin user's cache needs to be refreshed, since the admin is entitled to see all the data in the system.

To meet these expectations, we need a cache region for each user: when a user modifies data, we evict the cache region for that user and also evict the admin user's region (since the admin is entitled to see all the data and would otherwise see stale data for that user).
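
A sketch of per-user cache regions with that eviction rule; the region naming and the admin region are illustrative:

```python
regions = {}              # region name -> {cache key: value}
ADMIN_REGION = "admin"    # the admin sees everyone's data

def cache_get(region: str, key: str):
    return regions.get(region, {}).get(key)

def cache_put(region: str, key: str, value) -> None:
    regions.setdefault(region, {})[key] = value

def on_transaction_created(user_id: str) -> None:
    # The writing user's region is now stale...
    regions.pop(f"user:{user_id}", None)
    # ...and so is the admin's, because the admin view spans all users.
    regions.pop(ADMIN_REGION, None)
    # Every other user's region stays warm: their data did not change.
```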

So adding the cache layer on top of the database was easy, but now we also have to maintain the lifecycle of the cache regions.

Evicting the cache has a performance cost, so we need to choose the optimal cache strategy for our application, so that we neither store unnecessary data nor read stale data.

Below are cache strategies that you can choose between on a case-by-case basis. I will give an example for a few; if your use case falls into one of them, you can try that strategy.

  1. Cache-Aside (lazy cache or on-demand cache)
  2. Read-Through Cache
  3. Write-Through Cache
  4. Write-Behind Cache

Cache-Aside:

Cache-aside is also called lazy caching or the on-demand cache strategy. It is a widely used caching mechanism.

In this pattern, the application performs the following steps:

  1. Check the cache for the data under a key.
  2. On a cache miss, fetch the data from the database and add it back to the cache under that key.
  3. Return the data to the user; the next time the user queries the same data, it is fetched from the cache.

Cache Aside
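
A minimal cache-aside sketch mirroring the three steps above; the cache store and database call are stand-ins:

```python
cache = {}

def get_order(order_id: str) -> dict:
    # 1. Check the cache first.
    order = cache.get(order_id)
    if order is not None:
        return order                        # cache hit
    # 2. Cache miss: the *application* loads from the database
    #    and writes the result back to the cache itself.
    order = fetch_order_from_db(order_id)
    cache[order_id] = order
    # 3. Subsequent reads for this key are served from the cache.
    return order

def fetch_order_from_db(order_id: str) -> dict:
    # Stand-in for a real database query.
    return {"id": order_id, "status": "NEW"}
```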

Pros:

  • Works best for read-heavy workloads.
  • You cache only the data users actually request, so the cache does not fill up with data that is never queried.
  • Even if the cache layer is down, your application still works.
  • On-demand queries can be cached easily.

Cons:

  • Because data is cached against a key, any change in the query criteria means the cache entry must be evicted.
  • The first response may be slow for the user, and if the query is very heavy, the first request can be a poor experience or even fail to return data.

Read Through Cache:

A read-through cache sits in line with the database. On a cache miss, it loads the missing data from the database, populates the cache, and returns the data to the application.

Read Through

Both cache-aside and read-through use lazy loading, i.e., fetching the data only when it is requested. The difference is that with read-through you have a separate service, or a common library, that fetches the data from the database into the cache.
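
Contrast the cache-aside sketch above with a read-through sketch, where the cache component itself owns the load-on-miss logic and the application only ever talks to the cache:

```python
class ReadThroughCache:
    """Sits in line with the database: callers never query the DB directly."""

    def __init__(self, loader):
        self._store = {}
        self._loader = loader   # knows how to fetch a key from the database

    def get(self, key):
        if key not in self._store:
            # On a miss the *cache* loads the value and populates itself.
            self._store[key] = self._loader(key)
        return self._store[key]

def load_product(product_id: str) -> dict:
    # Stand-in for the database read behind the cache.
    return {"id": product_id, "name": f"product-{product_id}"}

products = ReadThroughCache(load_product)
products.get("p-1")  # miss: the cache loads from the DB, stores, returns
products.get("p-1")  # hit: served straight from the cache
```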

Pros:

  • Works best for read-heavy workloads.
  • Provides better traceability for the cache.

Cons:

  • Single point of failure: if the cache layer goes down, your application does not work.
  • The data model saved in the cache layer must match the database model.
  • For each cache miss, you need to plan a read strategy per type of model.

Write Through Cache:

In the write-through mechanism, data is first written to the cache and then written to the database, so all reads go through the cache and there are no cache misses. This keeps the cache consistent with the master database and avoids stale reads in the application.

Write Through
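
A write-through sketch; the synchronous ordering is the point, and the store and database call are stand-ins:

```python
cache = {}

def save_order(order_id: str, order: dict) -> None:
    # Write-through: update the cache first...
    cache[order_id] = order
    # ...then synchronously write to the database. The write is not
    # acknowledged until both succeed, so reads never see stale data.
    write_order_to_db(order_id, order)

def get_order(order_id: str) -> dict:
    return cache[order_id]   # reads always hit the cache

def write_order_to_db(order_id: str, order: dict) -> None:
    # Stand-in for a real synchronous database write.
    pass
```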

Pros:

  • Eliminates the stale-data problem.
  • All read queries go to the cache.
  • Tracing cache misses and hits is easy per data model.

Cons:

  • If a write fails at the cache layer, the application can lose critical data.
  • The cache layer needs to be highly available; if it goes down, the application cannot accept read or write operations.
  • The cache may end up storing data that never needed to be cached.
  • A read strategy needs to be defined per model.
  • Application latency increases for every write operation, since the service has to update the cache layer and then the database layer.

Write Behind Cache:

Recall that in the write-through mechanism, data is written first to the cache and then to the database, and the application performs both operations synchronously. Updating the cache synchronously works against the goal of reducing latency, because every write now takes longer than expected.

In a write-behind cache, the model is first updated in the cache, and the database layer then receives the data asynchronously.
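
A write-behind sketch using a background worker that flushes queued writes to the database in batches; the queue, batch size, and flush logic are assumptions:

```python
import queue
import threading

cache = {}
pending = queue.Queue()   # writes waiting to reach the database

def save_order(order_id: str, order: dict) -> None:
    cache[order_id] = order          # fast path: only the cache is updated
    pending.put((order_id, order))   # the database write is deferred

def flush_worker() -> None:
    while True:
        batch = [pending.get()]      # block until a write is queued
        while not pending.empty() and len(batch) < 100:
            batch.append(pending.get_nowait())
        write_batch_to_db(batch)     # one batched write instead of many

def write_batch_to_db(batch: list) -> None:
    # Stand-in for a real bulk insert/update.
    pass

threading.Thread(target=flush_worker, daemon=True).start()
```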

Pros:

  • Best for write-heavy applications.
  • Writing to the cache improves the performance of write operations.
  • It facilitates batching on the database side, which improves database write throughput.
  • Database engines use a similar write-back mechanism internally, e.g., MySQL's InnoDB buffer pool; DynamoDB's DAX is a related cache that sits in front of the database.

Cons:

  • If the cache layer goes down, the application goes down as well.
  • There is a risk of data loss if a cache node restarts before flushing pending writes to the database.

My final take from the above analysis: most developers choose cache-aside, as it has less overhead and no risk of data loss, which is critical for the application. But you can also combine cache-aside with write-behind: use cache-aside where data loss is not acceptable, and mix in write-behind with some durable temporary storage to track requests (e.g., Kafka). That way, both your writes and your reads are optimal.
