Catching up with cache!

Aman Arora
11 min read · Jun 6, 2022

What is Cache?

I am pretty sure this is not the first time you are hearing or seeing the word cache. We see it used in multiple places in our day-to-day life, for example the browser cache and app cache. In DSA, memoization is also a type of caching.

But what is a cache? As per Wikipedia, a cache is a component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere.

Simply put, a cache is a way of storing frequently demanded data closer to those asking for it.

OK. Now we know what a cache is, but we still don't know why we need it and what it has to do with software engineering and backend systems.

Why Caching?

As stated in the Wikipedia definition: “so that future requests for that data can be served faster”.

Well, that is it. We need a cache so that our APIs can respond faster.

That is the most popular and important use case of caching, but it is not the only one. Some other reasons for using a cache include:

Performance: data served from the cache is faster to access, and the read workload on the database is reduced.

Scalability: part of the backend query workload is shifted to the cache system, which lowers costs and allows more flexibility in processing data.

Availability: if the backend database server is unavailable, the cache can still provide continuous service to the application, making the system more resilient to failures.

Nowadays, we also use caches to solve business problems where transient data is required.

What to Cache?

Well, now we know that we need caching to serve our requests faster. So should we cache everything? Obviously NOT.

Technically we can cache everything, but that has costs associated with it, both monetary and in terms of performance.

As a general rule — “Cache it if you are going to need the data again in a short interval of time” or “if the data is not going to change for a long period of time”.

Examples:

  • Homepage
  • Raw data (non-user specific) without personalisations

It is also important to know what shouldn’t be cached —

As a general rule: “Don't cache if the cache hit ratio is very low”.

The cache hit ratio is the ratio of the number of requests a cache is able to serve successfully (hits) to the total number of requests it receives: hit ratio = hits / (hits + misses).

Examples of when we shouldn't use a cache:

  • Low cache hit ratio: when data is requested using filters and the filters have very high cardinality (example: a price range filter)
  • High key cardinality: user-personalised search results. Here the cache hit ratio might be high if the user comes to the app often, but the total number of cache keys required would be very large.

Where to Cache?

In a backend system, there are multiple places where we can cache data. The choice depends on the type of data and the use case. For example:

  • Front-end cache is used to store assets and API responses that don't change much. This cache lives in your browser.
  • A CDN (content delivery network) is used for caching front-end assets like images, fonts, JavaScript, CSS, etc.
  • Nginx (reverse proxy) is used to cache HTTP responses.

Nginx cache is not distributed and is harder to manage and invalidate, and is therefore not used that much.

In-memory cache vs Centralised cache

In-memory cache: for a small, predictable number of objects that have to be read multiple times, an in-process cache is a good solution because it will perform better than a distributed cache. Example: Guava Cache.

Centralised cache: used when the number of objects that need to be cached is unpredictable and large, and read consistency between servers is a must. Even when 100 percent read consistency is not required, there may be cases where we want a single source of truth. It does not offer the same performance benefits as an in-process cache. Examples: Redis, Memcached, etc.

It goes without saying that your application can use both schemes for different types of objects depending on what suits the scenario best.
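
For illustration, here is a minimal Python sketch contrasting the two schemes. It assumes the redis-py client, a local Redis at localhost:6379, and a hypothetical get_country_from_db helper:

```python
from functools import lru_cache

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# In-process cache: local to this application instance; fast, but each
# instance keeps its own copy of the cached values.
@lru_cache(maxsize=1024)
def get_country_name(country_code: str) -> str:
    return get_country_from_db(country_code)  # hypothetical DB call

# Centralised cache: one shared copy that every application instance
# reads and writes.
r.set("feature:flags", '{"new_checkout": true}', ex=300)  # expires in 5 minutes
print(r.get("feature:flags"))
```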

Caching Strategies

Read

  • Cache Aside: the application looks in the cache first; on a miss it reads from the database and populates the cache itself (see the sketch after this list).
  • Read-Through: the cache sits in front of the database and loads missing data from it transparently on a miss.

Write

  • Write-Through: writes go to the cache and the database synchronously.
  • Write-Behind: writes are acknowledged by the cache first and flushed to the database asynchronously.

Read + Write

  • Write Around + Cache Aside: writes bypass the cache and go straight to the database; reads follow the Cache Aside pattern.
  • Write Around + Read Through: writes bypass the cache; reads go through the cache, which loads missing data from the database.
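
As a quick illustration of the most common read strategy, here is a minimal cache-aside sketch in Python, assuming the redis-py client and a hypothetical fetch_product_from_db helper:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_product(product_id: str) -> str:
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:                       # cache hit: serve from Redis
        return cached
    product = fetch_product_from_db(product_id)  # cache miss: go to the DB
    r.set(key, product, ex=600)                  # populate for the next reader
    return product
```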

Cache Updation

There are also multiple methods of refreshing the cache:
  • Refreshing on TTL expiry
  • Refreshing on DB update

The second method is also called cache invalidation and is probably the best one for most use cases.
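
A minimal sketch of the second method, assuming redis-py and a hypothetical update_product_in_db helper; the read path would follow the cache-aside sketch shown above:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def update_product(product_id: str, new_data: str) -> None:
    update_product_in_db(product_id, new_data)  # write to the source of truth
    r.delete(f"product:{product_id}")           # invalidate; the next read repopulates
```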

Redis (https://redis.io/)

Redis is an open source, in-memory data structure store, used as a database, cache, and message broker.

Redis provides data structures such as strings, hashes, lists, sets, and sorted sets with range queries.
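
To get a feel for these data structures, here is a small redis-py sketch against an assumed local Redis instance; the key names and values are made up for illustration:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("greeting", "hello")                                     # string
r.hset("user:42", mapping={"name": "Aman", "city": "Delhi"})   # hash
r.lpush("recent:searches", "redis", "caching")                 # list
r.sadd("tags:article:1", "cache", "backend")                   # set
r.zadd("leaderboard", {"alice": 120, "bob": 95})               # sorted set
print(r.zrangebyscore("leaderboard", 100, 200))                # range query: ['alice']
```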

Why Redis?

  1. Low response time: keeps data in memory instead of on disk or SSD, so retrieval is fast.
  2. Data persistence: Redis can provide persistence in case of process outages or network bottlenecks. Redis takes regular snapshots of the data and appends changes to them as they become available.
    Redis can be configured to generate backups on demand or at regular intervals to ensure database durability and integrity.
  3. Data expiration: Redis allows us to set a TTL (time to live) for the data.
  4. Battle tested: already used at high scale by companies like Twitter, GitHub, Stack Overflow, etc.

Redis Deployment Configurations

Standalone

There is only one server node. If that node goes down, the cache is down.

Master Slave

A Redis server can operate in one of two modes: as a master node or as a slave node.

  • Master-slave replication is done asynchronously.
  • Recommendation: serve writes through the Redis master and reads through the Redis slaves (a sketch follows this list).
  • By default, all Redis servers are master nodes. We need to explicitly specify if we want a node to work in slave mode.
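
Here is the read/write split from the recommendation above as a minimal redis-py sketch; the hostnames are assumptions and would normally come from configuration or service discovery:

```python
import redis

master = redis.Redis(host="redis-master.internal", port=6379)
replica = redis.Redis(host="redis-replica-1.internal", port=6379)

master.set("session:123", "some-session-data", ex=1800)  # write path
# Read path: replication is asynchronous, so a read immediately after a
# write may briefly see stale (or missing) data on the replica.
value = replica.get("session:123")
```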

Now the next logical questions would be: what if the master goes down? What if a slave goes down?

Master node failure: there are two options. We can add a new master node, or we can promote an existing slave node to master.

Here, the second option is better than the first: a newly added master node will not have any of the existing data, whereas a promoted slave already has almost all of it.

Slave node failure: if one of the slaves goes down, we have other slaves that can serve the data.

Redis Cluster

Redis scales horizontally with a deployment topology called Redis Cluster.

Redis Cluster provides a way to run a Redis installation in such a way that data is automatically sharded across multiple Redis nodes.

Sharding is the method for distributing data across multiple machines.

Redis Cluster also provides a degree of availability, i.e. it continues to function if some of the nodes in the cluster go down. However, it stops working in the case of larger failures (when multiple master nodes are down).

For a cluster to work properly, we need to open two TCP ports for each node in the cluster:

  1. 6379: so that clients can communicate with the nodes
  2. 16379: for inter-node communication

Redis Cluster does not use consistent hashing, but a different sharding technique where every key is conceptually part of a hash slot.

There are 16384 hash slots in Redis Cluster, and to compute the hash slot for a given key, we simply take the CRC16 of the key modulo 16384.
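
Here is a hand-written sketch of that slot calculation (CRC16/XMODEM, then modulo 16384); note that real cluster clients also honour "{hash tags}" in key names, which is omitted here:

```python
def crc16(data: bytes) -> int:
    # CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    return crc16(key.encode()) % 16384

print(hash_slot("user:1000"))  # every client maps this key to the same slot
```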

Every node in a Redis Cluster is responsible for a subset of the hash slots.

For both adding and removing nodes, we only need to change the hash slots associated with each node.

Moving hash slots does not require stopping any service; therefore, adding and removing nodes, or changing the percentage of hash slots held by a node, requires no downtime.

Redis Cluster with master-slave replication

This deployment configuration is used when we want to stay available even when a subset of the master nodes is down. For example, if the master for node B fails, the cluster can continue to serve node B's hash slots from node B replica 1 or node B replica 2 (one of them is promoted to master).

The cluster goes down only if all the nodes serving a given hash slot are down.

Creating and Using a Redis Cluster: https://redis.io/docs/manual/scaling/#creating-and-using-a-redis-cluster

Twemproxy — Client Side Sharding in Redis

https://github.com/twitter/twemproxy

Twemproxy (aka nutcracker) is a fast and lightweight proxy developed by Twitter before Redis Cluster existed.

It was built primarily to reduce the number of connections to the caching servers on the backend. This, together with protocol pipelining and sharding, enables us to horizontally scale our application's distributed caching architecture.

Twemproxy supports consistent hashing with different strategies and hashing functions.

Time to Live (TTL) for Cache

The TTL for a cache entry determines how long a particular piece of data stays cached, and its value varies from use case to use case.

A low TTL removes keys from the cache more frequently, resulting in more requests to the underlying system. It reduces the chances of inconsistency but can lead to higher latencies and reduced performance.

A large TTL, on the other hand, keeps the system performant but increases the chances of inconsistency.

TTL with Jitter
This is a very common technique used when we cache multiple items at once. If we were to set a fixed TTL for all the keys (say 10 minutes), then all the keys would expire at once, resulting in a large load on the database at that moment.

Instead, we add a small random delta to each cache key's TTL so that not all the keys expire at once.
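
A minimal sketch of TTL with jitter using redis-py; the 10-minute base TTL and the jitter range are arbitrary example values:

```python
import random

import redis

r = redis.Redis(host="localhost", port=6379)

BASE_TTL_SECONDS = 10 * 60

def cache_with_jitter(key: str, value: str) -> None:
    # Add up to +/- 60 seconds of random jitter so keys written together
    # do not all expire at the same moment.
    ttl = BASE_TTL_SECONDS + random.randint(-60, 60)
    r.set(key, value, ex=ttl)
```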

Redis Eviction policies

The maxmemory configuration directive tells Redis to use at most the specified amount of memory for the data set; it can be set in the redis.conf file or changed at runtime with CONFIG SET.

The exact behaviour Redis follows when the maxmemory limit is reached is controlled by the maxmemory-policy directive (a sketch follows the list below).

The following policies are available:

  • noeviction: New values aren’t saved when memory limit is reached. When a database uses replication, this applies to the primary database
  • allkeys-lru: Keeps most recently used keys; removes least recently used (LRU) keys
  • allkeys-lfu: Keeps frequently used keys; removes least frequently used (LFU) keys
  • volatile-lru: Removes least recently used keys with the expire field set to true
  • volatile-lfu: Removes least frequently used keys with the expire field set to true
  • allkeys-random: Randomly removes keys to make space for the new data added
  • volatile-random: Randomly removes keys with expire field set to true.
  • volatile-ttl: Removes keys with the expire field set to true, evicting those with the shortest remaining time-to-live (TTL) first.
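
As a sketch, both directives can be set at runtime with CONFIG SET via redis-py (or placed in redis.conf); the 256mb limit and the allkeys-lru policy below are just example choices:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

r.config_set("maxmemory", "256mb")               # cap the data set size
r.config_set("maxmemory-policy", "allkeys-lru")  # evict least recently used keys
print(r.config_get("maxmemory-policy"))          # verify the active policy
```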

Cold Cache and Cache Warmup

Cold cache (the problem): this is something you can face with both an in-memory and a centralised cache. It is the time interval during which the cache is yet to be populated, so the cache hit ratio is very low.

Cache warmup (the solution): we populate the cache node with data before it starts serving online traffic, with the help of startup scripts or schedulers. To avoid all of the warmed keys expiring at once later, using TTL with jitter is recommended.
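
A minimal warmup sketch, assuming redis-py and a hypothetical load_hot_keys_from_db bulk query that returns the data we expect to be requested most:

```python
import random

import redis

r = redis.Redis(host="localhost", port=6379)

def warm_up_cache() -> None:
    # Run from a startup script or scheduler, before serving online traffic.
    for key, value in load_hot_keys_from_db():   # hypothetical DB call
        # TTL with jitter, so the warmed keys do not all expire together.
        r.set(key, value, ex=600 + random.randint(0, 120))
```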

Achieving consistent behaviour using Redis

Redis can also be used to store transient data that is required within subsequent requests to ensure consistent behaviour.

Example
Consider a booking flow where the price is calculated on the basis of many dynamic inputs that can change at any time. The booking flow can consist of multiple APIs that need to be invoked serially.

Our expectation here is that the user should see a consistent price on all the pages, and since the price calculation is dynamic, we need to store the price that was shown to the user for this specific session.

We can use the cache here to store the price for that session in the first API call and then use the same value in all subsequent requests.
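
A minimal sketch of this idea, assuming redis-py, a hypothetical calculate_price function, and an arbitrary 30-minute session TTL:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_session_price(session_id: str, booking_request: dict) -> float:
    key = f"booking:price:{session_id}"
    cached = r.get(key)
    if cached is not None:
        return float(cached)                  # later calls in the flow see the same price
    price = calculate_price(booking_request)  # hypothetical dynamic pricing call
    r.set(key, price, ex=30 * 60)             # first call pins the price for the session
    return price
```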

Error Handling

Cache plays an important role when we are building applications at scale as it helps us drastically improve the performance of our system.

The important thing to question here is — what if the entire cache cluster is down?

Consider an application that is heavily dependent on the cache, with a cache hit percentage of 90 percent, meaning that 90 percent of the overall traffic is handled by the cache and 10 percent by the database.

Now let us say the Redis cluster goes down, or the application is unable to connect to it. How should the application behave? There are 3 options:

  • Option 1: Retry the requests
  • Option 2: We can ask the database for the data and continue serving the request
  • Option 3: We can terminate the request and give error to the client

If the error is due to some temporary network fluctuation, Option 1 lets us continue serving traffic.

Option 2 seems like a good option but is in fact a devil in disguise. It can do more harm than good. Why, you ask?

If the cache is indeed down and we send the database all of the traffic that the cache was handling, the traffic to the database increases roughly tenfold in a short span of time. This spike could easily choke the database resources and lead to failures in other systems.

So it is always crucial to think about error handling for the case where the cache, or any other system for that matter, is down.
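
For example, here is a minimal sketch of failing fast (Option 3) on cache-cluster errors, assuming redis-py and a hypothetical fetch_from_db helper:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True,
                socket_timeout=0.05)

class CacheUnavailableError(Exception):
    pass

def get_product(product_id: str) -> str:
    try:
        cached = r.get(f"product:{product_id}")
    except (redis.ConnectionError, redis.TimeoutError):
        # Don't fall back to the database here (Option 2): that would send
        # roughly 10x the usual traffic to it at once. Surface the error
        # to the caller (or retry, as in Option 1).
        raise CacheUnavailableError("cache cluster unreachable")
    if cached is not None:
        return cached
    return fetch_from_db(product_id)  # a normal cache miss still goes to the DB
```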

Liked what you read? Motivate me to write more by upvoting (clapping) this article. Thanks!!

You can follow me on LinkedIn by clicking here

Resources for further reading

Try Redis: https://try.redis.io/

The Little Redis book: https://www.openmymind.net/redis.pdf

Scaling Memcached at Facebook: https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final170_update.pdf

Scaling Redis using Redis Cluster: https://redis.io/docs/manual/scaling/
