[System Design] Caching — In a Nutshell (with OOP code examples)
A Real-World use case.
Imagine you went to a pizza shop to eat a pizza.
You ordered it and waited for the pizza to come. The shopkeeper came with the pizza, gave you some ketchup sachets, oregano, etc., and you started eating because you were very hungry.
After a few slices, you realized that the ketchup was finished. You asked the shopkeeper for more ketchup; he went to the kitchen and brought you more sachets.
You resumed eating and were about to finish the pizza. With one slice left, you realized that the ketchup was over again. You might think you should eat it without ketchup, but “craving is craving”, so you called the shopkeeper for ketchup once more.
The shopkeeper started getting annoyed.
He went to the kitchen again and got you a few more ketchup sachets. But he realized that he couldn’t leave the counter every time just to fetch sachets, and that it was hurting the customer experience.
He thought that if he placed all the ketchup sachets at the counter, people could take them directly instead of asking him for more. So he went to the kitchen, took some ketchup sachets, and placed them on the counter.
Then he announced, “Ketchup sachets are available here. Take them whenever needed.” He kept other items there too, like oregano packets. Now nobody needs to ask him for extras; they can take items directly from the counter.
And the shopkeeper can keep attending to new customers and other tasks without worrying about which customer needs ketchup or oregano. He just needs to refill the ketchup and oregano at the counter whenever they run out.
What did the shopkeeper just do?
He made the most frequently requested items accessible to everyone, and saved the time he was spending carrying items from the kitchen to the counter. This is nothing but caching. If you still can’t relate, here is the analogy:
- the customers are the clients,
- the shopkeeper is the app server,
- the kitchen is the database,
- the ketchup/oregano is the data that is requested frequently,
- and the tray in which the ketchup/oregano is placed is the cache.
The app server (shopkeeper) takes care of filling the cache (counter) with the frequently requested data (ketchup/oregano) from the database (kitchen) in order to reduce the cost of database calls (taking items from the kitchen) and maximize its performance (attending to new customers and serving orders).
So, What is Caching?
Caching is a technique that stores copies of frequently used application data in a layer of smaller, faster memory called cache, in order to improve data retrieval times, throughput, and compute costs.
How is cache memory different from normal database storage?
Cache memory is generally very small compared to normal storage. It stores only certain data, generally the most frequently used, and since its size is small, search operations take very little time compared to a normal database search. That’s why it is faster than any other secondary storage.
Imagine if you were able to save everything in the cache: there would be no difference between normal storage and a cache, because the search times would become equal.
Let’s design a cache (OOP Example)
A typical (in-memory) cache is nothing but a data structure storing certain data. You can use any data structure that takes little time for the operations below:
- Add an item to the cache
- Get an item from the cache
- Delete an item from the cache
Let’s use hashing and take a map (hash table) with O(1) average search, insert, and deletion time.
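As a starting point, here is a minimal sketch of such a map-based cache in Java (the class and method names are illustrative, not from any particular library):

```java
import java.util.HashMap;
import java.util.Map;

// A minimal in-memory cache backed by a hash map.
// Add, get, and delete are all O(1) on average.
class SimpleCache<K, V> {
    private static final int CAPACITY = 100; // fixed size
    private final Map<K, V> store = new HashMap<>();

    // Add an item to the cache; once full, new items are silently dropped.
    public void put(K key, V value) {
        if (store.size() < CAPACITY || store.containsKey(key)) {
            store.put(key, value);
        }
    }

    // Get an item from the cache; returns null on a cache miss.
    public V get(K key) {
        return store.get(key);
    }

    // Delete an item from the cache.
    public void remove(K key) {
        store.remove(key);
    }
}
```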
The above code looks good, but it has the issues below:
- Since we have a limited capacity (size = 100), as soon as the cache becomes full it won’t save any new items; we need to free some space.
- If we want to delete items from the cache, how will we decide which one to delete? The deleted item might be requested again soon, forcing another database access.
To solve these issues, we use something called a cache replacement policy, which is an essential part of a successful caching layer.
The cache replacement policy.
A cache replacement policy (also called eviction policy) decides what memory to free when the cache is full.
A good replacement policy will ensure that the cached data is as relevant as possible to the application, that is, it utilizes the “principle of locality” to optimize for cache hits. Replacement policies are fine-tuned to particular use cases, so there are many different algorithms and implementations to choose from. Some popular ones are:
LRU — least recently used
In an LRU replacement policy, the entry in the cache that was accessed least recently will be freed. LRU performs well and is fairly simple to understand, so it’s a good default choice of replacement policy.
To implement such a policy, we need to track the least recently used item. The assumption here is that we keep adding and moving items to the end of the cache, so that the cache contains items ordered from least recently used to most recently used. This way, whenever we hit capacity, we can simply remove the first element.
In this example, we use a LinkedHashMap, which can maintain the order in which elements are accessed (not just inserted).
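A possible LRU implementation in Java, using LinkedHashMap’s access-order mode (the constructor’s third argument) so that the eldest entry is always the least recently used; the class name is my own:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache via LinkedHashMap in access-order mode: every get()/put()
// moves the entry to the end, so the head is always the least recently used.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(capacity, 0.75f, true); // true => order by access, not insertion
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // evicts the eldest (least recently used) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```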
LFU — least frequently used
In an LFU replacement policy, the entry in the cache that has been used the least frequently will be freed. Frequency is tracked with a simple access count in the entry metadata.
Similar to LRU, we need to track the least frequently used item. The assumption here is that if an item is referenced or newly added, it becomes more frequent and should not be replaced. So instead of updating an item’s position only in the add function, we need to update its frequency in the get function too.
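A simple LFU sketch in Java (names are illustrative; eviction here scans all counts in O(n), whereas production LFU implementations use frequency lists for O(1) eviction):

```java
import java.util.HashMap;
import java.util.Map;

// A simple LFU cache: each get()/put() increments an access count,
// and eviction removes the entry with the smallest count.
class LfuCache<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> counts = new HashMap<>();

    public LfuCache(int capacity) { this.capacity = capacity; }

    public V get(K key) {
        if (!values.containsKey(key)) return null;
        counts.merge(key, 1, Integer::sum); // reads also bump the frequency
        return values.get(key);
    }

    public void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= capacity) {
            // Evict the least frequently used key.
            K victim = null;
            int min = Integer.MAX_VALUE;
            for (Map.Entry<K, Integer> e : counts.entrySet()) {
                if (e.getValue() < min) { min = e.getValue(); victim = e.getKey(); }
            }
            values.remove(victim);
            counts.remove(victim);
        }
        values.put(key, value);
        counts.merge(key, 1, Integer::sum);
    }
}
```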
LFRU — Least Frequently Recently Used
It takes into account both recency and frequency by starting with LFU and then moving to LRU if the data is used frequently enough.
There are other policies too, like FIFO, LIFO, etc., but LRU remains the most widely used policy.
So, Are we done? Not yet…
We discussed the replacement policies; that’s fine. But we still have not discussed an important aspect of storage called data persistence.
The cache saves a copy of data from the database; it is transient storage that has to stay in sync, and it can’t hold anything that isn’t persisted in the database.
To keep data in sync, we move data either from the database to the cache or from the cache to the database.
The techniques used for this data movement are called write policies.
Cache writing policies
A cache’s write policy is the behavior of a cache while performing a write operation.
Before diving into the policies, let’s add the functions below to the existing class:
1. one for a single update,
2. one for bulk updates.
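A sketch of what those two functions might look like (all names here are hypothetical; the write policies below decide when these changes reach the database):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical update hooks on the cache class.
class UpdatableCache<K, V> {
    final Map<K, V> cache = new HashMap<>();

    // 1. Single update: write one entry to the cache.
    public void update(K key, V value) {
        cache.put(key, value);
    }

    // 2. Bulk update: write many entries at once.
    public void updateAll(Map<K, V> entries) {
        cache.putAll(entries);
    }
}
```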
Write-behind
A write-behind cache writes first to the cache and then to the primary datastore. This can either mean that the primary datastore is updated almost immediately, or that the data is not persisted until the cache entry is replaced, in which case the entry is tagged with a dirty bit to track that the data is out of sync.
It is possible that, at the moment a write-behind cache fails, some data has not been persisted and is lost. Writes are fastest in a write-behind cache, so it can be a reasonable choice if the cached data can tolerate possible loss.
To implement write-behind, we update the DB, eventually, after every addition or deletion in the cache.
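A write-behind sketch under those assumptions: writes touch only the cache and set a dirty flag, and a flush() (which a daemon thread might call periodically) persists the dirty entries to a map that simulates the database:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Write-behind: writes land in the cache immediately and are
// flushed to the (simulated) database later.
class WriteBehindCache<K, V> {
    final Map<K, V> cache = new HashMap<>();
    final Map<K, V> database = new HashMap<>(); // stand-in for the real datastore
    private final Set<K> dirty = new HashSet<>(); // keys not yet persisted

    public void put(K key, V value) {
        cache.put(key, value); // fast path: cache only
        dirty.add(key);        // remember that the DB is now behind
    }

    // Persist all dirty entries; e.g. invoked periodically by a daemon.
    public void flush() {
        for (K key : dirty) database.put(key, cache.get(key));
        dirty.clear();
    }
}
```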
Write-around
In a write-around cache design, the application writes directly to the primary datastore, and the cache checks with the primary datastore to keep the cached data valid. If the application accesses the newest data the cache might be behind, but the writer doesn’t have to wait on two systems being updated, and the primary datastore is always up to date.
To implement write-around, writes bypass the cache and go straight to the DB; the cache is filled (or invalidated) on subsequent reads.
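A write-around sketch (with a map simulating the database): writes bypass the cache entirely, and the cache is populated lazily on the next read:

```java
import java.util.HashMap;
import java.util.Map;

// Write-around: writes go straight to the (simulated) database;
// the cache is filled on cache misses during reads.
class WriteAroundCache<K, V> {
    final Map<K, V> cache = new HashMap<>();
    final Map<K, V> database = new HashMap<>(); // stand-in for the real datastore

    public void put(K key, V value) {
        database.put(key, value); // the cache is not written to
        cache.remove(key);        // invalidate any stale cached copy
    }

    public V get(K key) {
        V v = cache.get(key);
        if (v == null) {          // miss: read through to the DB
            v = database.get(key);
            if (v != null) cache.put(key, v);
        }
        return v;
    }
}
```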
Write-through
Finally, a write-through cache updates both the cache and the backing datastore at the same time. If the cache layer fails, the update isn’t lost because it has been persisted. In exchange, the write takes longer to succeed because it needs to update the slower storage too.
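A write-through sketch: every write updates the simulated database and the cache in the same call, so the two never diverge:

```java
import java.util.HashMap;
import java.util.Map;

// Write-through: cache and (simulated) database are updated together.
class WriteThroughCache<K, V> {
    final Map<K, V> cache = new HashMap<>();
    final Map<K, V> database = new HashMap<>(); // stand-in for the real datastore

    public void put(K key, V value) {
        database.put(key, value); // slower write to the backing store...
        cache.put(key, value);    // ...plus the cache, before returning
    }

    public V get(K key) {
        V v = cache.get(key);
        return (v != null) ? v : database.get(key);
    }
}
```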
Ideally, the cache doesn’t update the database instantly (as shown in the above examples) because of the overhead of network calls and processing delay. Since the cache is referred to very frequently, a database update on every access would add delay and can cause locking issues too. Generally, the database is updated periodically or on demand; a separate daemon process can take care of this.
So far, we have understood much about caching, Still, we have missed another key aspect.
Expiration policy — TTL
The Expiration policy or retention policy makes sure that there is always space in the cache for new data. This parameter specifies an amount of time after which a resource is freed if it hasn’t been used, called the Time To Live (TTL). Finding the right TTL is one way to optimize a caching layer.
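A TTL sketch in Java: each entry carries an expiry timestamp, and expired entries are treated as misses and removed lazily on access (names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// TTL cache: entries expire a fixed time after being written.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt; // wall-clock time in milliseconds
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> store = new HashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public void put(K key, V value) {
        store.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    public V get(K key) {
        Entry<V> e = store.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() >= e.expiresAt) {
            store.remove(key); // expired: free the slot, treat as a miss
            return null;
        }
        return e.value;
    }
}
```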
Are we done? Almost…
We understood what caching is, its replacement policies, write policies, etc. But
how do you decide whether you should use a cache at all?
When to use caching?
- Caching is most helpful when the data you need is slow to access, possibly because of slow hardware, network round trips, or complex computation.
- Caching is helpful in systems with many requests for static or slowly changing data, because repeated requests can be served from the cache and will still be up to date.
- Caching can also reduce the load on primary data stores, which has the downstream effect of reducing service costs to reach performant response times.
When not to use caching?
- Caching isn’t helpful when it takes just as long to access the cache as it does to access the primary data.
- Caching doesn’t work as well when requests have low repetition (higher randomness), because caching performance comes from repeated memory access patterns.
- Caching isn’t helpful when the data changes frequently, as the cached version gets out of sync, and the primary data store must be accessed every time.
Are we done? Not yet…
We discussed everything here, in the context of a single application server.
But what if you are working on a distributed system, that is, there are multiple application servers? How will you implement a caching layer?
In a distributed system, a caching layer can be implemented in the ways below.
Private cache
A private cache, or local cache, exists in the memory of each application server. Since each machine stores its own cache, the cached data will get out of sync depending on which requests have been directed to which machine, and when.
The main advantage of a private cache is speed: since the cache is in the server’s own memory, it functions a lot faster than a shared caching layer that requires network requests to operate.
The major disadvantage is scaling the caching layer: you need to scale each server’s individual cache memory, which is cost-intensive.
Redis and Memcached are two common technologies for implementing a caching layer, most often deployed as a shared cache.
Shared cache
A shared cache exists in an isolated layer; in fact, the application servers might not even be aware that there is a cache. This allows the cache layer and the application layer to be scaled independently, and lets many different applications and services use the same cache.
The application layer needs to be able to detect the availability of the shared cache and switch over to the primary datastore if the shared cache ever goes down.
A shared cache implemented on its own cluster has the same resiliency considerations as any distributed system — including failovers, recovery, partitioning, rebalancing, and concurrency.
Which design should be selected and why?
Both the designs have their own pros and cons.
A combination of both is actually the best design: the private cache can act as a fallback that retains the advantages of caching in case the shared cache layer fails. If you can implement both, that is always the best option.
Still, if we need to decide one of the approaches, It would depend on the following assumptions,
- Go for a private cache if the system is small, needs less maintenance, and is not going to scale much.
- Go for a shared cache if the system is big and needs to scale, with dedicated maintenance.