TL;DR: This post describes the architectural changes we made to our Redis caching setup over time, at both the infrastructure and code level, as scale and new use cases demanded.
When we first started, we had a single Redis instance on a server with 7GB of RAM. Only a few of our APIs used Redis for caching at the time. Then, on one of our marketing-heavy days, we got a notification that our Redis server had crashed.
During debugging we realised that Redis memory usage had climbed to 70%-80% and the server crashed during a BGSAVE operation. We were using BGSAVE to persist data to RDB files.
A BGSAVE operation forks a child process which, in the worst case, requires approximately the same amount of memory as the Redis instance is currently using: the fork is copy-on-write, so every page that changes during the save gets duplicated.
As our server's memory consumption was already around 80%, there wasn't enough memory left to execute the BGSAVE operation, and so it crashed. One solution we tried was tuning a kernel VM setting:
vm.overcommit_memory = 1
Setting overcommit_memory to 1 makes the kernel pretend there is always enough memory to satisfy an allocation like malloc. The underlying concept here is virtual memory: programs see a virtual address space that may, or may not, be mapped to actual physical memory.
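On Linux, this setting can be persisted in /etc/sysctl.conf and applied without a reboot:

```
# /etc/sysctl.conf — persists across reboots; apply immediately with `sysctl -p`
vm.overcommit_memory = 1
```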
But our usage was still hitting the total RAM available, so we needed to upgrade the RAM and find a solution that kept the above problem in mind.
As a temporary fix, we increased the RAM from 7GB to 15GB.
At this phase, our scale started to grow rapidly, which inevitably increased cache usage. Caching now needed to be reliable, as even a few minutes of caching outage could slow down our APIs, which in turn could impact app performance, user experience, and critical concerns like SEO.
A few key changes we incorporated in this phase:
a. Upgrading server RAM
We scaled up our Redis server from 15GB to 30GB of RAM.
b. 32-bit Redis Instance
A 32-bit compiled Redis instance uses considerably less memory per key because its address pointers are half the size. The caveat is that, being 32-bit, it is limited to 4GB of memory. You can read about more memory optimisations here: redis-optimisation.
c. Multiple Redis Instances
Rather than running only one instance, we now ran multiple Redis instances (6 instances, each capped with the maxmemory option set to 4GB in its Redis conf). This helped us in:
- Solving the BGSAVE problem — each Redis instance was capped at 4GB of memory, so at any point in time our maximum usage could go up to 24GB (6 instances × 4GB).
— We now triggered BGSAVE for one instance at a time, so at any point in time we had at least 6GB of memory available for BGSAVE to run (30GB RAM − 6 × 4GB of Redis instance usage).
— Of that, we kept 2GB as a buffer for other OS-related operations, since a BGSAVE for a single Redis instance needed up to 4GB of extra RAM.
- Better utilisation of resources — as Redis is single-threaded, running multiple instances let us better utilise the system's resources without much overhead.
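For reference, the per-instance cap goes in each instance's redis.conf. The eviction policy line below is illustrative, not something the setup above prescribes:

```
# redis.conf (one per instance)
maxmemory 4gb
maxmemory-policy allkeys-lru   # illustrative; choose a policy that fits your workload
```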
Now that we had multiple Redis instances, we needed a way to distribute data among them. After some research, we decided to use Twemproxy.
Twemproxy is a Twitter open-source project which helps in sharding data across multiple Redis nodes through consistent hashing.
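A Twemproxy (nutcracker) pool for such a setup looks roughly like the fragment below; the pool name, addresses, and ports are hypothetical:

```
# nutcracker.yml — illustrative pool definition
redis_pool:
  listen: 127.0.0.1:22121
  hash: fnv1a_64
  distribution: ketama        # consistent hashing
  redis: true
  servers:
    - 127.0.0.1:6379:1
    - 127.0.0.1:6380:1
    - 127.0.0.1:6381:1
```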
The third phase of development was triggered by two factors: Phase 2 issues and the advent of our micro-services architecture.
Phase 2 Issues & solutions
a. Out Of Memory issue
As cache usage grew, we started to get OOM issues in multiple Redis instances periodically.
After debugging, we realised that whenever the usage of any Redis instance reached around 3GB, it crashed. Further debugging brought us to the conclusion that:
The recommended usable memory of a 32-bit Redis instance is 3GB, but it starts giving OOM issues even at around 2.5GB of usage. This is due to memory utilised by other things, like Redis's own internal buffers. <stackoverflow>
Solution: to avoid the OOM issue, we replaced 32-bit Redis with the latest 64-bit Redis. A 64-bit instance can also take advantage of OS-level settings like vm.overcommit_memory and swap space, which a 32-bit instance could not.
b. Twemproxy — Loss of data & outdated
We used Twemproxy's consistent hashing algorithm to shard data among multiple servers.
In consistent hashing, the addition or removal of a node leads to the loss of roughly k/N keys, where k is the number of keys and N is the total number of nodes.
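This k/N behaviour is easy to see with a small simulation. The sketch below builds a minimal consistent-hash ring (a simplified stand-in for Twemproxy's ketama hashing; the node names and vnode count are made up), places 10,000 keys on 5 nodes, adds a 6th node, and counts how many keys move — roughly 1/6 of them:

```typescript
// FNV-1a 32-bit hash for deterministic key and node placement.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

class HashRing {
  private points: { hash: number; node: string }[] = [];

  constructor(nodes: string[], private vnodes = 100) {
    for (const node of nodes) this.add(node);
  }

  // Place `vnodes` virtual points on the ring for each node.
  add(node: string): void {
    for (let i = 0; i < this.vnodes; i++) {
      this.points.push({ hash: fnv1a(`${node}#${i}`), node });
    }
    this.points.sort((a, b) => a.hash - b.hash);
  }

  // A key belongs to the first point clockwise from its hash (with wrap-around).
  locate(key: string): string {
    const h = fnv1a(key);
    for (const p of this.points) if (p.hash >= h) return p.node;
    return this.points[0].node;
  }
}

const keys = Array.from({ length: 10000 }, (_, i) => `key:${i}`);
const ring = new HashRing(["redis1", "redis2", "redis3", "redis4", "redis5"]);
const placement = new Map(keys.map((k) => [k, ring.locate(k)]));

// Add a sixth node and count how many keys now map to a different node.
ring.add("redis6");
const moved = keys.filter((k) => ring.locate(k) !== placement.get(k)).length;
console.log(`moved ${moved}/10000 keys (ideal k/N ≈ ${Math.round(10000 / 6)})`);
```

For a cache behind Twemproxy, every one of those moved keys is effectively lost, since lookups for them now go to a node that never stored them.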
Twemproxy doesn't provide a way to reshard data on the addition or removal of a Redis node.
Also, Twemproxy was no longer in active development; its last commit was more than a year ago.
Solution: around this time, a stable version of Redis Cluster became available. We shifted to Redis Cluster, as it let us remove the third-party dependency, i.e. Twemproxy. It also provides a script, redis-trib, through which we could automate the resharding process.
As we kickstarted our micro-services journey, we needed to keep a few things in mind:
- Code redundancy: the connection logic and basic cache functionality needed to be abstracted out, as multiple services would start using it.
- Inconsistent data: some teams compressed data before adding it to the cache, which helped save memory, but as this was not enforced, other teams were not following it.
- Cache-key format: until now we didn't have any fixed cache-key format, which made it difficult to identify a specific key or a set of keys.
- Critical services: we needed a way to keep the cache highly available for performance-critical services.
- Tracking and metrics: some system-level metrics were already tracked, but they did not suffice. We needed better insight into cache usage and performance across different micro-services. This was also needed to prevent teams from abusing the cache.
The Final Architecture
On the server side, we launched two Redis Clusters: HA (High Availability) and LA (Low Availability).
a. HA cluster
We utilised the master-slave replication provided by Redis Cluster to achieve high availability. Currently, we have 7 masters in this cluster, each with one slave.
Performance-critical services use the HA cluster.
b. LA cluster
This cluster consists of only 4 master Redis instances. The services utilising this cluster are those that won't be badly affected by an outage, though there hasn't been a single instance of the LA cluster going down.
On the client side, as our micro-services were to be written in NodeJs, we decided to build an internal NodeJs library named Flash. It solved the majority of the use cases and issues of our micro-services architecture.
- Flash — a singleton-pattern-based library which internally maintains a single connection to the Redis Cluster. This helped us keep the number of connections to the Redis Cluster in check.
- Cache key — in Flash, we introduced the concept of a bucket name. Each developer has to register a bucket name when using the cache, e.g. all cache keys for SEO come under the bucket name seo_bucket. The final cache key stored in the cache has the format bucket_name:cache_key.
- Data compression — Flash internally compresses data using snappy before storing it in the cache.
- Logging — logging helped us derive metrics like the overall hit/miss rate, cache usage by bucket name, and performance.
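The shape of such a wrapper can be sketched as below. This is not the actual Flash library: zlib's gzip stands in for snappy to keep the sketch dependency-free, and an in-memory Map stands in for the Redis Cluster connection; only the singleton pattern and the bucket_name:cache_key format come from the description above.

```typescript
import { gzipSync, gunzipSync } from "zlib";

interface CacheStore {
  set(key: string, value: Buffer): void;
  get(key: string): Buffer | undefined;
}

class Flash {
  private static instance: Flash;
  // Stand-in for the single Redis Cluster connection the real library holds.
  private store: CacheStore = new Map<string, Buffer>();

  private constructor() {}

  // Singleton: every service module shares one instance, hence one connection.
  static getInstance(): Flash {
    if (!Flash.instance) Flash.instance = new Flash();
    return Flash.instance;
  }

  // The bucket_name:cache_key format described above.
  private fullKey(bucket: string, key: string): string {
    return `${bucket}:${key}`;
  }

  // Compress on write (gzip here; the real library uses snappy).
  set(bucket: string, key: string, value: string): void {
    this.store.set(this.fullKey(bucket, key), gzipSync(Buffer.from(value)));
  }

  // Decompress on read; undefined on a cache miss.
  get(bucket: string, key: string): string | undefined {
    const raw = this.store.get(this.fullKey(bucket, key));
    return raw ? gunzipSync(raw).toString() : undefined;
  }
}

const cache = Flash.getInstance();
cache.set("seo_bucket", "homepage-meta", "<title>Home</title>");
console.log(cache.get("seo_bucket", "homepage-meta"));
```

Because compression lives inside the wrapper, every team gets it for free, which addresses the inconsistent-data problem noted earlier.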
Tracking and Metrics
We have three sources through which we track metrics:
- For server-level metrics, we use Nagios.
- To track individual Redis instance metrics like cache hit/miss ratio, memory usage, and connections, we use New Relic.
- Micro-service-level metrics like cache hit/miss, number of operations, and performance are derived through our ELK stack.
This pretty much sums up our journey. The current architecture has been working well for us for quite some time now, and we hope this will be helpful for anyone starting to face scale issues with their caching layer.
Thanks to Mohit Agrawal for guiding me throughout the journey.
We are hiring! Drop your resume at firstname.lastname@example.org.