Scaling Redis at 7shifts

Cory Jacobsen
7shifts Back of House
7 min read · Nov 24, 2023

Caching is one of those technologies that eventually all applications reach for. The barrier to entry is very low as almost every framework has hooks for caching services. Spin up a Redis or Memcached instance, and with a simple configuration tweak you have caching. But what happens when you outgrow a single caching instance? This is the high-level story of how we scaled our Redis caching platform at 7shifts.

Starting slow

When 7shifts was first starting out, we had no application caching. Everything, including session data, was backed by the database. This allowed us to keep our system architecture very simple. To scale with demand, we would increase the resources of the database. But as you can imagine, scaling the database is only practical for so long.

System diagram — our simple starting point

We reached a point where it made sense to start caching a few models to reduce some database pressure. We had the classic setup where a model would write to the database, but would now also be cached within Redis. The models with expensive lookups were the first candidates to be cached. This improved our application performance while also reducing load on the database.

System diagram — adding a single cache instance

Too many connections

Things were looking good, but we soon ran into a different type of problem. We had configured our monolithic PHP application to connect directly to Redis, which meant every request opened and closed its own connection. At a small scale, these connections aren't expensive enough to worry about, but as the application scaled up, the constant opening and closing of connections became expensive. We started to see a correlation between our increasing latency and the number of Redis connections.

Some quick searching on the problem led us down the path of proxies. Adding a proxy or connection pool between the application and Redis made a lot of sense: a connection pool maintains a set of open connections that the application can reuse, which reduces the cost of constantly opening and closing them. We ended up implementing Twemproxy, which solved our connection problems.
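
For a rough idea of what that looked like, here is a minimal Twemproxy (nutcracker) pool definition for a single Redis backend. The addresses and tuning values are placeholders, not our production settings.

```yaml
# twemproxy (nutcracker) pool - illustrative values only
redis_pool:
  listen: 127.0.0.1:6380        # the app connects here instead of Redis directly
  redis: true                   # speak the Redis protocol (default is memcached)
  hash: fnv1a_64
  distribution: ketama
  timeout: 400                  # upstream timeout in milliseconds
  server_connections: 2         # pooled connections kept open per backend
  servers:
    - 10.0.0.10:6379:1          # host:port:weight of the Redis instance
```

The application points at the proxy's listen address, and Twemproxy multiplexes requests over the small set of upstream connections it keeps open.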

Connection reduction with Twemproxy
System diagram — adding a proxy in front of Redis

Outgrowing a single instance

Fast forward to the summer of 2023: we were growing quickly and adding a lot of new customers to our platform. Our database reached a point where we were utilizing over 90% of the available CPU. Upgrading wasn't ideal, as we were one upgrade away from the largest tier offered by our provider, and we would rather save that final upgrade as a last resort for future scaling problems. We looked at splitting a few services into their own databases, but doing so would add a lot of complexity as well as cost. The more economical solution was to simply cache more data.

We rolled out a change that cached one of our most frequently used data models. We saw an immediate benefit as the database CPU started to drop. But surprisingly, our overall latency started to climb! Needless to say, this wasn't what we expected, and we had to roll back the change.

Hitting Redis limits

Redis is a fantastic service to work with: it's easy to operate and easy to reason about, and the monitoring is straightforward. The biggest metric to watch is memory consumption. If you run out of memory, Redis can evict keys based on your configured eviction policy (usually LRU or LFU). We had been slowly increasing our Redis instance size as needed, and the metrics showed we still had plenty of available memory.
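
For reference, eviction is controlled by two settings, shown here in redis.conf syntax with example values; on a managed service you set the policy through the instance configuration instead.

```
maxmemory 4gb                   # cap on the memory Redis will use for data
maxmemory-policy allkeys-lru    # evict the least recently used keys when full
                                # (allkeys-lfu evicts the least frequently used)
```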

Flipping over to the CPU graph, we started to see the real problem: Redis CPU had climbed above 100% utilization. With our added load, we were asking too much of Redis, which forced requests to wait for CPU time. Redis is single-threaded, so there was no easy scaling solution at this point. What do we do now?

Scaling options

While investigating the problem, we came across a few different options. Redis Cluster was the first solution that looked promising: a distributed implementation of Redis that automatically shards keys across multiple instances. Unfortunately, it wasn't an easy change, as we were relying on MemoryStore from Google Cloud Platform (GCP). At the time, GCP did not support Redis Cluster, though it does now.

The next solution we investigated was read replicas, which GCP does provide. After looking at what the implementation would involve, we had some reservations. We didn't want the extra connection logic within the application, and we wanted to keep the developer experience and application configuration as simple as we could.

Sharding was next on the investigation list. At first we looked at creating a second Redis instance and letting the application handle the sharding logic. But like the read replica solution, this would add a fair bit of complexity to the application. However, we found some interesting articles on Envoy and its client-side sharding strategy.

Instead of having the sharding logic within the application, we could move it to the proxy layer. Developers would not need to know any implementation details, and the application configuration would stay simple. This also gives the infrastructure team full control over the scaling architecture. After some testing and benchmarking, we decided this was the way forward for us.
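
To make the idea concrete, here is a trimmed-down sketch of client-side sharding in an Envoy config: the redis_proxy filter hashes each key and the consistent-hashing load balancer picks a shard, so the application still sees a single Redis endpoint. The addresses, timeout, and cluster names are placeholders rather than our exact production configuration.

```yaml
static_resources:
  listeners:
  - name: redis_listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 6379 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.redis_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
          stat_prefix: redis
          settings:
            op_timeout: 5s
          prefix_routes:
            catch_all_route:
              cluster: redis_shards      # every key is routed to the sharded cluster
  clusters:
  - name: redis_shards
    type: STATIC
    lb_policy: MAGLEV                    # consistent hashing maps each key to one shard
    load_assignment:
      cluster_name: redis_shards
      endpoints:
      - lb_endpoints:
        - endpoint: { address: { socket_address: { address: 10.0.0.11, port_value: 6379 } } }
        - endpoint: { address: { socket_address: { address: 10.0.0.12, port_value: 6379 } } }
        - endpoint: { address: { socket_address: { address: 10.0.0.13, port_value: 6379 } } }
```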

Rolling out Envoy

With a solution in hand, it was time to implement the new setup. We broke the migration down into a few steps. First, we wanted to replace Twemproxy with Envoy. This change was simple, as Twemproxy was only handling connection pooling. Swapping our proxy setup over to Envoy was quick, easy, and a bit shocking: we saw an instant drop in Redis CPU usage and application latency!

CPU drop after moving over to Envoy

We assumed something was misconfigured, but no alerts fired off. Our customer support team wasn’t seeing any negative customer impact either. We had a hunch this was due to the Redis protocol version, but didn’t spend a lot of time looking into it. Free performance is always welcome.

System diagram — replacing Twemproxy with Envoy

Sharding Redis

At this point, we were ready to spin up more Redis instances and configure Envoy. Google has a great blog post on how to configure Envoy with client-side sharding. For the sharded setup, we created 3 smaller Redis instances to replace our single large instance and configured them as a separate cluster within the Envoy configuration. We were then able to use Envoy's request mirror policy to double-write the cache. Based on our application's cache TTL setting, we knew this mirroring process would take 24 hours to complete.
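
The double write itself leaned on Envoy's request mirror policy: the catch-all route kept serving traffic from the old instance while mirroring write commands to the new sharded cluster. A rough sketch of that piece of the config, with illustrative cluster names:

```yaml
prefix_routes:
  catch_all_route:
    cluster: redis_legacy              # reads and writes still served by the old instance
    request_mirror_policy:
    - cluster: redis_shards            # mirror the same traffic to the new shards
      exclude_read_commands: true      # only writes are needed to warm the new cache
```

Once every cached key had been written again within its TTL window, the new cluster held a full copy of the cache.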

System diagram — final Envoy implementation with multiple shards

After switching the proxy configuration to use the new sharded cluster, we assumed our work would be complete. But it’s never that easy! We saw some interesting CPU metrics coming from the 3 new instances. We verified the key count per instance to ensure the distribution was even. So why was CPU usage so different between the instances?

CPU utilization of the sharded Redis instances

Hot keys

After some digging around, we found a Redis CLI flag named --hotkeys. This scans all the keys within Redis and reports back the most frequently used ones. It turned out we had a good number of keys that are required on almost every page load. This was a bit of bad luck: a handful of those keys had ended up on the same Redis instance, leading to the uneven CPU usage.
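
Running it is a one-liner. One caveat worth noting: --hotkeys relies on LFU access counters, so the instance must be using an LFU maxmemory policy (such as allkeys-lfu) for it to report anything.

```
# scan the keyspace and report the most frequently accessed keys
redis-cli -h <redis-host> --hotkeys
```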

To solve this problem, we came up with the concept of “high frequency” keys within our application. We wrote some code that allows us to mark certain keys as “high frequency”. Behind the scenes, this simply adds a HIGH_FREQ_ prefix to the key.
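
A minimal sketch of the idea in PHP, using hypothetical names; the real helper in our codebase is a bit more involved.

```php
<?php

// Hypothetical helper: marks selected cache keys as "high frequency" by
// prefixing them so the proxy layer can route them differently.
const HIGH_FREQ_PREFIX = 'HIGH_FREQ_';

function cacheKey(string $key, bool $highFrequency = false): string
{
    return $highFrequency ? HIGH_FREQ_PREFIX . $key : $key;
}

// Example usage with phpredis; company settings are the kind of model that
// gets read on almost every page load.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);              // in practice this points at the proxy

$key = cacheKey('company_settings:123', true);   // HIGH_FREQ_company_settings:123
$redis->setex($key, 3600, 'serialized settings payload');
```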

When a prefixed key comes into Envoy, it uses the prefix route configuration. If we are writing a high frequency key, Envoy saves it to all 3 instances. This may seem counterintuitive, but we do it to spread the load across all of the sharded instances. If we then read a high frequency key, Envoy uses a round-robin lookup, which distributes the load. After we tracked down all the offending high frequency keys, our CPU usage looked much better.
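
On the Envoy side, the route for these keys looks roughly like the sketch below: the HIGH_FREQ_ prefix matches a dedicated route whose writes are mirrored to the other shards. The cluster names are illustrative, and the cluster definitions that back the round-robin reads are omitted here.

```yaml
prefix_routes:
  routes:
  - prefix: "HIGH_FREQ_"
    cluster: redis_high_freq           # serves reads for high frequency keys
    request_mirror_policy:             # fan writes out so every shard holds a copy
    - cluster: redis_shard_2
      exclude_read_commands: true
    - cluster: redis_shard_3
      exclude_read_commands: true
  catch_all_route:
    cluster: redis_shards              # all other keys: consistent hashing across shards
```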

CPU utilization of the instances after implementing the high frequency key concept

Conclusion

Our scaling journey through the world of caching has been both challenging and enlightening. We’ve had our share of setbacks and ‘aha’ moments. But each challenge has provided us with valuable insights. As of now, our caching strategy is robust, scalable, and — most importantly — cost-effective. We hope that by sharing our journey, we can help you navigate the complexities of scaling cache in your own applications.
