RedisMemo: Caching is made easy

Donald Dong
Published in CZI Technology
Jun 7, 2021

At the Chan Zuckerberg Initiative (CZI), we use RedisMemo to improve the performance and reliability of our products, Summit Learning and Along.

https://github.com/chanzuckerberg/redis-memo

RedisMemo is an application-level caching system for Ruby programs. It can be used to cache database queries (with built-in support for ActiveRecord), third-party API calls, pure functions, or any combination thereof.

app/models/user.rb

With RedisMemo, caching all the select-by-id queries on the users table takes only two lines of code in the User model (details in section 3.2).
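The snippet above is an image in the original post; based on RedisMemo's public MemoizeQuery API, the two-line change plausibly looks like this (a sketch, not the original code):

```ruby
# app/models/user.rb -- a sketch of the two-line change, based on
# RedisMemo's MemoizeQuery API (consult the gem README for exact usage)
class User < ApplicationRecord
  extend RedisMemo::MemoizeQuery

  memoize_table_column :id
end
```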

In the Rails console, the author method on the post object loads data from Redis instead of the database:

Rails Console Output

Section Outline

Section 1: Quick review of why caching is important

Section 2: Where existing Rails caching patterns fall short

Section 3: How caching is easily done with RedisMemo

Section 4: Promising results from adopting RedisMemo at CZI

1. Database caching

1.1 Performance

Caching database queries can improve application performance by

  • Skipping slow or expensive database queries
  • Skipping calculations that require multiple database round trips
  • Reducing the overhead from using a disk-based relational (SQL) database (Footnote 1)

1.2 Scalability

A caching layer functions like a “content delivery network” (CDN) for delivering database queries. Applications often use a CDN to multiply their web servers’ effective capacity. Similarly, caching increases an individual database’s capacity by protecting it from repetitive queries. A Rails application typically has a single SQL database that is queried by many application processes (Figure 1).

Figure 1. A typical Rails application architecture

When application usage increases, the number of requests grows, and so does the number of database queries. There is a hard limit on the size of a single relational database. The largest database instances offered by AWS come with 64 cores. A Redis cluster, however, can scale to 1000 nodes.

Cache data can easily be partitioned across multiple Redis clusters, which makes the Redis cache layer horizontally scalable with no practical limit. Partitioning the SQL database is much more challenging and not always practical.
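As a toy illustration of client-side partitioning (not RedisMemo internals), each cache key can be hashed to one of several Redis clusters; adding clusters spreads the load, with no equivalent escape hatch for the SQL database. The cluster URLs below are made up:

```ruby
require "zlib"

# Hypothetical cluster endpoints, for illustration only
REDIS_CLUSTERS = [
  "redis://cache-1.internal:6379",
  "redis://cache-2.internal:6379",
  "redis://cache-3.internal:6379",
].freeze

# Route a cache key to a cluster by hashing the key
def cluster_for(cache_key)
  REDIS_CLUSTERS[Zlib.crc32(cache_key) % REDIS_CLUSTERS.size]
end
```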

1.3 Cost Efficiency

Scaling the Redis cache layer is more cost-effective than scaling the database layer. A db.m5.4xlarge database instance on AWS costs $2,097 per month. Adding a 4-node Redis cache cluster that lets you step down to a db.m5.2xlarge instance brings the total to $1,688, a savings of $409 per month. The savings only grow as your site does.

2. Challenges with application-level caching

Here are three of the issues we encountered using the Rails low-level caching API and lessons we learned along the way (Footnote 2). We will demonstrate how RedisMemo addresses those problems in section 3.

2.1 Forgetting to invalidate the cache

Developers must manually invalidate the cached results after changes to associated database records (Snippet 1). Forgetting to do this is common, and causes subtle bugs.

Snippet 1. Implementation of CachedUser
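The snippet itself is an image in the original post. A toy version of the same pattern, with a hand-rolled in-memory cache standing in for the Rails cache store, shows how easily the manual invalidation step is forgotten:

```ruby
# Toy cache standing in for Rails.cache (illustrative; not the original snippet)
class FakeCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  def delete(key)
    @store.delete(key)
  end
end

CACHE = FakeCache.new
USERS = { 1 => "Alice" } # stands in for the users table

def cached_user_name(id)
  CACHE.fetch("user:#{id}") { USERS[id] }
end

cached_user_name(1)         # fills the cache with "Alice"
USERS[1] = "Bob"            # this write path forgot CACHE.delete("user:1")...
STALE = cached_user_name(1) # ...so the cache still serves "Alice"
```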

2.2 Cache invalidation during database transactions

If the cache is invalidated immediately after a not-yet-committed write, a subsequent cache read could fill the cache with data that is later rolled back (Snippet 2).

Snippet 2. Mid-transaction invalidation

If the cache is invalidated only after the transaction commits, cache reads made in the meantime return stale data, as if the changes had not occurred (Snippet 3).

Snippet 3. Post-transaction invalidation
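The original snippets are images; the first hazard can be sketched as a toy timeline, with plain hashes standing in for the database and the cache:

```ruby
db        = { name: "Alice" }
committed = db.dup # what the database would revert to on ROLLBACK
cache     = {}

# Inside a transaction: write, then invalidate immediately (as in Snippet 2)
db[:name] = "Bob"
cache.delete(:name)

# A cache read before COMMIT misses and re-fills the cache. Because it can
# see the uncommitted write, it caches data that is about to disappear:
cache[:name] ||= db[:name] # caches "Bob"

# The transaction rolls back...
db = committed.dup # the database says "Alice" again

# ...but the cache still holds the rolled-back value
POISONED = cache[:name]
```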

2.3 Cache invalidation could be slow and expensive

Depending on what is cached, there could be many cached results to invalidate. Consider the following example (Snippet 4). Each time a user updates their display name, we have to iterate through all their posts and invalidate the cache for each one. Until this process finishes, users might see partially inconsistent data (some posts with the old display name, some with the new one).

Snippet 4. Slow cache invalidation for CachedPostAuthor
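A toy sketch of the Snippet 4 problem: because the author's display name is baked into every cached post byline, a single rename fans out into one cache delete per post. A Hash stands in for the cache store here:

```ruby
cache = {}
post_ids = (1..10_000).to_a

# Cache fill: the author's display name is embedded in every cached byline
post_ids.each { |id| cache["post_author:#{id}"] = "Post #{id} by Alice" }

# The user renames themselves: one delete per post must follow, and readers
# see a mix of old and new names until the loop finishes
post_ids.each { |id| cache.delete("post_author:#{id}") }
```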

3. Caching is made easy

3.1 Performant and reliable cache invalidation

RedisMemo is a version-addressable caching system, analogous to Git's content-addressable storage: Git computes a checksum of an object's content to retrieve the object from its database, while RedisMemo computes a checksum of dependency versions to retrieve cached method results (Figure 2). Version-addressability is what makes RedisMemo both performant and reliable.

Figure 2. Version Addressable
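The idea can be sketched in a few lines of plain Ruby (a minimal illustration, not RedisMemo internals): the cache key is a checksum over the current versions of a method's dependencies, so bumping any one version redirects every dependent read to a fresh key, with no per-entry deletes:

```ruby
require "digest"

# Hypothetical dependency version store
VERSIONS = { "user:1" => "v1", "post:7" => "v1" }

# The cache key is a checksum over the dependencies' current versions
def cache_key(method_name, dependencies)
  digest = Digest::SHA256.hexdigest(
    dependencies.map { |dep| "#{dep}=#{VERSIONS[dep]}" }.join("|")
  )
  "#{method_name}:#{digest}"
end

OLD_KEY = cache_key("post_author", ["user:1", "post:7"])
VERSIONS["user:1"] = "v2" # one O(1) version bump...
NEW_KEY = cache_key("post_author", ["user:1", "post:7"])
# ...instantly orphans every cached result that depended on user:1
```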

Each memoized method has one or more dependencies. Bumping a dependency version is an O(1) operation that takes about 2 milliseconds (atomicity and consistency are assured with Redis Lua scripting). Millions or even billions of associated cached results can be invalidated immediately by updating a single dependency version. Using RedisMemo (Snippet 5), we can effectively resolve the issue described in section 2.3.

Snippet 5. Replace CachedPostAuthor with RedisMemo
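Snippet 5 is an image in the original post; based on RedisMemo's MemoizeMethod API, the replacement plausibly looks like the following sketch (consult the gem README for exact usage):

```ruby
class Post < ApplicationRecord
  extend RedisMemo::MemoizeMethod

  def author_display_name
    author.display_name
  end

  # Declare the method's dependencies. When a user record changes, its
  # dependency version is bumped once, invalidating every dependent cached
  # result at once -- no per-post deletes
  memoize_method :author_display_name do |post|
    depends_on User.where(id: post.author_id)
  end
end
```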

3.2 Query analysis and auto-invalidation

If you track a column with RedisMemo, any ActiveRecord SQL queries on that column are automatically cached and automatically invalidated. This addresses the issue described in section 2.1.

Cached database queries in RedisMemo are memoized methods at the database adapter level. RedisMemo analyzes each SQL query and tracks its dependencies automatically. Each dependency has a version that is automatically bumped when database records change (via ActiveRecord callbacks) (Figure 3).

Figure 3. Query dependencies tracking
Snippet 6. Enabling user.id, user.first_name as RedisMemo dependencies

For example, after tracking some user table columns as dependencies (Snippet 6), queries such as

* User.find(user_id)
* User.where(id: user_id).first
* User.where(first_name: first_name).first
* User.find_by_first_name(first_name)

will first check the Redis cache for the data before hitting the SQL database; the cache results are invalidated automatically when user records are changed.
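Snippet 6 itself is an image in the original post; assuming the gem's `memoize_table_column` helper, it plausibly reads:

```ruby
# A sketch of Snippet 6 (illustrative; see the RedisMemo README for exact usage)
class User < ApplicationRecord
  extend RedisMemo::MemoizeQuery

  # Track these columns as dependencies: queries filtering on them are
  # cached and auto-invalidated when user records change
  memoize_table_column :id
  memoize_table_column :first_name
end
```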

3.3 Add caching without changing any call sites

Instead of building separate cached code paths or providing a new API such as Product.fetch, RedisMemo takes advantage of Ruby metaprogramming and adds caching to existing code paths as an annotation (Snippet 5).

Here are some of the motivations:

  • Caching should be a default behavior, rather than a conscious choice each developer has to make every time; they might not even be aware of the existence of the cached code path.
  • Cached and uncached code paths can diverge over time.
  • Switching to the cached ones may require changing a ton of files; it’s not always possible to change all the call sites, since some of them could be used in some gem code.
  • Separate code paths such as Product.fetch (Footnote 3) have other usability and compatibility issues with ActiveRecord, such as relation lazy loading and association preloading.

3.4 Add caching confidently

Hard to make mistakes

Thanks to auto-invalidation, we can no longer forget to invalidate cached database queries (section 3.2), but we still need to specify the correct dependencies (Snippet 5).

With RedisMemo, we can pull in dependencies defined by other memoized methods to keep the code DRY and further minimize the opportunities for programming errors (Snippet 7). See the RedisMemo documentation for more on hierarchical caching.

Snippet 7. Pull in existing dependencies with dependency_of

Monitoring

RedisMemo has built-in support for monitoring services such as Datadog. You can set up alerts on monitoring metrics (such as the overall cache-hit rate) to take action proactively.

Roll out changes safely

RedisMemo has built-in cache sampling logic. An error reporter will be invoked if some methods have incorrect dependencies that cause the cache results to be out of date.

We highly recommend sampling at least 1% of the cached methods in production. When rolling out a new cached code path, you can start with a 100% cache sample rate and reduce it once you are confident in the results.

Cached queries can be disabled per model by setting the environment variable REDIS_MEMO_DISABLE_<table name>. RedisMemo can be turned off globally by setting REDIS_MEMO_DISABLE_ALL to true.

4. Our Experience

Easy adoption

Since we can add caching without changing the call sites (section 3.3), adopting RedisMemo is super easy. We were able to move 50% of the database traffic to our Redis cluster by adding a few lines of configuration code on some heavily used models.

Error resilient

When the Redis cluster is unavailable, RedisMemo falls back to the SQL database without affecting the application, provided the following options are configured properly:

  • Redis read/write/connect timeouts
  • Max number of connection errors per request before bypassing the cache layer
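A configuration sketch along these lines is shown below. The timeout options are passed through to the underlying redis-rb client; `max_connection_attempts` is an illustrative name for the per-request error cap, so check the gem README for the exact configuration API:

```ruby
# Illustrative configuration sketch -- option names are assumptions,
# not the gem's documented API
RedisMemo.configure do |config|
  config.redis = {
    url: "redis://cache.internal:6379",
    connect_timeout: 0.2, # seconds; supported by the redis-rb client
    read_timeout: 0.5,
    write_timeout: 0.5,
  }

  # Hypothetical: cap connection errors per request before RedisMemo
  # bypasses the cache layer and falls back to the SQL database
  config.max_connection_attempts = 3
end
```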

When there are transient Redis request failures, RedisMemo has the following options to stay performant and reliable:

  • Retry failed version bumps asynchronously
  • Bump versions in background threads
  • Use Redis connection pooling

Good performance and usability

  • By reusing dependencies, we were able to cache an entire page fragment that sends dozens of database queries, changing the latency from ~150ms to ~10ms when there’s a cache hit.
  • We are also using RedisMemo to cache S3 object versions, changing the latency from ~200ms to ~2ms when there’s a cache hit.
  • No more silent cache inconsistency issues caused by programming errors. It becomes easier to debug cache-related issues with built-in logging, monitoring, global kill switches, and an inline kill switch RedisMemo.without_memoization { … } for troubleshooting in the Rails console.

With RedisMemo, caching becomes low-hanging fruit for improving the performance and reliability of our applications.

Footnotes:

  1. Querying data from an in-memory key-value store is generally faster than querying a disk-based SQL database, since there is no need to parse the SQL query, create an execution plan, and execute it. However, the reduced overhead is often marginal compared to the time spent on the network round trip. Therefore, for fast database queries, caching is less about improving performance and more about improving the scalability of the application.
  2. We’re aware of Shopify/identity_cache, a gem that provides query caching with automatic cache invalidation, which would resolve the issue described in section 2.1; however, it is affected by most of the other issues we want to address when caching queries at the application level, along with other challenges of using the Rails low-level caching API.
  3. IdentityCache is deliberately opt-in for all call sites that want to use caching. In comparison, RedisMemo is still deliberate in that clients should specify what computation and models should be cached. However, when caching does make sense, RedisMemo makes caching easy and robust by automatically using the cached code paths.
