Bringing back Rails IdentityMap caching

Gabriel Kent
Aug 15, 2019 · 7 min read

Once upon a time, Rails had an ActiveRecord feature called IdentityMap.

Ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them.

When does Rails proactively cache ActiveRecord objects?

Answer: On associations when the inverse_of option is properly configured or inferred by convention.

Navigating to the associated record(s), then navigating back, should not incur an additional query.

However, if another record is queried fresh and happens to be associated with that same record, the existing association cache is not consulted and an extra query is made. Ideally, we should be able to recognize that the associated record has already been loaded and that an additional query is therefore unnecessary.

Now an extra query or two may not seem like much, but consider a blog: there could be thousands of comments associated with any given blog post. If those comments are loaded outside of the blog post's own comments association, then referencing the blog_post association from within each comment will result in N queries against the blog_posts table, even if all N comments belong to the same BlogPost!

Why should you care?

Answer: high scale. At least, that’s why we care.

Scaling is exciting, but also painful. As customers begin to increase traffic, engineering teams may notice significant database load and contention. However, due to the size and complexity of most codebases, it can be extremely difficult to pinpoint specific inefficiencies around queries. We recently found ourselves in such a situation where the sheer volume of database reads began adversely affecting our services. These reads were largely due to the same associated records being re-queried across different source records.

Our Story

We were able to determine which tables were receiving the most queries, but that still didn’t help us pinpoint the exact code areas responsible. We identified a few areas that were making redundant queries, but we realized that targeted fixes were neither feasible nor scalable. We needed a broader solution that would cut redundant queries across the board and reduce our overall database load.

IdentityMap could have been that solution for us, but it was deprecated after Rails v3.2.13 because of inconsistencies that had been observed in its behavior. As we understand it, the feature was likely trying to support too many edge cases. All caching strategies have weaknesses and eventually break down if their usage is not properly scoped. Even so, there are rumors that this feature may be reintroduced in future Rails versions, but we didn’t want to wait that long.

Therefore, we set out to partially re-adapt the previous implementation of the IdentityMap feature to Rails 4.

What is appropriately scoped caching?

Caching is dangerous if not done correctly. For example, a process might make decisions based on outdated data as if it were current. In our opinion, the following rule of thumb should be respected:

Don’t apply caching if the process is expected to react to changes during the caching period. In other words, don’t cache when mixing reads and writes.

An example candidate for caching might be a nightly billing task which aggregates billing data for the past month. That kind of task is likely not expecting last minute updates while it runs. It assumes that the state of the world remains constant while processing.

What did we do and how did we do it?

Spoiler: we didn’t completely remake the IdentityMap feature set. We remade a valuable portion of it and also added in some new features of our own:

  • SingularAssociation#find_target cache reads and writes
  • Persistence#instantiate cache writes only
  • Stats recorded for cache writes, misses, and hits
  • Dry run caching mode (no loading from cache; read attempts still increment)

SingularAssociation & Persistence Caching

Whenever a SingularAssociation record is accessed, we either load it from the cache or query it fresh and store it in the cache using the owner class + id or record class + id as the cache key. Separately, whenever a record is instantiated through Persistence, we also write it to the cache, but only using the record class + id since the owner is unknown at this point. In essence, we currently write to the cache in two locations, but read from it in only one location. There are other opportunities within ActiveRecord to also read from the cache, like within the definition of find, but we wanted to minimize the scope of change in order to reduce risk.

Enough stalling; here’s how we did it. We wrapped the find_target and instantiate methods in mixins like so:
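The original embedded snippet is not reproduced here, so what follows is only a minimal plain-Ruby sketch of the idea, not our production code. `IdentityMapSketch`, `FindTargetCaching`, `FakeAssociation`, and the "ClassName:id" key format are all hypothetical stand-ins; in a real app the mixin would wrap ActiveRecord's SingularAssociation#find_target rather than a fake class.

```ruby
# Hypothetical in-memory cache keyed by "ClassName:id" (an assumption).
module IdentityMapSketch
  STORE = {}

  def self.key(class_name, id)
    "#{class_name}:#{id}"
  end

  # Mixin that wraps find_target: read from the cache first, otherwise
  # fall through to the original method (super) and store the result.
  module FindTargetCaching
    def find_target
      cache_key = IdentityMapSketch.key(target_class_name, target_id)
      STORE.fetch(cache_key) { STORE[cache_key] = super() }
    end
  end
end

# Stand-in for an association object; it counts simulated queries.
class FakeAssociation
  class << self
    attr_accessor :query_count
  end
  self.query_count = 0

  attr_reader :target_class_name, :target_id

  def initialize(target_class_name, target_id)
    @target_class_name = target_class_name
    @target_id = target_id
  end

  # Pretends to hit the database and build a record.
  def find_target
    self.class.query_count += 1
    { class: target_class_name, id: target_id }
  end
end

# prepend puts the mixin ahead of the class, so its find_target runs first.
FakeAssociation.prepend(IdentityMapSketch::FindTargetCaching)

first  = FakeAssociation.new("BlogPost", 1).find_target
second = FakeAssociation.new("BlogPost", 1).find_target

puts FakeAssociation.query_count # 1
puts first.equal?(second)        # true
```

Two separate association objects pointing at the same BlogPost end up sharing one record, which is exactly the sharing that Rails' per-association cache does not give us.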

We then prepended the mixins to the appropriate ActiveRecord classes during app initialization:
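Again, the original snippet is missing, so here is a hedged sketch of the write-only side. `InstantiateCacheSketch` and `FakeRecord` are hypothetical names, and the fake `instantiate` takes a plain attributes hash. The point it illustrates is that `instantiate` is a class-level method, so the mixin has to be prepended to the singleton class rather than to the class itself.

```ruby
# Hypothetical write-only cache for freshly instantiated records.
module InstantiateCacheSketch
  STORE = {}

  # Wraps a class-level instantiate: build the record as usual, then
  # write it to the cache under "ClassName:id". There are no reads here.
  module InstantiateCaching
    def instantiate(attributes)
      record = super
      STORE["#{name}:#{record.id}"] = record
      record
    end
  end
end

# Stand-in for a model class; this is not ActiveRecord.
class FakeRecord
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def self.instantiate(attributes)
    new(attributes.fetch("id"))
  end
end

# Class methods live on the singleton class, so the mixin goes there.
FakeRecord.singleton_class.prepend(InstantiateCacheSketch::InstantiateCaching)

# In a Rails initializer the analogous calls might look like this
# (an assumption with made-up mixin names, not our actual code):
#   ActiveRecord::Associations::SingularAssociation.prepend(FindTargetMixin)
#   ActiveRecord::Base.singleton_class.prepend(InstantiateMixin)

record = FakeRecord.instantiate("id" => 7)
puts InstantiateCacheSketch::STORE["FakeRecord:7"].equal?(record) # true
```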

To further reduce risk, we went with a conservative block wrapper in order to contain the caching to a particular process or block of code. Consider the following (unoptimized) example where we can utilize our block wrapper for caching:
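The embedded example did not survive, so the following is a self-contained simulation rather than the original code. Everything in it is hypothetical: `Sketch`, `with_identity_map`, and the hash-based "records" stand in for ActiveRecord, and the owner id is denormalized onto each comment to keep it short. A module-level cache like this is not thread-safe; a real version would likely need per-thread storage.

```ruby
# Hypothetical stand-ins: a query counter and a block-scoped cache.
module Sketch
  class << self
    attr_accessor :queries, :cache
  end
  self.queries = 0

  # Caching is active only for the duration of the block.
  def self.with_identity_map
    previous = cache
    self.cache = {}
    yield
  ensure
    self.cache = previous
  end

  # Simulates loading one row; skips the query when the cache has it.
  def self.load(table, id)
    return query(table, id) if cache.nil?
    cache.fetch([table, id]) { cache[[table, id]] = query(table, id) }
  end

  def self.query(table, id)
    self.queries += 1
    { table: table, id: id }
  end
end

# Ten recent comments that all point at blog post 1, owned by user 1.
def process_recent_comments
  Sketch.queries += 1 # the query that lists recent comments
  comments = Array.new(10) { |i| { id: i, blog_post_id: 1, owner_id: 1 } }
  comments.each do |comment|
    Sketch.load(:blog_posts, comment[:blog_post_id])
    Sketch.load(:users, comment[:owner_id])
  end
end

process_recent_comments
uncached = Sketch.queries

Sketch.queries = 0
Sketch.with_identity_map { process_recent_comments }
cached = Sketch.queries

puts "uncached=#{uncached} cached=#{cached}" # uncached=21 cached=3
```

Because the cache is created on entry and discarded in the `ensure` clause, nothing cached inside the block can leak into code that runs afterwards.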

In this example, we are likely to come across multiple comments that are associated with the same blog_post and, thus, the same owner.

Without caching (or optimizing through other means), this procedure would normally result in exactly 2N+1 queries, where N is the number of recent comments: one query to get the list of recent comments, plus two queries per comment to retrieve the associated blog_post and owner.

With caching, the maximum queries would be 2N+1 if we assume that every comment is associated with a unique blog_post AND owner. However, if every comment was associated with the same blog_post AND owner, there would only be 3 queries in total. The blog_post and owner would be cached and re-used for every comment after the first. Potentially reducing 2N+1 queries down to 3 is a massive performance increase and highly scalable!
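The counting argument can be written down as two small functions. This is a back-of-the-envelope sketch with hypothetical helper names, and it assumes the cache lives for the whole block and that nothing else issues queries.

```ruby
# Without caching: 1 list query + 2 lookups (blog_post, owner) per comment.
def uncached_queries(comment_count)
  1 + 2 * comment_count
end

# With caching: 1 list query + one lookup per distinct blog_post and owner.
def cached_queries(distinct_posts, distinct_owners)
  1 + distinct_posts + distinct_owners
end

puts uncached_queries(1000)     # 2001
puts cached_queries(1000, 1000) # 2001, every comment is unique: no gain
puts cached_queries(1, 1)       # 3, everything shared: best case
```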

Performance Evaluation through Cache Statistics

As valuable as the partial IdentityMap caching has been, we also saw great value in simply reviewing the cache statistics after implementation. We found that there were cases where the natural Rails association caches were being broken due to explicit and redundant where queries. These showed up as a large number of cache writes which were easily visible:

The stats were also helpful for simply proving to ourselves that the caching was working and having a meaningful impact.
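The stats output itself is not reproduced here. As an illustration only, this is roughly how counters and a dry-run switch could be wired into a cache. `StatsCacheSketch` is a hypothetical class, and the choice to record only read attempts and writes (no hits or misses) in dry-run mode is an assumption of this sketch.

```ruby
# Hypothetical cache with counters and a dry-run switch.
class StatsCacheSketch
  attr_reader :stats

  def initialize(dry_run: false)
    @dry_run = dry_run
    @store = {}
    @stats = Hash.new(0)
  end

  # Looks up key; on a miss (or in dry-run mode) runs the block and
  # writes its result to the cache.
  def fetch(key)
    @stats[:read_attempts] += 1
    if !@dry_run && @store.key?(key)
      @stats[:hits] += 1
      return @store[key]
    end
    @stats[:misses] += 1 unless @dry_run
    @stats[:writes] += 1
    @store[key] = yield
  end
end

live = StatsCacheSketch.new
3.times { live.fetch("BlogPost:1") { :post } }
puts "live hits=#{live.stats[:hits]} writes=#{live.stats[:writes]}" # live hits=2 writes=1

dry = StatsCacheSketch.new(dry_run: true)
3.times { dry.fetch("BlogPost:1") { :post } }
puts "dry hits=#{dry.stats[:hits]} writes=#{dry.stats[:writes]}"     # dry hits=0 writes=3
```

A key that is written many times but rarely hit is the signature to look for: it suggests the same record is being re-queried somewhere that never reads from the cache.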

The hit stats may not look particularly impressive in terms of queries saved, but the overall improvement is actually much larger because of nested associations. Cached records carry their own association caches from Rails. Had a record been re-instantiated, its associations would have been re-queried as well. Those lookups are now served by Rails' own association cache and bypass our IdentityMap feature, so our stats never show the hits that would previously have been misses!

Consider the following case study:

Notice how the read attempts decrease when caching is fully in use. The attempts go down from 92 to 75. Without the dry run test, we may have incorrectly assumed that the net reduction in db reads was only 27 when it was actually 44 since we went from 92 to 48!
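The arithmetic behind those numbers, spelled out. The three inputs come from the case study above; the variable names and the reading of 27 as "attempts minus remaining reads" are our interpretation for the purpose of this sketch.

```ruby
dry_run_attempts = 92 # read attempts with caching in dry-run mode
cached_attempts  = 75 # read attempts with caching fully enabled
cached_db_reads  = 48 # reads that still reached the database

# What the live stats alone would suggest we saved.
apparent_savings = cached_attempts - cached_db_reads  # 27

# What we actually saved, using the dry run as the baseline.
actual_savings = dry_run_attempts - cached_db_reads   # 44

# Reads that disappeared without ever reaching our cache, because reused
# records brought their Rails association caches along with them.
hidden_savings = actual_savings - apparent_savings    # 17

puts "apparent=#{apparent_savings} actual=#{actual_savings} hidden=#{hidden_savings}"
```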

Future Plans

We currently have a few ideas for improving and extending the usage of our version of IdentityMap:

  • Move source code to gem
  • Intercept additional ActiveRecord instantiations for writes and reads
  • Log caller traces when write stats exceed a threshold in order to help pinpoint redundant queries
  • Publish metrics and define alerts around cache stats
  • Opportunistically extend caching to other appropriate areas of our platform!


Appropriate and intelligent caching can help identify and reduce redundant database queries.

Be safe and happy caching!

Invoca Engineering Blog
