N+1 Queries and How to Avoid Them!

A Tutorial in Active Record

What is a N+1 Query

Simply stated, a n+1 query is an inefficient way query a database, and the best way to demonstrate such an inefficient query is through an example.

Take a look at the following two models:

This is a simple has many/belongs to relationship in Rails. Firing up the Rails console we can see that this relationship is functioning properly by querying the first article’s associated comments:

This is nice, but it only returns us the comments for the first article. What if instead, we wanted to get all the comments for five articles? At first glance we might tempted to write code like this:

This would work and for only pulling comments for five articles, the inefficiencies of this code would probably be ok… But say you were pulling thousands of articles. The above process would be extremely inefficient because of the n+1 query.

What is a n+1 query?

Take a look at the SQL generated for the above code:

Refreshing the article show page, sends 6 queries to the database. The first query (n) comes from the controller and finds the first 5 articles. The subsequent 5 queries (n + 1), find each article’s comments. My example database has roughly 70,000 articles, and using this code the page took 408 ms to load — that is terrible!

Even worse — what happens if we pull up to 30 articles? Below is a graph plotting the page load time in increments of 5 articles.

Here the page load time for 30 articles is over 2 seconds!

This graph shows why n+1 queries are bad for performance — with each additional article the page load time increases linearly.

Active Record Includes Method

Luckily there is a much better way to query. The answer is the Active Record Includes method. Below I updated the Articles controller to use the Includes method. I left the code in the view the same.

Now when I refresh the view, instead of the n+1 query, I see the following generated SQL:

There are 2 big things to notice here:

  1. The page load time was much faster : 161 ms vs. 408 ms respectively.
  2. The number of database queries was 2 compared to the 6 for the n+1 query.

Here is the graphical comparison of the page load speed. I left the y-axis scale the same for comparative purposes:

Even with additional articles, the load time stayed relatively the same with only a fractional increase in page load time.

How does the Includes method work?

The Includes method uses a concept called ‘Eager Loading.’ In our example, eager loading works by preloading every comment for every article beforehand in a temporary cache stored in memory. This allows us to iterate through all the articles and call ‘.comments’ on them without having to ping the database over and over again.

The process behind this temporary cache is buried deep in the inner workings of Rails’ black magic. But from a very high level it works like this.

When you setup a has_many/belongs to relationship, Rails does a lot of work behind the scenes. But for our purposes, you need to know that setting up an Active Record association like the one we did above allows your class to inherit methods from the ActiveRecord::Associations module. One of these methods is the association method that takes as an argument a name (in our case — :comments).

When an Active Record associated is called (i.e. Article.first.comments), the association method will first try to pull the requested association data out of the @association_cache instance variable assuming it has already been loaded into memory. ** The @association_cache is initialized as an empty hash**

If the association data is not in the @association_cache (i.e. you did not call the ‘Includes’ method), then, very simply speaking, the association will have to be loaded.

To prove this, let’s look into the @assocation_cache when the Include method is used:

AMAZING… Rails pre-populated the hash with the respective Article’s comments.

And if you do not use the Includes method… you get back nothing.

The beauty of the ‘Includes’ method is that it preloads the @assocation_cache instance variable, so when Active Record goes to fetch the data, it is already there and doesn’t have to ping the database multiple times.

In summary Active Record’s Include method is wonderful — it preloads data for you, making data retrieval in Active Record associations much faster and much more efficient!

** Thanks Horace for dropping the knowledge making this article possible

Founder of Too Many Miles

Founder of Too Many Miles