The N+1 Query Problem and its impact on business.
TL;DR For a theoretical Ecommerce website making $3,000,000/yr with 145 categories of products, a single common N+1 Query issue could cost around $213,000 in revenue per year.
The issue at hand
The N+1 Query problem is a generalized problem that commonly presents itself in Web Frameworks like PHP’s Magento and Laravel, Java’s Spring, and Ruby’s Ruby on Rails. These frameworks use something like an Object Relational Mapping (ORM) to abstract away the complicated idea of a database from the minds of their application developers. This abstraction can occasionally give rise to subtle bugs which can wreak havoc if not fixed.
Here is a formulation of a simple task that can give rise to the problem:
I need to load some “objects” from a database. To do so, first I need to get a set of unique identifiers (ids) that identify which elements I need to load, and then I need to load all those objects from a database and do something with each of them.
With this task assigned to us, let’s first solve it the N+1 Query way (the wrong way):
<?phpuse MyModel;class ExampleClass {public static function sumMyData(){
$sum = 0;
$myIdsArray = [];
$myIdsArray = MyModel::getMyListOfIdsFromDb();
foreach($id in $myIdsArray){
$model = MyModel::loadFromDb($id);
$sum = sum + $model->value;
}
return sum;
}}
This code is readable and may seem like a satisfactory solution to the problem but, I guarantee this: if this code gets committed into your production environment, you’ve created a time-bomb that is waiting to go off.
The “time-bomb”” stems from the fact that in the “N+1” solution you’ve generated a non-constant number of queries to the database. For each ID in your array of IDs, you’ve generated “N” separate read requests to your database. Combine this with the single read sent out to get the ids in the first place, you’re now at “N+1” reads. If we wanted to talk Big-O we’re talking O(N) or “linear” operations where “N” is the number of elements we want to retrieve from our database.
Under the hood in SQL this code looks like…
SELECT ids_column FROM my_other_table;
SELECT * from my_objects WHERE id = 1;
SELECT * from my_objects WHERE id = 2;
SELECT * from my_objects WHERE id = 3;
SELECT * from my_objects WHERE id = 4;
...
...
...
...
SELECT * from my_objects WHERE id = N;Let’s refactor to a better way …
<?phpuse MyModel;class ExampleClass {
public static function sumMyData(){
$sum = 0;
$myIdsArray = [];
$myIdsArray = MyModel::getMyListOfIdsFromDb();
$collection = MyModel::getCollection($myIdsArray);
foreach($object in $collection){
$sum = sum + $object->value;
}
return sum;
}
}
Now, if we did a diff of these two, we’d only get 3 lines of differences, but in terms of performance, we’ve just made a massive improvement to our code.
Under the hood our SQL code now looks like…
SELECT ids_column FROM my_other_table;
SELECT * from my_objects WHERE id IN (1,2,3,4,..., N);Note: There are many articles saying that SELECT * … IN();is non-performant, and subqueries are better, but that is irrelevant to this demonstration, as we’re merely explaining the benefit of reducing total number of SQL queries.
We’ve managed to take our O(N) operations and reduce it into O(2) or “constant” operations. A major performance improvement with both immediate and long term impact.
In simpler terms, we’ve gone from running many small processes one after the other, to running one big process. Metaphorically, imagine you’re going to the grocery store to get the ingredients for baking a cake. In the “constant” case, you get a list of the groceries beforehand, and then do one trip to the store and come back fully prepared. Compared to the “N+1” case: each time you need an ingredient, you drive down the street to the grocery store and come back. Clearly, driving back and forth to the store only once is more efficient.
Now, I’ve been a bit handwavy in terms of specific implementation, but almost all modern frameworks support built-in ways of doing queries like:
SELECT * FROM table WHERE ID IN (1,2,3,4,...,N);but they may call it different things:
Analysis
Back to business! I’ve demonstrated the technical solution to the problem, but explaining the value of fixing issues like this to technical higher-ups, managers, and clients is always a challenge. To do this, let’s use the following hypothetical situation.
Assume we have an Ecommerce business that’s doing $3,000,000 dollars a year in sales. It’s been going well, but you want to increase revenue without adding new features to the platform. You don’t have the budget to build out that shiny new feature you’ve always dreamed of, but you still want to show that your team’s work is valuable and you need to directly correlate the team’s work to increased revenue.
Let’s assume you have 145 different categories on your Ecommerce site and they show up in some nifty navigation that appears on every page of your store. Everybody really loves clicking on it, and it’s definitely a great feature.
Your developers delve into that part of the codebase and notice that there’s DEFINITELY some form of the N+1 query problem going on. You instruct them to take some time to fix it, and you start to do some preliminary calculations on the potential impact of the fix.
Your developer approximates that each single read request to the database seems to take about 5ms.
You calculate that the old code has 146 operations: 1 for generating the ids of the categories, and 145 for getting each product category individually. So, in the old code, it takes 146 * 5ms = 0.73s to generate the categories on every single page of your site.
With the N+1 query problem fix, regardless of how many categories you have, it will now only take 2 * 5ms = 10ms to load them. A savings of 0.72s!
Having your developers find and fix problems like the N+1 query problem can have an immediate impact on sales and performance. If we’re to believe the often quoted:
Every 100ms increase in the page load time decreased sales by 1 percent
We’ve saved 0.71s across every single page of our site, which in theory should mean, assuming everything else is controlled, we should see a 7.1% increase in sales!
Translating that into our hypothetical example we expect that over the course of a year we should see approximately $3,000,000 * .071 = $213,000 growth in revenue just from that fix alone. This is a great way to increase sales efficiently.
Conclusion
The N+1 query problem is a general problem that could already exist in your codebase. It’s hidden in plain sight and is costing you money. Review your code and you will increase revenue and gain significant performance boosts in the long run.

