Why is my database slow? When 10 queries are faster than 1
At Ufonia we recently resolved a database performance issue with a solution that seems counter-intuitive at first, but is an essential technique for complex real-world data models. Here is how we did it.
Building a to-do app or Twitter clone is a rite of passage for many software engineers learning about UIs, APIs and data models. You start by building a RESTful API that looks something like this:
GET /tweets
GET /tweets/id
POST /tweets
GET /users
GET /users/id
and then build a feed by loading the most recent 10 tweets from the API. You then decide you want to show the author of the tweet, and so for each tweet, you make a request to get the author. Here you encounter the N+1 problem, as for each element in the array of tweets, you have to make a separate API call to get the author:
GET /tweets?orderBy=createdAt,DESC&limit=10
[
{
"id": 1,
"contents": "This is my first tweet!",
"authorId": 4
},
{
"id": 2,
"contents":
"Hello world!",
"authorId": 7
}
...
]GET /users/4
{
"id": 4,
"firstName": "Mahatma",
"lastName": "Gandhi"
}GET /users/7
{
"id": 7,
"firstName": "Florence",
"lastName": "Nightingale"
}etc.
You then come across the idea of reducing the “purity” of the API by embedding relations into the original representation of a tweet because you will almost always need that relation:
GET /tweets?orderBy=createdAt,DESC&limit=10
[
{
"id": 1,
"contents": "This is my first tweet!",
"author": {
"id": 4,
"firstName": "Mahatma",
"lastName": "Gandhi"
}
},
{
"id": 2,
"contents": "Hello world!",
"author": {
"id": 7,
"firstName": "Florence",
"lastName": "Nightingale"
}
}
...
]
Now you only need to make one API call, which reduces the traffic between the frontend and backend, and between the backend and the database. Instead of making 11 database queries (one for the 10 tweets, then 1 for each author), the backend can use a LEFT JOIN
to make just one query, improving the performance of your application:
This returns 10 rows from the database with the author in line. If each tweet had 3 hashtags, you could include another LEFT JOIN
on the hashtags table, giving 30 rows.
Any software engineer reading up to this point should be nodding along, recognising how the concepts are being introduced. The implication is that there is a straight line path from here to applications you will build at work. But real-world applications are more complex than a to-do app or Twitter clone and have far more complex data models, and that difference really matters when it comes to database performance. You cannot simply keep applying the techniques you learnt. At some point you will run into a problem like we did…
Our automated clinical assistant, Dora, calls all the patients on a hospital pathway at a set time/day during the week. We create a CallList
, which is a list of the calls we have scheduled for that pathway this week. Each ScheduledCall
has several phone numbers we can try to reach the patient on, and when a CallList
runs, we create a Call
each time we phone the patient and record it against the ScheduledCall
. During a call with the patient we understand various pieces of information, such as whether they have pain in their knee. These are Symptoms
from the call. So we have the entities:
Now what if, when looking at a CallList
, you wanted to know at a glance how well it had gone by looking at a high-level green/amber/red status? How well we understood the patient is our core metric, and so the status is based on the percentage of Symptoms
we successfully gathered:
≥ 95% = green
between 95% and 75% = amber
< 75% = red
To work this out, you need to load the CallList
, then all the related ScheduledCalls
, then all the related Calls
, then all the related Symptoms
, check how many have Symptom.value !== null
, and then map to a RAG status. If we called 100 patients, we had to try calling each patient an average of 2 times to get them on the phone, and for this pathway we need to collect 15 Symptoms
, we would need to load 100*2*15 = 3,000 pieces of information to work out the status.
When building software applications, engineers know that the network is slow and unreliable in comparison to in-memory operations, and so when faced with needing to load 3,000 pieces of information, you are going to look for solutions that don’t involve making 3,000 database queries. Given the mental model of A has many Bs
being so clear in the domain model, a LEFT JOIN
seems to be a sensible choice (and is what an ORM like TypeORM would encourage you to do, using relations). So the best way to load the data seems to be to use the following query:
Although it seems quite a long query, the software engineer will be happy that she is only querying the database once (not 3,000 times!), and therefore should see excellent performance. The problem here is that with more than a couple of JOINs
(and depending on the amount of data), the database will become very slow, with each extra JOIN
multiplying the query response time.
The solution to this problem is (counter-intuitively) to make multiple database queries. The idea is that you can combine the ease of traversing properties in code with the speed of a primary key lookup in the database. The difference in computational effort between the Typescript:
callList.scheduledCalls;
and the SQL:
LEFT JOIN ScheduledCall ON ScheduledCall.id = CallList.id
means that it makes sense to help the database traverse the entity graph by executing a new query per entity type:
This makes 4 SQL queries:
Each of these is very simple and cheap to execute as they are just querying the primary key of each table.
The final part of the puzzle is how to join the entity graph back together in memory, as we have 4 query results that don’t know about each other.
Here we have the same result as if we’d made the LEFT JOINs
, but for the price of some pretty simple array manipulation we have much better database performance. (This will also have a significantly smaller RAM footprint; in fact, an OutOfMemory exception was what showed us there was a problem with the LEFT JOIN
query).
The takeaways:
- Real-world data models are richer and more complex than a to-do app or Twitter clone
- JOINs only improve performance up to a point (and from that point make things much worse)
- Software runtimes and database engines are good at different things, and you should use this to your advantage
- At Ufonia we solve interesting technical problems (we’re hiring!) 😉
Here is the full method:
Note 1: this example is somewhat simplified, and so the reader might think that an aggregation function in the SQL query would be the best option. That would be true here, but the status of a CallList
depends on far more than just the Symptoms
, and so the whole entities of the CallList
, ScheduledCalls
, Calls
, Symptoms
, and several more tables are required in memory.
Note 2: TypeORM recently introduced a relationLoadStrategy
option in version 0.3.0 which takes a similar approach internally. This isn’t well signposted, and has significant issues, so you may be able to use the built-in option, but understanding the fundamentals will give you an escape hatch in non-bread-and-butter scenarios.