Doctolib

Founded in 2013, Doctolib is the fastest growing e-health service in Europe. We provide healthcare professionals with services to improve the efficiency of their organization, transform their patients’ experience, and strengthen cooperation with other practitioners.

How We Optimized a Query to Run 1 Billion Times a Day at Doctolib

--

Over the past few months, our team at Doctolib has been working on migrating authorization data to a centralized table within its own database. This authorization database is powered by PostgreSQL. As part of the migration, we needed to transform the schema, which necessitated rewriting the queries. Given that authorization data is frequently queried, it’s crucial that these new queries are both fast and efficient.

Migration of Authorization Data to a Centralized Table/Database

In this article, we won’t be discussing the migration process itself; instead, we’ll focus on how we enhanced the performance of queries in the new table.

The following code retrieves the access grants for a single account:

Grant.where(
  role_id: [
    '96761ffd-ccb9-46c8-abb5-5810d0d202bb',
    '06fa4f0b-24de-4f3d-8803-a917606a0fb3',
    '17b47072-8218-4b8c-afc9-f9fcddf1bffe',
    '62bae27a-4b7f-4999-ac3b-de73e75e8273',
    '74d99f9a-a04e-408e-a8c7-b1f7c9da4280',
  ],
  account_id: ['123456']
)

The code above generates the following SQL query to retrieve records from the grants table.

SELECT *
FROM "grants"
WHERE "role_id" IN (
  '96761ffd-ccb9-46c8-abb5-5810d0d202bb',
  '06fa4f0b-24de-4f3d-8803-a917606a0fb3',
  '17b47072-8218-4b8c-afc9-f9fcddf1bffe',
  '62bae27a-4b7f-4999-ac3b-de73e75e8273',
  '74d99f9a-a04e-408e-a8c7-b1f7c9da4280'
) AND "account_id" IN ('123456')
-- AND ("expires_at" IS NULL OR "expires_at" > NOW());

Note: The actual query also includes a condition for the expires_at column, which we have omitted from our analysis for the sake of simplicity.

EXPLAIN the query

We begin by using the EXPLAIN command. EXPLAIN is a PostgreSQL command that displays the execution plan selected by the query planner for a given SQL statement, without executing it.

EXPLAIN
SELECT * FROM ...

The command above provides the following result:

[
{
"Plan": {
"Node Type": "Index Scan",
...
"Index Name": "index_grants_on_account_id",
"Relation Name": "grants",
"Alias": "grants",
"Index Cond": "((account_id)::text = ANY ('{123456}'::text[]))",
"Filter": "(role_id = ANY ('{96761ffd-ccb9-46c8-abb5-5810d0d202bb,...}'::uuid[]))"
}
}
]

As we can see, PostgreSQL uses the index_grants_on_account_id index to locate records with the specified account ID, ensuring that the database uses an index scan rather than a sequential scan.

Reduce the Data Transferred over the Network

Fetching records from the database involves more than just executing the query itself. Each query execution also includes waiting for a connection, transmitting the SQL command over the network, receiving records over the network, and more.
Therefore, retrieving large volumes of data from the database can be slow. By reducing the response size, we can enhance query performance.

To achieve this, we need to benchmark the query on the application side. The code below measures the execution time of our query.

time = Benchmark.measure do
  Grant.where(
    role_id: [
      '96761ffd-ccb9-46c8-abb5-5810d0d202bb',
      ...
    ],
    account_id: ['123456']
  ).load
end

puts "Time elapsed: #{time.real} seconds"
# => Time elapsed: ~0.2 seconds

To reduce the size of the retrieved data, we can fetch only the columns that are necessary. To do this, we specify the required columns in the query.

time = Benchmark.measure do
  Grant.where(
    role_id: [
      '96761ffd-ccb9-46c8-abb5-5810d0d202bb',
      ...
    ],
    account_id: ['123456']
  ).pluck(:account_id, :resource_id, :resource_type, :role_id)
end

puts "Time elapsed: #{time.real} seconds"
# => Time elapsed: ~0.1 seconds

Note: If you’re using an ORM, you can save time by avoiding the initialization of model objects. For instance, in ActiveRecord, using pluck instead of select can prevent unnecessary model object initializations.
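A rough in-memory illustration of that note, with `GrantRow` as a stand-in Struct rather than the real ActiveRecord model: materializing one object per row (the select path) costs measurably more than returning plain value arrays (the pluck path).

```ruby
require 'benchmark'

# Stand-in for a model class; real ActiveRecord objects are far heavier than this.
GrantRow = Struct.new(:account_id, :resource_id, :role_id)
raw = Array.new(200_000) { ['123456', 'res-1', 'role-1'] }

with_objects = Benchmark.realtime { raw.map { |r| GrantRow.new(*r) } } # select-like
values_only  = Benchmark.realtime { raw.dup }                          # pluck-like

# Instantiating 200k objects is the slower path; pluck skips it entirely.
puts with_objects > values_only
```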

More Advanced Aspects of the Query Plan

Now, let’s dive into the query plan and explore more advanced features of the EXPLAIN command. We can enhance EXPLAIN to obtain a more detailed query plan.

SET track_io_timing = on;

EXPLAIN (ANALYZE, TIMING, BUFFERS)
SELECT "account_id", "resource_id", "resource_type", "role_id"
FROM "grants"
WHERE "role_id" IN (
  '96761ffd-ccb9-46c8-abb5-5810d0d202bb',
  '06fa4f0b-24de-4f3d-8803-a917606a0fb3',
  '17b47072-8218-4b8c-afc9-f9fcddf1bffe',
  '62bae27a-4b7f-4999-ac3b-de73e75e8273',
  '74d99f9a-a04e-408e-a8c7-b1f7c9da4280'
) AND "account_id" IN ('123456');

Let’s take a quick look at the options we’ve added to the EXPLAIN command:

  • ANALYZE Actually executes the query instead of just planning it, so the plan shows real execution times and row counts (Actual Total Time, Actual Rows) alongside the planner's estimates (Total Cost).
  • TIMING Measures the time spent in each node of the query plan, which gives a good sense of where the execution time goes.
  • BUFFERS Shows buffer usage statistics: shared, local, and temp buffer hits and reads (Shared Hit Blocks, Shared Read Blocks).
  • track_io_timing = on Enables tracking of I/O timing statistics by measuring the actual time spent on disk I/O operations. It helps us understand how much time is spent reading non-cached records (I/O Read Time).

The command above produces the following query plan:

[
{
"Plan": {
"Node Type": "Index Scan",
...
"Index Name": "index_grants_on_account_id",
"Relation Name": "grants",
"Startup Cost": 0.43,
"Total Cost": 696.92,
"Actual Startup Time": 2.824,
"Actual Total Time": 95.762,
"Actual Rows": 2882,
"Actual Loops": 1,
"Index Cond": "((account_id)::text = '123456'::text)",
"Rows Removed by Index Recheck": 0,
"Filter": "(role_id = ANY ('{96761ffd-ccb9-46c8-abb5-5810d0d202bb,...}'::uuid[]))",
"Rows Removed by Filter": 1,
"Shared Hit Blocks": 1,
"Shared Read Blocks": 765,
"I/O Read Time": 93.775,
"I/O Write Time": 0.000,
...
},
"Planning": {
...
},
"Planning Time": 0.438,
"Triggers": [
],
"Execution Time": 96.019
}
]

Note: The attributes not discussed in this article have been removed from the query plan above.

As we can easily recognize, most of the time is spent on I/O Read Time. When the required blocks are not found in the cache, PostgreSQL fetches them from the disk. In other words, Shared Read Blocks and I/O Read Time are directly correlated. Each Shared Read Block typically requires an I/O operation, and more Shared Read Blocks generally result in more I/O Read Time.

How can we reduce Shared Read Blocks?
Shared Hit Blocks is the number of blocks read from the cache. The total number of blocks our query needs is 766 (Shared Hit Blocks + Shared Read Blocks). It's important to note that PostgreSQL's shared buffer cache operates on blocks (8 KB pages by default). That total stays the same for a given query, so improving the cache hit ratio means increasing Shared Hit Blocks, which in turn reduces Shared Read Blocks.

If we execute the EXPLAIN command a second time, Shared Read Blocks and I/O Read Time typically decrease, leading to a significantly reduced Execution Time (which explains why the query runs faster on subsequent executions). When Shared Hit Blocks is much greater than Shared Read Blocks, we describe this situation as having a high cache hit ratio.
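The relationship between hits, reads, and the hit ratio can be sketched in plain Ruby, reusing the buffer counters from the plan above:

```ruby
require 'json'

# A minimal sketch: compute the cache hit ratio from the buffer counters of an
# EXPLAIN (ANALYZE, BUFFERS) plan. The JSON reuses the counts shown earlier.
plan = JSON.parse('[{"Plan": {"Shared Hit Blocks": 1, "Shared Read Blocks": 765}}]')

node  = plan.first['Plan']
hits  = node['Shared Hit Blocks']
reads = node['Shared Read Blocks']
ratio = hits.to_f / (hits + reads)

puts "total blocks: #{hits + reads}"          # total blocks: 766
puts "hit ratio: #{(ratio * 100).round(1)}%"  # hit ratio: 0.1%
```

A ratio this low on a cold cache is exactly what the I/O Read Time in the plan reflects.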

Since Shared Read Blocks changes with each execution, we need a way to determine the cache hit ratio for our query in production. To assess it, we used Datadog's Database Monitoring tool: in the "Query Metrics" section, we searched for our query and examined the Shared Blocks Hit Ratio column in the table.

Cache Hit Ratio for Queries in Production

After examining the cache hit ratio, we realized it was very low for our new query in the new table. We used two strategies to improve the ratio for our query.

Warming-up the Cache
Warming up the cache involves executing the query against your database so the relevant blocks end up in memory. Since the old query was heavily used in the application, switching all traffic to the new query in the new table and database at once could cause issues, because the cache would not be warmed up yet. Fortunately, we were rolling out the change behind an experiment, which allowed us to warm up the cache by gradually increasing the experiment's rollout percentage.

Improve Cache Hit Ratio
As mentioned earlier, in our example, the query needs to fetch 766 blocks. We understand that the cache size is limited and cannot hold the entire database in memory. By reducing the number of required blocks, we can improve the cache hit ratio, as fewer blocks need to be cached.

Let’s try retrieving fewer columns from the database by removing resource_type and executing the query again:

SET track_io_timing = on;

EXPLAIN (ANALYZE, TIMING, BUFFERS)
SELECT "account_id", "resource_id", "role_id"
FROM "grants" ...

If we execute the EXPLAIN command, we will see that the total number of blocks remains exactly 766, just as before?! 🤔

Why? This makes sense when we consider how records are stored and cached. Each record is stored with all of its attributes in fixed-size blocks, or pages (very wide values are the exception, as they can be stored out of line). When PostgreSQL needs to read data from a block, it reads the entire block, not just the specific columns requested.
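A back-of-the-envelope sketch of this block arithmetic, using the local test table described later in this article (~11 million rows, 1.27 GB); the per-row size is a derived estimate, and real numbers vary with padding and tuple overhead:

```ruby
# PostgreSQL reads whole 8 KB pages, never individual columns, so what matters
# is how many pages the matching rows touch, not how many columns we select.
BLOCK_SIZE = 8192 # default PostgreSQL page size in bytes

avg_row_bytes  = (1.27 * 1024**3 / 11_000_000).round # ~124 bytes per row
rows_per_block = BLOCK_SIZE / avg_row_bytes          # ~66 rows fit in one page

# Best case, the 2,882 rows our query returns would fit in a few dozen pages;
# the plan read 766 blocks because matching rows are scattered across pages.
best_case_blocks = (2882.0 / rows_per_block).ceil

puts best_case_blocks
```

The gap between the best case and the observed 766 blocks is down to row placement, which selecting fewer columns cannot change.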

What is “Index Only Scan”?
When PostgreSQL can retrieve all required data directly from the index without accessing the actual table, it uses an “Index Only Scan” instead of an Index Scan. This is the most efficient type of scan when applicable. Therefore, we need to define an index that includes all required data. “All required data” means both the selected columns and the columns that are part of the conditions. Before proceeding to define the index, let’s take a look at the complete query, including the expires_at condition that we previously omitted for simplicity.

SELECT "account_id", "resource_id", "resource_type", "role_id"
FROM "grants"
WHERE "role_id" IN (
  '96761ffd-ccb9-46c8-abb5-5810d0d202bb',
  '06fa4f0b-24de-4f3d-8803-a917606a0fb3',
  '17b47072-8218-4b8c-afc9-f9fcddf1bffe',
  '62bae27a-4b7f-4999-ac3b-de73e75e8273',
  '74d99f9a-a04e-408e-a8c7-b1f7c9da4280'
) AND "account_id" IN ('123456')
AND ("expires_at" IS NULL OR "expires_at" > NOW())

According to the query, “all required data” refers to five columns: account_id, resource_id, resource_type, role_id, and expires_at. An index with five columns could result in a large index. Let’s test this locally. I inserted approximately 11 million records into my grants table, which has 10 columns. I checked the table size, and it's 1.27 GB. Now, I want to try the following index with these five columns:

CREATE INDEX index_grants_on_five_columns
ON grants
USING btree (account_id, expires_at, role_id, resource_id, resource_type);

The new index’s size is approximately 700 MB. While indices provide benefits, they also come with costs — the larger the index, the higher the cost. If this index were the only one for the table, it might be worthwhile to keep it, but that’s usually not the case.
We were able to remove two columns from the query; let’s see how!

Remove resource_type column
In our application, we know that resource_type can be derived from role_id, meaning we don't need to retrieve it from the database; instead, it can be calculated and appended to the result in memory. This is possible because each role_id corresponds to a single resource_type.
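As a sketch, deriving resource_type in memory can be a simple hash lookup. The mapping and values below are made up for illustration, not Doctolib's real roles; the only property that matters is that each role_id maps to exactly one resource_type:

```ruby
# Hypothetical role-to-resource-type mapping; because the mapping is one-to-one
# per role, the lookup is unambiguous and resource_type never needs to be fetched.
RESOURCE_TYPE_BY_ROLE_ID = {
  '96761ffd-ccb9-46c8-abb5-5810d0d202bb' => 'Agenda',
  '06fa4f0b-24de-4f3d-8803-a917606a0fb3' => 'Patient',
}.freeze

# Rows as returned by the reduced query (account_id, resource_id, role_id only).
rows = [
  { account_id: '123456', resource_id: 'r-1',
    role_id: '96761ffd-ccb9-46c8-abb5-5810d0d202bb' },
]

# Append resource_type in memory instead of reading it from the database.
rows.each { |row| row[:resource_type] = RESOURCE_TYPE_BY_ROLE_ID.fetch(row[:role_id]) }

puts rows.first[:resource_type] # Agenda
```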

Remove expires_at from the condition
The expires_at column specifies the expiration time for a record when set. We also had a nightly background job to clean up records that expired the previous day. Our application logic can tolerate returning a recently expired record, so this condition doesn't need to be very strict. Therefore, we decided to remove expires_at from the query and run the cleanup background job more frequently. Since it's a very lightweight job, it can even run every minute.
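A sketch of the tolerance this trade buys, simulated in memory; in production the cleanup is a scheduled job deleting expired rows, and the timestamps below are illustrative:

```ruby
require 'time'

# Simulated grants. Without the expires_at condition in the query, a recently
# expired row can be returned until the next cleanup run, which the
# application logic tolerates.
now = Time.parse('2024-01-01T12:00:00Z')
grants = [
  { id: 1, expires_at: nil },                                # never expires
  { id: 2, expires_at: Time.parse('2024-01-01T11:59:00Z') }, # expired a minute ago
  { id: 3, expires_at: Time.parse('2024-01-01T13:00:00Z') }, # still valid
]

# What the per-minute cleanup job removes on its next run:
grants.reject! { |g| g[:expires_at] && g[:expires_at] <= now }

puts grants.map { |g| g[:id] }.inspect # [1, 3]
```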

After reducing the number of required columns to three, we defined the new B-tree index on account_id, role_id, and resource_id:

CREATE INDEX index_grants_on_account_role_resource_id 
ON grants
USING btree (account_id, role_id, resource_id);

Note: Since the query condition is based on role_id and account_id, these columns should come first in the index; otherwise, PostgreSQL won't be able to use the index effectively. The order of columns is important when creating a B-tree index.
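One way to build intuition for the column-order rule: a B-tree behaves like a sorted list of column tuples, so only conditions on a leftmost prefix of the columns can narrow the search. This is a plain-Ruby analogy with made-up values, not PostgreSQL internals:

```ruby
# Index entries ordered by (account_id, role_id, resource_id), like a B-tree.
entries = [
  %w[123456 role-a res-1],
  %w[123456 role-b res-2],
  %w[789012 role-a res-3],
].sort

# A condition on account_id (the leading column) lands on a contiguous range
# of the sorted entries, which is why the index can be traversed efficiently.
by_account = entries.select { |account_id, _, _| account_id == '123456' }

# A condition on resource_id alone gets no such range: the sort order doesn't
# help, and every entry would have to be inspected.
puts by_account.length # 2
```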

Update: At the time of implementation, our team was not aware of PostgreSQL’s INCLUDE clause for indexes (available since PostgreSQL 11). In our example, since resource_id is not used in the query condition, it can be added to the index as a non-key column.

CREATE INDEX index_grants_on_account_role_resource_id 
ON grants
USING btree (account_id, role_id)
INCLUDE (resource_id);

Using INCLUDE can reduce the index size. I tested it locally with approximately 11 million records, and the index size decreased from 700MB to 500MB.

Here is what the new query looks like:

SELECT "account_id", "resource_id", "role_id"
FROM "grants"
WHERE "role_id" IN (
'96761ffd-ccb9-46c8-abb5-5810d0d202bb',
...
) AND "account_id" IN ('123456')

And let’s run EXPLAIN on our new query:

[
{
"Plan": {
"Node Type": "Index Only Scan",
...
"Index Name": "index_grants_on_account_role_resource_id",
"Relation Name": "grants",
"Total Cost": 13.14,
"Plan Rows": 183,
"Plan Width": 31,
"Actual Total Time": 0.875,
"Actual Rows": 2882,
"Actual Loops": 1,
"Index Cond": "(account_id = '123456'::text)",
"Rows Removed by Index Recheck": 0,
"Filter": "(role_id = ANY ('{96761ffd-ccb9-46c8-abb5-5810d0d202bb,...}'::uuid[]))",
"Rows Removed by Filter": 1,
"Shared Hit Blocks": 183,
"Shared Read Blocks": 0,
"I/O Read Time": 0.000,
...
},
"Planning": {
...
},
"Planning Time": 0.200,
"Triggers": [
],
"Execution Time": 1.109
}
]

With the new index, only 183 blocks were read, and the "Node Type" changed to "Index Only Scan" 🎉. Additionally, the Execution Time has significantly decreased for this specific query. Before celebrating our achievement, let's check the average (or even better, the p99) execution time for this query in production.

After using the new query and index for a while in production, we checked Datadog monitoring.

Query metrics in Datadog for the new query and index

The Shared Blocks Hit Ratio is 100% 👌 and the Average Duration is 44 μs 🤯. It's time to celebrate!

Conclusion

In this article, we explored how to analyze a query to understand its performance and cost, and we learned several techniques to enhance them.

Is it worth optimizing a query to this extent, or is it overkill?
It depends (as usual)! Are you optimizing the query to improve response time for the end user, or to reduce the load on your valuable database? Usually, the end user cannot perceive a difference of a few milliseconds in response time, so saving milliseconds doesn’t necessarily enhance the user experience. However, if the query is executed frequently, those milliseconds become very valuable for the database. In some cases, it can even be worth making a query slightly slower at the application level to make it cheaper at the database level, because applications are typically easy to scale horizontally, while scaling a database is more complex, more expensive, and comes with certain limitations.
