It all began when I was on site reliability duty (each week at carwow, a developer is assigned to triage PagerDuty bug incidents) and I noticed we were experiencing some Postgres timeouts in our bug tracking tool. I wondered how best to avoid and/or fix these timeouts, and while researching how to improve these queries I stumbled upon the article "5 things I wish my grandfather told me about ActiveRecord and Postgres", which mentioned using the EXPLAIN function.
I found the article, by Ken Fassone, who happens to be my tech lead, interesting, but I must admit that even after reading it I still didn't fully understand how EXPLAIN worked. It took a 1:1 with Ken and reading more articles on EXPLAIN ANALYZE (some helpful, for example the Postgres documentation; others not so much) for me to feel confident enough to attempt to fix one of the timeouts.
In Fig 1 below, we make an API request to return a list of phone calls. The endpoint caches the list of calls, using HTTP caching to speed up the performance of the Rails application.
The call @calls.maximum(:updated_at).try(:utc) in Fig 1 produced the SQL query and query plan shown in Fig 2 below, used to obtain the most recently modified call.
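For context, a minimal sketch of what such a controller action might look like (hypothetical names; this is not the actual carwow code) uses Rails' fresh_when so the client can send If-Modified-Since and receive 304 Not Modified when nothing has changed:

```ruby
# Hypothetical sketch of the endpoint, for illustration only.
class CallsController < ApplicationController
  def index
    # The page of calls the endpoint returns.
    @calls = Call.order(created_at: :desc).limit(25)

    # fresh_when sets the Last-Modified response header; this is the
    # line whose generated SQL we are profiling in Fig 2.
    fresh_when last_modified: @calls.maximum(:updated_at).try(:utc)
  end
end
```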
EXPLAIN ANALYZE SELECT MAX("calls"."updated_at") FROM "calls" LEFT OUTER JOIN "users" ON "users"."id" = "calls"."user_id" LEFT OUTER JOIN "dealership_summaries" ON "dealership_summaries"."id" = "calls"."dealership_id" LIMIT 25 OFFSET 0;
Limit  (cost=39035.05..39035.06 rows=1 width=8) (actual time=2252.002..2252.002 rows=1 loops=1)
  ->  Finalize Aggregate  (cost=39035.05..39035.06 rows=1 width=8) (actual time=2252.001..2252.001 rows=1 loops=1)
        ->  Gather  (cost=39034.85..39035.05 rows=2 width=8) (actual time=2251.807..2263.931 rows=3 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Partial Aggregate  (cost=38034.85..38034.85 rows=1 width=8) (actual time=2247.741..2247.741 rows=1 loops=3)
                    ->  Parallel Seq Scan on calls  (cost=0.00..37792.87 rows=483957 width=8) (actual time=0.915..2205.638 rows=387695 loops=3)
Planning Time: 1.025 ms
Execution Time: 2264.023 ms
(9 rows)

Fig 2
From the query plan in Fig 2, we can see that the database spent most of its time performing a parallel sequential scan that returned 387,695 rows per worker (loops=3, so roughly 1.16 million rows in total): on average, each worker produced its first row after 0.915 ms and finished the scan after 2205.638 ms. Checking the schema of the calls table revealed that there was no index on updated_at, so adding one was, of course, my initial suggested solution. After speaking with Ken and taking into consideration the importance of the calls table to daily business operations and the cost of maintaining the existing 7 indexes, we decided not to go down the route of adding a new index on updated_at.
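A detail worth knowing when reading parallel plans: the rows figure on a node is the average per loop, so the total work done is roughly rows multiplied by loops. A quick sanity check with the Fig 2 numbers:

```ruby
# In a parallel plan, "rows" on a node is the average per loop (here, one
# loop per parallel process), so total rows scanned is roughly rows * loops.
rows_per_loop = 387_695
loops = 3
total_rows_scanned = rows_per_loop * loops
puts total_rows_scanned  # => 1163085, close to the planner's estimate of 1,161,879 rows
```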
Further inspection of the query plan showed that the database performed the following operations in sequence: it scanned every row in the calls table, calculated the maximum updated_at from the resulting rows, and then returned that value. Very weird! What we want is the most recently updated call in @calls, whose size should not exceed 25, so it surely doesn't require scanning over a million rows. The solution lay in rewriting the code.
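The distinction is easy to see in plain Ruby with a toy dataset (hypothetical values, purely for illustration): the maximum updated_at over the whole table is not the same as the maximum over only the page of calls the endpoint returns.

```ruby
# Toy stand-in for rows of the calls table.
Call = Struct.new(:created_at, :updated_at)

calls = [
  Call.new(Time.utc(2019, 1, 1), Time.utc(2019, 6, 1)),
  Call.new(Time.utc(2019, 1, 2), Time.utc(2019, 2, 1)),
  Call.new(Time.utc(2019, 1, 3), Time.utc(2019, 3, 1)),
]

# What the original query computed: MAX(updated_at) over the WHOLE table.
max_over_all = calls.map(&:updated_at).max

# What we actually want: MAX(updated_at) over only the page the endpoint
# returns (the 2 most recently created calls, in this toy example).
page = calls.sort_by(&:created_at).reverse.first(2)
max_over_page = page.map(&:updated_at).max

# The two answers differ, which is why the rewrite below matters.
```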
Changing the last_modified code to what can be seen in Fig 3 resulted in the database query and query plan shown in Fig 4 below.
Call.from(@calls, :calls).maximum(:updated_at).try(:utc)

Fig 3

EXPLAIN ANALYZE SELECT MAX("calls"."updated_at") FROM (SELECT "calls".* FROM "calls" ORDER BY calls.created_at desc LIMIT 25 OFFSET 0) calls;

                            QUERY PLAN
--------------------------------------------------
Aggregate  (cost=1.35..1.35 rows=1 width=8) (actual time=0.137..0.137 rows=1 loops=1)
  ->  Limit  (cost=0.09..1.26 rows=25 width=2721) (actual time=0.037..0.128 rows=25 loops=1)
        ->  Index Scan Backward using index_calls_on_created_at on calls  (cost=0.09..54712.03 rows=1161879 width=2721) (actual time=0.036..0.125 rows=25 loops=1)
Planning Time: 0.172 ms
Execution Time: 0.169 ms
(5 rows)

Fig 4
This shows a massive improvement in the time it took to execute the query, and the database is now doing exactly what we expect. Here, the database uses the existing index on created_at, index_calls_on_created_at, to perform the subquery (which is equivalent to @calls) and return 25 rows, from which it then finds the most recently updated call.
Optimising database queries is a useful skill for developers to have, and it leads to a great user experience. EXPLAIN ANALYZE reveals how the database executes a query and can provide clues on how to improve a slow-running one. Note that EXPLAIN ANALYZE will actually execute the query, even if the query is an insert, update or delete, so be careful!
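One common way to guard against this (a sketch, using a hypothetical DELETE as an example) is to run the statement inside a transaction and roll it back, so the write is profiled but never committed:

```sql
-- Profile the write safely: the DELETE runs, so the timings are real,
-- but ROLLBACK means its effects are never committed.
BEGIN;
EXPLAIN ANALYZE DELETE FROM calls WHERE updated_at < now() - interval '5 years';
ROLLBACK;
```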
In my next blog, I will reveal how we solved another Postgres timeout by adding a partial index and deleting unused indexes. Thanks for reading, and if you have any questions or similar experiences, please share them in the comments :)