Postgres Optimisation Tips part I: EXPLAIN & ANALYZE


It all began when I was on site reliability — each week at carwow, a developer is assigned the task of triaging PagerDuty bug incidents — and I noticed we were experiencing some Postgres timeouts on our bug tracking tool. I wondered how best to avoid and/or fix these timeouts and upon researching how to improve these queries, I stumbled upon this article, “5 things I wish my grandfather told me about ActiveRecord and Postgres”, which mentioned using the EXPLAIN function.

I found the article by Ken Fassone, who happens to be my tech lead, interesting but I must admit that even after reading it, I still didn’t fully understand how the EXPLAIN function worked. It took a 1:1 with Ken and reading more articles on EXPLAIN and EXPLAIN ANALYZE — some were helpful, for example, the Postgres documentation; others not so helpful — for me to feel confident enough to attempt to fix one of the timeouts.

In Fig 1 below, we are making an API request to return a list of phone calls. The endpoint caches the list of calls; see the Rails guide on caching for more details on how caching can speed up the performance of a Rails application.
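As background, this style of HTTP caching hinges on the collection's most recent updated_at timestamp. Here is a minimal plain-Ruby sketch of the conditional GET comparison Rails performs for us (hypothetical names, not carwow's actual controller code):

```ruby
require "time"

# Simplified sketch of Rails-style conditional GET caching. The server
# exposes the newest updated_at of the cached collection as Last-Modified;
# if the client's If-Modified-Since header is at least that fresh, the
# server can answer 304 Not Modified and skip rendering the payload.
def response_status(last_modified, if_modified_since)
  return 200 if if_modified_since.nil?        # no cache header: full response
  if_modified_since >= last_modified ? 304 : 200
end

last_modified = Time.parse("2021-06-01 10:00:00 UTC")

response_status(last_modified, nil)                                   # first request: 200
response_status(last_modified, Time.parse("2021-06-01 10:00:00 UTC")) # fresh cache: 304
response_status(last_modified, Time.parse("2021-05-31 09:00:00 UTC")) # stale cache: 200
```

This is why the endpoint needs the maximum updated_at of @calls on every request, and why that query being slow matters.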

The @calls.maximum(:updated_at).try(:utc) call in Fig 1 generated the SQL query and query plan shown in Fig 2 below to obtain the most recently modified call.

@calls.maximum(:updated_at).try(:utc)
Fig 1

EXPLAIN ANALYZE SELECT MAX("calls"."updated_at") FROM "calls" LEFT OUTER JOIN "users" ON "users"."id" = "calls"."user_id" LEFT OUTER JOIN "dealership_summaries" ON "dealership_summaries"."id" = "calls"."dealership_id" LIMIT 25 OFFSET 0;

Limit  (cost=39035.05..39035.06 rows=1 width=8) (actual time=2252.002..2252.002 rows=1 loops=1)
  ->  Finalize Aggregate  (cost=39035.05..39035.06 rows=1 width=8) (actual time=2252.001..2252.001 rows=1 loops=1)
        ->  Gather  (cost=39034.85..39035.05 rows=2 width=8) (actual time=2251.807..2263.931 rows=3 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Partial Aggregate  (cost=38034.85..38034.85 rows=1 width=8) (actual time=2247.741..2247.741 rows=1 loops=3)
                    ->  Parallel Seq Scan on calls  (cost=0.00..37792.87 rows=483957 width=8) (actual time=0.915..2205.638 rows=387695 loops=3)
Planning Time: 1.025 ms
Execution Time: 2264.023 ms
(9 rows)
Fig 2

From the query plan in Fig 2, we can see the database spent most of its time on the sequential scan: each of the three parallel processes read an average of 387,695 rows, taking 0.915 ms to produce its first row and 2205.638 ms to produce its last. Checking the schema of the calls table revealed that there was no index on the updated_at column, and adding one was, of course, my initial suggested solution. After speaking with Ken and taking into consideration the importance of the calls table to daily business operations and the cost of maintaining the existing 7 indexes, we decided not to go down the route of adding a new index on the updated_at column.
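For reference, the option we rejected would have been a one-line migration against the database (illustrative index name; CONCURRENTLY is standard Postgres practice to avoid locking writes on a busy table like calls):

```sql
-- Rejected in our case: an eighth index would make every INSERT and
-- UPDATE on the busy calls table more expensive to maintain.
CREATE INDEX CONCURRENTLY index_calls_on_updated_at ON calls (updated_at);
```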

Further inspection of the query plan showed that the database performed the following operations in sequence: scanned all the rows in the calls table, calculated the maximum updated_at from the resulting rows and then returned the value: very weird! What we want is the most recently updated call in @calls, whose size should not exceed 25, so it surely doesn't require sequentially scanning hundreds of thousands of rows. The solution lay in rewriting the code.

Changing the last_modified code to what can be seen in Fig 3 resulted in the database query and query plan shown in Fig 4 below.

Call.from(@calls, :calls).maximum(:updated_at).try(:utc)
Fig 3

EXPLAIN ANALYZE SELECT MAX("calls"."updated_at") FROM (SELECT "calls".* FROM "calls" ORDER BY calls.created_at desc LIMIT 25 OFFSET 0) calls;

QUERY PLAN
--------------------------------------------------
Aggregate  (cost=1.35..1.35 rows=1 width=8) (actual time=0.137..0.137 rows=1 loops=1)
  ->  Limit  (cost=0.09..1.26 rows=25 width=2721) (actual time=0.037..0.128 rows=25 loops=1)
        ->  Index Scan Backward using index_calls_on_created_at on calls  (cost=0.09..54712.03 rows=1161879 width=2721) (actual time=0.036..0.125 rows=25 loops=1)
Planning Time: 0.172 ms
Execution Time: 0.169 ms
(5 rows)
Fig 4

This shows a massive improvement in the time it took to execute the query, and the database is doing exactly what we expect: it uses the existing index on created_at, index_calls_on_created_at, to run the subquery (which is equivalent to @calls) and return 25 rows, from which it then finds the most recently updated call.
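The behavioural difference is worth spelling out: the original query's MAX ran over every row in the table, while the rewrite runs it over only the 25 rows of @calls. A plain-Ruby sketch with simulated data (arrays standing in for ActiveRecord, not real application code) shows when the two answers diverge:

```ruby
# Each simulated "call" has a created_at and an updated_at.
Call = Struct.new(:created_at, :updated_at)

calls = (1..100).map { |i| Call.new(i, i * 10) }  # newer rows, newer edits
calls << Call.new(0, 5_000)                       # an old call edited recently

# Fig 2 behaviour: MAX(updated_at) over the whole table.
max_over_all = calls.map(&:updated_at).max

# Fig 4 behaviour: MAX(updated_at) over only the 25 newest rows
# by created_at, i.e. the page that @calls actually contains.
page = calls.sort_by(&:created_at).reverse.first(25)
max_over_page = page.map(&:updated_at).max
```

When an old call is edited recently, the two results differ, and it is the second one that matches the freshness of the cached page the endpoint serves.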

Optimising database queries is a useful skill for developers to have, and it leads to a better user experience. EXPLAIN ANALYZE reveals how the database executes a query and can provide clues on how to improve a slow-running one. Note that EXPLAIN ANALYZE will actually execute the query, even if the query is an insert, update or delete, so be careful!
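A standard way to inspect write queries safely (generic Postgres practice, with an illustrative DELETE, not a query from our codebase) is to wrap EXPLAIN ANALYZE in a transaction and roll it back:

```sql
BEGIN;
-- The DELETE genuinely executes, so the timings are real,
-- but ROLLBACK discards its effects.
EXPLAIN ANALYZE DELETE FROM calls WHERE updated_at < now() - interval '5 years';
ROLLBACK;
```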

In my next blog, I will reveal how we solved another Postgres timeout by adding a partial index and deleting unused indexes. Thanks for reading, and if you have any questions or similar experiences, please share them in the comments :)



