When Indexes & Performance Tuning Don’t Fix Slow Queries in Postgres

Aitzaz M. Khan
Jun 13, 2020 · 6 min read

A tale of resolving a blocked query by fixing Postgres disk usage with Vacuum, Analyze, and a two-minute break. Yes, that’s true, and it might save you a few days!

This is a story of Postgres database I/O optimization in a batch/bulk-update setting, where indexes, parameter tuning, and other commonly used solutions don’t solve the problem. It also offers brief insights into Postgres’ internal workings along the way.

Brief Architecture and Problem Scenario

A few months ago, I led a team working on enhancements to an existing ETL pipeline that used Python/Luigi to transform and load data into PostgreSQL.

The database was modeled so that new updates were ingested into a single table with “upsert” statements via temporary staging tables. The data contained some long strings in a few columns, and the table held only a few million rows, taking around 6 GB on disk. The full ETL process, consisting of multiple jobs, was scheduled to run every week. The Postgres bulk-load part of each job took around 40 minutes, and most of the data was updated, with only a few new insertions.
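For illustration, the bulk-load step followed roughly this staging-table upsert pattern. This is a minimal sketch using psycopg2 and a hypothetical events(id, payload, is_deleted) table, not the actual pipeline code:

```python
# Minimal sketch of a staging-table upsert (psycopg2).
# "events", "id", "payload", "is_deleted" are hypothetical names.
import psycopg2

def bulk_upsert(dsn, rows):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # Staging table lives only for this transaction.
        cur.execute(
            "CREATE TEMP TABLE staging (LIKE events INCLUDING DEFAULTS) "
            "ON COMMIT DROP"
        )
        # Bulk-load the weekly batch into staging.
        cur.executemany(
            "INSERT INTO staging (id, payload, is_deleted) VALUES (%s, %s, %s)",
            rows,
        )
        # Merge staging into the main table; existing rows get updated.
        cur.execute("""
            INSERT INTO events AS e (id, payload, is_deleted)
            SELECT id, payload, is_deleted FROM staging
            ON CONFLICT (id) DO UPDATE
               SET payload    = EXCLUDED.payload,
                   is_deleted = EXCLUDED.is_deleted
        """)
    # The connection context manager commits the transaction on success.
```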

And then it took forever to run a query!


After a few weekly runs, we stumbled upon a problem: jobs started taking longer and longer to complete, until one day a soft-deletion query (an UPDATE that ran after the main upsert) had been running for more than 8 hours before we had to kill it. That query only had to update a few hundred rows, filtered on an indexed column, and the blocking mostly occurred midway through the full day-long run.

Previously, I had mostly worked with MySQL (and a few NoSQL databases) in OLTP settings, but I had never encountered a query that took more than 8 hours when it was supposed to take a few seconds at most (although I had fixed queries doing full table scans before).

Let’s Debug

We started debugging the job locally, but at first it completed like a charm. Running the query in question against the remote databases also finished in seconds. However, after a few local runs, we were able to reproduce the issue locally.

Hmm… Explain it to me!

We used EXPLAIN to check the query plan, but it was already optimal. The query was doing a “full table scan”, yet that was due to the low cardinality of the indexed column, and even that full table scan completed in minutes when run on its own.
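If you want to reproduce this kind of check, a minimal sketch (with hypothetical table and column names) looks like this. Note that plain EXPLAIN only plans the statement, while EXPLAIN ANALYZE would actually execute it:

```python
# Sketch: print the plan Postgres chooses for the soft-deletion update.
# "events", "is_deleted", "batch_id" are hypothetical names.
import psycopg2

def show_plan(dsn, batch_id):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("""
            EXPLAIN
            UPDATE events
               SET is_deleted = true
             WHERE batch_id = %s
        """, (batch_id,))
        # EXPLAIN returns one text row per plan line.
        for (line,) in cur.fetchall():
            print(line)
```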


Indexes are not a silver bullet!

A team member suggested a new index, but I knew an index wouldn’t help because of the low cardinality of the column used in the query and the nature of the data. He tried it anyway, and the full table scan was still there.

By the way, this happens when the query optimizer decides that reading the table’s data pages sequentially (a full table scan) is faster than reading pages in random order (which is what happens when an index is used). The optimizer normally skips an index if the query has to touch more than roughly 20–30% of the table’s rows.
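You can see why the planner reaches that conclusion by looking at the column statistics it keeps. A minimal sketch, assuming table and column names of your own:

```python
# Sketch: check a column's cardinality in the planner statistics.
# n_distinct < 0 means a fraction of the row count (e.g. -1 = all distinct).
import psycopg2

def column_stats(dsn, table, column):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("""
            SELECT n_distinct, most_common_vals, most_common_freqs
              FROM pg_stats
             WHERE tablename = %s AND attname = %s
        """, (table, column))
        return cur.fetchone()
```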

Try try again…

Then we tried a few other options, like converting the NOT EXISTS subquery into a join, using a temporary table/CTE, tuning Postgres performance parameters, and other tricks. The problem initially subsided after giving Postgres more memory via parameter tuning, but eventually it came back.
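For reference, the memory-related tuning was along these lines; a sketch with illustrative session-level values, not our actual configuration:

```python
# Sketch: the kind of session-level tuning we experimented with.
# The values below are illustrative only.
import psycopg2

def run_with_more_memory(dsn, sql):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("SET work_mem = '256MB'")      # memory per sort/hash node
        cur.execute("SET temp_buffers = '128MB'")  # buffers for temp tables
        cur.execute(sql)
```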


After googling and going through a few query optimization links, I got the sense that the issue had something to do with Postgres MVCC (Multi-Version Concurrency Control) architecture and the way it handles update queries.

For a row update, Postgres does not modify the row in place: it marks the old row version as dead and inserts a new version containing the changes (on the same page as the old row if possible, which is called a Heap-Only Tuple, or HOT, update).

With this new knowledge about MVCC updates, we started observing the table size grow after each update, and it was clear that the table was bloating: it grew by 1–3 GB after updating just a boolean column.
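A quick way to watch this happen is the statistics collector. A minimal sketch (the table name is a placeholder) that reports live/dead tuple counts, on-disk size, and the last autovacuum/autoanalyze times:

```python
# Sketch: snapshot dead-tuple counts and on-disk size to watch the bloat grow.
import psycopg2

def bloat_snapshot(dsn, table):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("""
            SELECT n_live_tup,
                   n_dead_tup,
                   pg_size_pretty(pg_total_relation_size(%s::regclass)),
                   last_autovacuum,
                   last_autoanalyze
              FROM pg_stat_user_tables
             WHERE relname = %s
        """, (table, table))
        return cur.fetchone()
```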

Connecting the Dots!


I continued researching how to fix Postgres “table bloat” and came across the Vacuum and Analyze mechanisms employed by Postgres. At this point, we also observed that running the same queries via pgAdmin caused no blocking, whereas our Python code got stuck on the second update query.

Autovacuum (or a manual VACUUM) reclaims the space left behind by dead tuples/rows. However, autovacuum can fall behind when there are too many updates, and then the space is not reclaimed in time; this is what was happening in our case. Moreover, vacuum can only clean up dead rows that are no longer visible to any running transaction, and it skips pages it cannot lock because other queries are using them.

Table bloat occurs when there are lots of row updates. When a disk page has no free space left, Postgres has to write the updated row to new pages appended at the end of the table, and this can cause problems for subsequent queries if the table is not analyzed first, because the planner’s statistics no longer reflect the table’s real size and layout.

“Analyze” (the ANALYZE command) collects table/column statistics that are used by the query planner.
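Both can also be triggered manually between load steps. One gotcha: VACUUM cannot run inside a transaction block, so from psycopg2 you need autocommit. A minimal sketch with a placeholder table name:

```python
# Sketch: run VACUUM/ANALYZE manually between load steps.
# VACUUM cannot run inside a transaction block, hence autocommit.
# The table name is interpolated directly, so only pass trusted identifiers.
import psycopg2

def vacuum_analyze(dsn, table):
    conn = psycopg2.connect(dsn)
    try:
        conn.autocommit = True
        with conn.cursor() as cur:
            cur.execute(f"VACUUM (ANALYZE, VERBOSE) {table}")
    finally:
        conn.close()
```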

And, there you go!

This behavior corroborated the theory that when we ran the queries in pgAdmin, Postgres had time to run autovacuum and autoanalyze on the table in between, whereas our Python code ran the queries back to back and never gave it time for those chores. So we decided to add a 5-minute sleep between the two queries and, voilà, it finally worked!
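Conceptually, the fix was tiny. A minimal sketch (the SQL constants are placeholders for our actual upsert and soft-deletion statements; an explicit ANALYZE between the two would be the more deterministic alternative to sleeping):

```python
# Sketch of the fix: commit the upsert, give autovacuum/autoanalyze time to
# catch up, then run the soft-deletion update.
import time
import psycopg2

MAIN_UPSERT_SQL = "..."   # placeholder for the staging-table upsert
SOFT_DELETE_SQL = "..."   # placeholder for the soft-deletion UPDATE

def run_load_step(dsn):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(MAIN_UPSERT_SQL)
        conn.commit()             # dead tuples become eligible for cleanup
        time.sleep(5 * 60)        # the 5-minute pause that unblocked our job
        cur.execute(SOFT_DELETE_SQL)
        conn.commit()
```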


It was a moment of ecstasy, as we had reached this point after a few days of research and trials: we had tried many possible solutions, and each attempt took a few hours to test due to the nature of the data and the problem.

In addition to adding a pause between the multiple update queries, we also added a “Vacuum Full” task between a few jobs, since each job left the table in a bloated state. VACUUM FULL is not recommended for tables directly accessed by other apps in your system because it takes an exclusive lock on the table and rewrites the underlying file. We implemented a table-swap approach for versioning one of our user-accessible tables to avoid downtime, but that is another story for another time.

Further Readings

I hope you enjoyed reading about this journey and that it helps you someday. If you are interested in reading more about this research, here are a few links that helped me crack the issue (in addition to the ones embedded in various parts of the post).

https://www.datadoghq.com/blog/postgresql-monitoring [contains a few very interesting images about Postgres internal working]

https://dba.stackexchange.com/questions/18664/are-regular-vacuum-analyze-still-recommended-under-9-1

https://www.datadoghq.com/blog/postgresql-vacuum-monitoring

https://www.depesz.com/2013/04/16/explaining-the-unexplainable/ [an in-depth exploration of Explain]

Partial Demo

I’ve created a small script to demo the table-bloat issue. You can find it here: https://github.com/aitzaz/postgres-upsert-table-bloat-demo. However, it doesn’t reproduce the long-blocked soft-deletion query scenario (I’ll try to update it later if I get time). You can check `sample_output.log` if you don’t want to run it locally.
