The ultimate EXPLAIN reading guide for PostgreSQL — Part 1.1

Samuel Christian de Oliveira · Published in Lett · 4 min read · Jun 6, 2021

Sequential Scans

PostgreSQL is a great database engine. No question about that! But no data model is perfect, and being able to debug a query to figure out why it takes too long to return is a great skill for all kinds of devs: from Data Scientists who just want to fetch data efficiently (which is my case) to DBAs trying to find the missing index.

In this series, I’m going to list and illustrate every class of node in PostgreSQL execution plans and, well… explain them! So you know what you are looking at when you run a query with EXPLAIN in front of it.

I’ll try to illustrate things as best as I can, but feel free to open an issue in my GitHub project in case you find a typo or two.

🔍 Sequential Scan — Reading in on-disk order

hiding in postgres/src/backend/executor/nodeSeqscan.c

A SELECT with no ORDER BY can return its result set in a sometimes seemingly random order, which can get you wondering: what is causing that ordering?
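For instance (a sketch, using the `country` table that appears later in this article; any table will do):

```sql
-- No ORDER BY: PostgreSQL is free to return rows in any order,
-- and a sequential scan hands them back in on-disk order.
SELECT country_name FROM country;
```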

Data from the tables is stored in blocks (or pages) of 8 kB by default, and each row (usually referred to as a “tuple”) can be identified by its CTID:

The CTID is a pair in the format “(<block number>, <row offset>)” that identifies a row version inside the table’s data file. Adding the hidden system column ctid to the query reveals the source of the ordering:
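A minimal sketch of such a query, again assuming the `country` table:

```sql
-- ctid is a hidden system column: (block number, row offset within the block)
SELECT ctid, country_name FROM country;
-- The ctids come back in increasing order, e.g. (0,1), (0,2), (0,3), ...
-- mirroring the physical order of the rows on disk.
```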

CTIDs may not be contiguous: whenever a row is deleted, its tuple slot becomes invalid (or “dead”) until a vacuum runs and the space is marked as reusable. The same happens when a row is updated, since the old version is marked as dead and a new copy of the modified data is written to whatever slot is available.
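You can watch this happen (a sketch; `country_id` as primary key and the inserted value are assumptions):

```sql
DELETE FROM country WHERE country_id = 10;  -- the tuple becomes "dead"; its ctid slot is now invalid
VACUUM country;                             -- the dead slot is marked as reusable
INSERT INTO country (country_name)          -- the new row may land in the freed slot,
VALUES ('Atlantis');                        -- reusing a ctid from the middle of the table
```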

What is a Sequential Scan?

The simplest way to read a table from disk is to fetch the rows in the order they are stored. This requires minimal overhead, but it tends to perform poorly whenever filtering, ordering or grouping is involved.

Example output using “EXPLAIN (VERBOSE, ANALYSE, BUFFERS)”:

->  Seq Scan on pg_catalog.pg_database d  (cost=0.00..1.02 rows=2 width=68) (actual time=0.011..0.014 rows=3 loops=1)
      Output: d.datname, d.oid
      Buffers: shared hit=1
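A query that could produce a node like the one above (a sketch; the article doesn’t show the original query, and the leading `->` suggests this scan was a child node of a larger plan):

```sql
EXPLAIN (VERBOSE, ANALYSE, BUFFERS)
SELECT d.datname, d.oid
FROM pg_catalog.pg_database d;
```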

📋 Fields:

  • “Output”: The columns returned from the step.
    Requires VERBOSE
  • “Buffers”: Information about cached data and reading operations.
    Requires BUFFERS

✅ It’s OK to see this step if

  • Your query is trying to read the entire table. If no sorting or filtering is needed, just returning the rows in the order they’re stored on disk requires basically no overhead.
  • The available indexes won’t discard enough rows. Similar to the previous point, fetching almost every block from disk by way of an index that can’t filter out many rows just adds unnecessary overhead.
  • The table is really small. Loading a few blocks directly into memory may be faster than going through an index.
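To check how small “really small” is, you can count the table’s 8 kB pages with the built-in `pg_relation_size` function (the table name is illustrative):

```sql
-- Tables spanning only a page or two rarely benefit from an index
SELECT pg_relation_size('country') / 8192 AS pages;
```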

❎ It’s a problem to see this step when

  • Your query should return a few rows from a large table using a filter. This may indicate that an important index is not being used or simply doesn’t exist.
  • The execution plan’s row count estimate is really off. In some cases, Postgres may be completely unaware of how big a table actually is. This is a sign that the table hasn’t been vacuumed and analysed properly; the autovacuum settings (which also control automatic ANALYZE) should be reviewed system-wide or for the specific table.
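When the estimate is off, refreshing the statistics by hand is a quick first check (the table name is illustrative):

```sql
ANALYZE country;  -- refresh the planner statistics for this table

-- Compare the planner's stored estimate with the actual row count:
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'country';
```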

ℹ️ Extra info

  • VACUUM FULL will leave the table with contiguous CTIDs because it rewrites the data to disk, but the original row order is kept.
  • CLUSTER will reorder the table on the disk by rewriting it sorted by a given index.
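For example (the index name `country_pkey` is an assumption):

```sql
VACUUM FULL country;                 -- rewrites the table; ctids become contiguous, original order kept
CLUSTER country USING country_pkey;  -- rewrites the table sorted by the given index
```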

Example

SELECT
    country_name
FROM
    country
WHERE
    country_name ILIKE 'b%'
ORDER BY
    country_id;

results in the following execution plan:

Sort  (cost=2.15..2.21 rows=26 width=16)
  Sort Key: country_id
  ->  Seq Scan on country  (cost=0.00..1.54 rows=26 width=16)
        Filter: ((country_name)::text ~~* 'b%'::text)

We can think of this plan as a DAG:
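The missing diagram can be sketched roughly like this, with rows flowing from the scan up into the sort:

```
country (heap)
   │
   ▼
 Seq Scan  (Filter: country_name ILIKE 'b%')
   │
   ▼
 Sort  (Sort Key: country_id)
   │
   ▼
 result
```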

Data Scientist. Math, logic and computer science lover. SQL enthusiast. Will tell you a bad joke when you least expect it.