Postgres can do THAT?

Published in

Cognite

6 min readDec 6, 2021

This post supplements the presentation “Postgres can do THAT?”. (Slides). The presentation briefly covers many features of Postgres, with the hope that the audience wants to learn more about at least some of them.

This page links to resources to learn more about the topics covered.

Postgres is one of the best documented software products that exist. Its documentation is stellar. The source code is also very crisp and well-commented, with good READMEs on internals. (Example on the executor)

Hironobu Suzuki maintains an excellent resource on Postgres internals which is well worth looking at.

Crunchy Data has a blog with lots of material and an interactive learning portal.

generate_series(): Easily make lots of data

generate_series(m, n, [delta]) is a set returning function that emits arbitrarily large amounts of numeric/timestamp data. Useful to easily generate large amounts of test data.

Having sufficiently large amounts of test data is important for making sure schemas, indexes and queries are performant. Add a few million rows in seconds, and slow queries and missing indexes will be harder to miss. Easily created synthetic data is useful, but not a replacement from testing with real data, of course.

EXPLAIN — show query plans and profile queries

EXPLAIN shows the plan of a query. In addition to reads, it can also make sense to use for write queries, such as to debug why a DELETE is taking forever. (Probably a cascading delete that’s missing an index)

EXPLAIN while developing and testing out new queries. Make EXPLAIN a natural part of how you write new queries, and your understanding and intuition of how Postgres executes your query and uses indexes will improve a lot. Explain is not just a tool to bring out when you need to debug why something is slow.

Postgres docs: Using explain
Visualize a plan: Dalibo PEV2
Highlight slow parts of a plan: explain.depesz.com
Auto-explain: for logging execution plans of slow statements automatically, without having to run EXPLAIN by hand.
Crunchy Data’s interactive learning portal’s coverage of EXPLAIN.

RETURNING: Get back what you touched

For returning data from rows that were modified by the update/insert/delete statement.

Often useful in combination with common table expressions (see next section)

# insert into person (name) values ('Groot') returning id;
 id
----
  4
(1 row)

Postgres docs on RETURNING
Data-modifying statements in WITH (where RETURNING is important)

WITH, or Common Table Expressions

WITH queries allow you to decompose your query in more readable parts (which in Postgres ≥12 will likely execute as efficiently as less-readable equivalent queries), do advanced transformations across tables in a single statement or even serialize large object graphs in a single database round trip.

WITH statements also enable recursive queries, such as exploring a graph.

Postgres docs on WITH
Using Writeable CTEs to Improve Performance (via inserting to multiple tables in a single round-trip)
Postgres graph queries with billions of nodes (Alibaba cloud)
Gentle intro on recursive SQL queries
WITH-queries: Present and future

Note: Common table expressions improved significantly in Postgres 12. When reading about “optimization barriers” and materialization and the dangers of CTEs, consider whether it’s assuming Postgres ≤11 or other databases.

What’s up in my database?

pg_stat_activity: view with one row per server process, showing what that process is up to (or waiting for)
Deep dive into postgres stats: pg_stat_activity and pg_locks
PostgresDBA: useful canned queries for your .psqlrc
application_name: Name your client, to make it easier to spot in logs and in pg_stat_activity
Getting the Most of out Application_Name
pg_stat_statements: tracking planning and execution statistics of all SQL statements executed by a server
The most useful Postgres extension: pg_stat_statements

Locks — and how to minimize them

Creating indexes and modifying tables requires locks.

Attempting to take an exclusive lock (and not getting it) will still block concurrent users. Waiting ten seconds for an exclusive lock you hold for 10 milliseconds block others for 10.01 seconds.

Must read: PostgreSQL at Scale: Database Schema Changes Without Downtime
lock_timeout setting
CREATE [UNIQUE] INDEX CONCURRENTLY: Adding indexes without blocking writes
Explaining create index concurrently (with bonus Heap Only Tuples-coverage)
ALTER TABLE … ADD CONSTRAINT … NOT VALID: Add constraints with minimal locking
Blog post making it more approachable (NOT NULL has become less of a problem since)
Why You Should Care That Your SQL DDL is Transactional
PostgreSQL locking, part 1: Row locks
PostgreSQL locking, part 2: heavyweight locks

Deferred constraints

A deferred constraint gets evaluated when attempting to COMMIT, not right away as the row is inserted. Important for e.g. cyclic foreign key constraints, e.g. when you require something in two tables or none of them.

Indexes on expressions and partial indexes

Indexes can be made on expressions which may call functions (e.g lower(username) in a unique index to ensure unique lowercase usernames) and partial (e.g. unique index on assets(project_id, external_id) where deleted_at is not null) or both.

Range types

A range is a numeric-like range, which can be open-ended. Operations on ranges are e.g. overlaps/intersects, contains, etc. These operations can be indexed via “GiST”-indexes.

Exclusion constraints

Typically used with ranges, exclusion constraints can ensure that e.g. the same room doesn’t get booked twice in overlapping time ranges.

JSON and lateral joins

Postgres has excellent JSON support. You can store JSON in a column (and be sure it’s valid JSON and do certain operations on it), as well as do aggregates on rows and compose JSON object graphs.

A lateral join is a bit like “for each”, which lets you execute a sub-query per row (which can refer to that row).

Lateral joins combined with JSON aggregates are the foundation for GraphQL-engines building on Postgres.

Window functions

Window functions can do aggregates and other functions (such as ranking) across “windows” without strictly grouping rows. A window can be all rows sharing a value, n rows before or after the “current” row, etc. (Window functions are computed in a final pass over the result set, whereas groupings change the result set)

LISTEN / NOTIFY

If you want to “listen” to changes to pick up changes as quickly as possible (without continuous polling at short intervals), you can “notify” from a client, and clients that have registered a listener will be notified.

Use cases could be e.g. waking up a syncing process to quickly sync things into Elasticsearch.

(Notify payloads need to be small and are only delivered to live connections. Not a queuing solution in their own!)

Advisory locks

Advisory locks are explicit locks that last across transactions for as long as the connection that got the lock is alive.

If you need a Postgres connection to do your work, advisory locks can be a reasonable “leader election” mechanism as well. (For example, you may want to run multiple processes to sync data into Elasticsearch for high availability, but only need one of them to run at any moment)

Hiring :)

Postgres, Elasticsearch, Kafka and FoundationDB are some of the important backend technologies we use at Cognite. We typically have roles open for strong candidates with lots of database experience.