Postgres can do THAT?

Alex Brasetvik
Cognite
Published in
6 min readDec 6, 2021
Image credit: Andrew Rice

This post supplements the presentation “Postgres can do THAT?”. (Slides). The presentation briefly covers many features of Postgres, with the hope that the audience wants to learn more about at least some of them.

This page links to resources to learn more about the topics covered.

Postgres is one of the best documented software products that exist. Its documentation is stellar. The source code is also very crisp and well-commented, with good READMEs on internals. (Example on the executor)

Hironobu Suzuki maintains an excellent resource on Postgres internals which is well worth looking at.

Crunchy Data has a blog with lots of material and an interactive learning portal.

generate_series(): Easily make lots of data

generate_series(m, n, [delta]) is a set returning function that emits arbitrarily large amounts of numeric/timestamp data. Useful to easily generate large amounts of test data.

Having sufficiently large amounts of test data is important for making sure schemas, indexes and queries are performant. Add a few million rows in seconds, and slow queries and missing indexes will be harder to miss. Easily created synthetic data is useful, but not a replacement from testing with real data, of course.

EXPLAIN — show query plans and profile queries

EXPLAIN shows the plan of a query. In addition to reads, it can also make sense to use for write queries, such as to debug why a DELETE is taking forever. (Probably a cascading delete that’s missing an index)

EXPLAIN while developing and testing out new queries. Make EXPLAIN a natural part of how you write new queries, and your understanding and intuition of how Postgres executes your query and uses indexes will improve a lot. Explain is not just a tool to bring out when you need to debug why something is slow.

RETURNING: Get back what you touched

For returning data from rows that were modified by the update/insert/delete statement.

Often useful in combination with common table expressions (see next section)

# insert into person (name) values ('Groot') returning id;
id
----
4
(1 row)

WITH, or Common Table Expressions

WITH queries allow you to decompose your query in more readable parts (which in Postgres ≥12 will likely execute as efficiently as less-readable equivalent queries), do advanced transformations across tables in a single statement or even serialize large object graphs in a single database round trip.

WITH statements also enable recursive queries, such as exploring a graph.

Note: Common table expressions improved significantly in Postgres 12. When reading about “optimization barriers” and materialization and the dangers of CTEs, consider whether it’s assuming Postgres ≤11 or other databases.

What’s up in my database?

Locks — and how to minimize them

Creating indexes and modifying tables requires locks.

Attempting to take an exclusive lock (and not getting it) will still block concurrent users. Waiting ten seconds for an exclusive lock you hold for 10 milliseconds block others for 10.01 seconds.

Deferred constraints

A deferred constraint gets evaluated when attempting to COMMIT, not right away as the row is inserted. Important for e.g. cyclic foreign key constraints, e.g. when you require something in two tables or none of them.

Indexes on expressions and partial indexes

Indexes can be made on expressions which may call functions (e.g lower(username) in a unique index to ensure unique lowercase usernames) and partial (e.g. unique index on assets(project_id, external_id) where deleted_at is not null) or both.

Range types

A range is a numeric-like range, which can be open-ended. Operations on ranges are e.g. overlaps/intersects, contains, etc. These operations can be indexed via “GiST”-indexes.

Exclusion constraints

Typically used with ranges, exclusion constraints can ensure that e.g. the same room doesn’t get booked twice in overlapping time ranges.

JSON and lateral joins

Postgres has excellent JSON support. You can store JSON in a column (and be sure it’s valid JSON and do certain operations on it), as well as do aggregates on rows and compose JSON object graphs.

A lateral join is a bit like “for each”, which lets you execute a sub-query per row (which can refer to that row).

Lateral joins combined with JSON aggregates are the foundation for GraphQL-engines building on Postgres.

Window functions

Window functions can do aggregates and other functions (such as ranking) across “windows” without strictly grouping rows. A window can be all rows sharing a value, n rows before or after the “current” row, etc. (Window functions are computed in a final pass over the result set, whereas groupings change the result set)

LISTEN / NOTIFY

If you want to “listen” to changes to pick up changes as quickly as possible (without continuous polling at short intervals), you can “notify” from a client, and clients that have registered a listener will be notified.

Use cases could be e.g. waking up a syncing process to quickly sync things into Elasticsearch.

(Notify payloads need to be small and are only delivered to live connections. Not a queuing solution in their own!)

Advisory locks

Advisory locks are explicit locks that last across transactions for as long as the connection that got the lock is alive.

If you need a Postgres connection to do your work, advisory locks can be a reasonable “leader election” mechanism as well. (For example, you may want to run multiple processes to sync data into Elasticsearch for high availability, but only need one of them to run at any moment)

Hiring :)

Postgres, Elasticsearch, Kafka and FoundationDB are some of the important backend technologies we use at Cognite. We typically have roles open for strong candidates with lots of database experience.

--

--

Alex Brasetvik
Cognite
Writer for

Working on various cloud and security related things at Cognite. Previously at Elastic through the acquisition of my startup.