Postgres can do THAT?
This post supplements the presentation “Postgres can do THAT?”. (Slides). The presentation briefly covers many features of Postgres, with the hope that the audience wants to learn more about at least some of them.
This page links to resources to learn more about the topics covered.
Postgres is one of the best documented software products that exist. Its documentation is stellar. The source code is also very crisp and well-commented, with good READMEs on internals. (Example on the executor)
Hironobu Suzuki maintains an excellent resource on Postgres internals which is well worth looking at.
Crunchy Data has a blog with lots of material and an interactive learning portal.
generate_series(): Easily make lots of data
generate_series(m, n, [delta])
is a set returning function that emits arbitrarily large amounts of numeric/timestamp data. Useful to easily generate large amounts of test data.
Having sufficiently large amounts of test data is important for making sure schemas, indexes and queries are performant. Add a few million rows in seconds, and slow queries and missing indexes will be harder to miss. Easily created synthetic data is useful, but not a replacement from testing with real data, of course.
- Postgres doc on set returning functions
- Generating time series between two dates in Postgres
- How to produce meaningful datasets using only SQL
EXPLAIN — show query plans and profile queries
EXPLAIN shows the plan of a query. In addition to reads, it can also make sense to use for write queries, such as to debug why a DELETE is taking forever. (Probably a cascading delete that’s missing an index)
EXPLAIN while developing and testing out new queries. Make EXPLAIN a natural part of how you write new queries, and your understanding and intuition of how Postgres executes your query and uses indexes will improve a lot. Explain is not just a tool to bring out when you need to debug why something is slow.
- Postgres docs: Using explain
- Visualize a plan: Dalibo PEV2
- Highlight slow parts of a plan: explain.depesz.com
- Auto-explain: for logging execution plans of slow statements automatically, without having to run EXPLAIN by hand.
- Crunchy Data’s interactive learning portal’s coverage of EXPLAIN.
RETURNING: Get back what you touched
For returning data from rows that were modified by the update/insert/delete statement.
Often useful in combination with common table expressions (see next section)
# insert into person (name) values ('Groot') returning id;
id
----
4
(1 row)
- Postgres docs on RETURNING
- Data-modifying statements in WITH (where RETURNING is important)
WITH, or Common Table Expressions
WITH queries allow you to decompose your query in more readable parts (which in Postgres ≥12 will likely execute as efficiently as less-readable equivalent queries), do advanced transformations across tables in a single statement or even serialize large object graphs in a single database round trip.
WITH statements also enable recursive queries, such as exploring a graph.
- Postgres docs on WITH
- Using Writeable CTEs to Improve Performance (via inserting to multiple tables in a single round-trip)
- Postgres graph queries with billions of nodes (Alibaba cloud)
- Gentle intro on recursive SQL queries
- WITH-queries: Present and future
Note: Common table expressions improved significantly in Postgres 12. When reading about “optimization barriers” and materialization and the dangers of CTEs, consider whether it’s assuming Postgres ≤11 or other databases.
What’s up in my database?
- pg_stat_activity: view with one row per server process, showing what that process is up to (or waiting for)
- Deep dive into postgres stats: pg_stat_activity and pg_locks
- PostgresDBA: useful canned queries for your .psqlrc
- application_name: Name your client, to make it easier to spot in logs and in pg_stat_activity
- Getting the Most of out Application_Name
- pg_stat_statements: tracking planning and execution statistics of all SQL statements executed by a server
- The most useful Postgres extension: pg_stat_statements
Locks — and how to minimize them
Creating indexes and modifying tables requires locks.
Attempting to take an exclusive lock (and not getting it) will still block concurrent users. Waiting ten seconds for an exclusive lock you hold for 10 milliseconds block others for 10.01 seconds.
- Must read: PostgreSQL at Scale: Database Schema Changes Without Downtime
- lock_timeout setting
- CREATE [UNIQUE] INDEX CONCURRENTLY: Adding indexes without blocking writes
- Explaining create index concurrently (with bonus Heap Only Tuples-coverage)
- ALTER TABLE … ADD CONSTRAINT … NOT VALID: Add constraints with minimal locking
- Blog post making it more approachable (NOT NULL has become less of a problem since)
- Why You Should Care That Your SQL DDL is Transactional
- PostgreSQL locking, part 1: Row locks
- PostgreSQL locking, part 2: heavyweight locks
Deferred constraints
A deferred constraint gets evaluated when attempting to COMMIT, not right away as the row is inserted. Important for e.g. cyclic foreign key constraints, e.g. when you require something in two tables or none of them.
- Constraints training (Crunchy Data Interactive Learning Portal for Developers)
- Postgres docs on SET CONSTRAINTS
- Deferrable SQL Constraints in Depth
Indexes on expressions and partial indexes
Indexes can be made on expressions which may call functions (e.g lower(username)
in a unique index to ensure unique lowercase usernames) and partial (e.g. unique index on assets(project_id, external_id) where deleted_at is not null
) or both.
- Efficient Use of PostgreSQL Indexes
- Postgres docs on indexing expressions
- Postgres docs on partial indexes
Range types
A range is a numeric-like range, which can be open-ended. Operations on ranges are e.g. overlaps/intersects, contains, etc. These operations can be indexed via “GiST”-indexes.
- Postgres docs on range types
- Data types training (Crunchy Data Interactive Learning Portal for Developers)
- Range Types & Recursion: How to Search Availability with PostgreSQL
- Range Types in PostgreSQL and GiST Indexes
- Better Range Types in Postgres 14: Turning 100 Lines of SQL Into 3
Exclusion constraints
Typically used with ranges, exclusion constraints can ensure that e.g. the same room doesn’t get booked twice in overlapping time ranges.
JSON and lateral joins
Postgres has excellent JSON support. You can store JSON in a column (and be sure it’s valid JSON and do certain operations on it), as well as do aggregates on rows and compose JSON object graphs.
A lateral join is a bit like “for each”, which lets you execute a sub-query per row (which can refer to that row).
Lateral joins combined with JSON aggregates are the foundation for GraphQL-engines building on Postgres.
- JSON-training (Crunchy Data Interactive Learning Portal for Developers) : JSON
- Lateral join training (Crunchy Data Interactive Learning Portal for Developers)
- Building a GraphQL to SQL Compiler on Postgres, MS SQL and MySQL
Window functions
Window functions can do aggregates and other functions (such as ranking) across “windows” without strictly grouping rows. A window can be all rows sharing a value, n rows before or after the “current” row, etc. (Window functions are computed in a final pass over the result set, whereas groupings change the result set)
LISTEN / NOTIFY
If you want to “listen” to changes to pick up changes as quickly as possible (without continuous polling at short intervals), you can “notify” from a client, and clients that have registered a listener will be notified.
Use cases could be e.g. waking up a syncing process to quickly sync things into Elasticsearch.
(Notify payloads need to be small and are only delivered to live connections. Not a queuing solution in their own!)
Advisory locks
Advisory locks are explicit locks that last across transactions for as long as the connection that got the lock is alive.
If you need a Postgres connection to do your work, advisory locks can be a reasonable “leader election” mechanism as well. (For example, you may want to run multiple processes to sync data into Elasticsearch for high availability, but only need one of them to run at any moment)
- Postgres docs on “explicit locking”, or advisory locks
- Advisory locks and how to use them
- Distributed Locking with Postgres Advisory Locks
Hiring :)
Postgres, Elasticsearch, Kafka and FoundationDB are some of the important backend technologies we use at Cognite. We typically have roles open for strong candidates with lots of database experience.