How to safely create unique indexes in PostgreSQL

Dovetail Engineering
5 min read · May 23, 2023

By Anna Zhirnova — Software Engineer

Intro

Did you know that adding a unique constraint to an existing PostgreSQL table can lock it for hours, impacting thousands of user sessions, causing request timeouts, and increasing error rates? This blog is here to help you do it safely. We’ll cover why you’d add indexes and constraints to existing tables in the first place, the many things that can go wrong, and how to deal with them. While this blog focuses on unique indexes and constraints, most of the content applies to other kinds too.

📚 This blog is part of a series about PostgreSQL migration best practices. Stay tuned for more content coming soon!

Why add a unique index in the first place?

You may want to add a unique index to an existing table to improve query performance, enforce uniqueness, and/or support upserts. Say you have a many-to-many join table reference_count(document_id, object_id, count), and you want to be able to upsert records to easily update count:

INSERT INTO reference_count (document_id, object_id, count)
VALUES (...)
ON CONFLICT (document_id, object_id)
DO UPDATE SET count = EXCLUDED.count;

In this case, our unique identifier is (document_id, object_id), so to support upserts after creating the table, we need to add a unique index on those two columns.

📚 An upsert creates a new database row or updates an existing one based on a unique identifier consisting of one or more columns. In PostgreSQL, this is achieved with ON CONFLICT DO UPDATE, which requires a unique index matching that identifier. Read more in this section of the Postgres docs.
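For illustration, here’s what a fully fleshed-out upsert could look like, with hypothetical values, assuming we want the incoming count to win on conflict (EXCLUDED refers to the row that was proposed for insertion):

```sql
-- Hypothetical values for illustration
INSERT INTO reference_count (document_id, object_id, count)
VALUES ('doc-1', 'obj-1', 3)
ON CONFLICT (document_id, object_id)
DO UPDATE SET count = EXCLUDED.count;
```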

Attempt 1 - Creating a unique index

Looking at PostgreSQL documentation, you could just use the statement below to create a new unique index.

CREATE UNIQUE INDEX reference_count_unique_index 
ON reference_count (document_id, object_id);

I wish it were that simple! In a production environment, this approach can lead to delayed user requests or, at worst, request timeouts and increased error rates. Larger tables can take hours to get indexed - imagine locking a table for that long 😱

That happens because PostgreSQL needs to scan the whole table to build the index, and it holds a lock that blocks all writes (though not reads) for the entire duration of the scan. That means that while this statement runs, every write to the table will be blocked. On top of that, taking a table-wide write lock can exacerbate migration deadlocks.

So what else can we try?

Attempt 2 - Creating a unique index concurrently

The next stop on the index creation journey is creating an index concurrently, which doesn’t take a write lock on the table and avoids increasing latency and causing deadlocks. Note that it still takes a SHARE UPDATE EXCLUSIVE lock, which will prevent any changes to the schema while the index is being built.

CREATE UNIQUE INDEX CONCURRENTLY reference_count_unique_index 
ON reference_count (document_id, object_id);

This method is significantly slower than building the index synchronously, but what it lacks in speed it makes up for in safety. However, even this method isn’t bulletproof, and you could see the following error the next time you try to use your seemingly successfully created index:

ALTER TABLE reference_count
ADD CONSTRAINT reference_count_unique UNIQUE
USING INDEX reference_count_unique_index;

ERROR: index "reference_count_unique_index" is not valid

So what happened here? First, let’s walk through concurrent index creation step-by-step and see how it manages to create an index without grabbing a write lock:

  1. The index is created in the system catalogs and marked invalid
  2. Postgres scans the table and builds the index without acquiring a write lock
  3. Postgres waits for all transactions that hold a snapshot of the table to finish
  4. Postgres runs a second scan to validate the index against any rows changed in the meantime
  5. The index is marked valid and is ready for use
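On PostgreSQL 12 and later, you can watch these phases as they happen via the pg_stat_progress_create_index view — a handy sanity check during long concurrent builds:

```sql
-- Shows which phase the build is in (e.g. "building index" or
-- "waiting for old snapshots") and how far along the scan is
SELECT relid::regclass, phase, blocks_done, blocks_total
FROM pg_stat_progress_create_index;
```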

Now, let’s have a look at the failure modes. The first is deadlocks, which can happen at any point during the migration. The second is that the index doesn’t actually enforce uniqueness until it’s marked valid: duplicates can still be created between the two scans, in which case the second scan detects the violation and the index is left invalid.

Did we go through this whole journey only to find out that there is no production-safe and guaranteed way to create indexes on existing tables?! Kind of, but it’s not the end of the world. There are only two failure modes and the good news is, both of them can be prevented!

Preventing migration deadlocks

Migration deadlocks are pesky beasts, but, lucky for you, we are publishing a dedicated article soon to show how to avoid them.

Preventing uniqueness constraint violations

You can guard against uniqueness constraint violations by eliminating the code paths that create duplicates and cleaning up your data before creating an index.
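As a sketch of that cleanup, here’s one way to find and remove duplicates before creating the index, assuming it’s acceptable to keep one arbitrary row per (document_id, object_id) pair:

```sql
-- Find keys that would violate the unique index
SELECT document_id, object_id, count(*)
FROM reference_count
GROUP BY document_id, object_id
HAVING count(*) > 1;

-- Delete all but one physical row per key
-- (ctid identifies a row's physical location)
DELETE FROM reference_count a
USING reference_count b
WHERE a.ctid < b.ctid
  AND a.document_id = b.document_id
  AND a.object_id = b.object_id;
```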

If you can’t prevent duplicates from being created, you’ve already attempted to create an index, or your initial migration failed due to deadlocks, don’t despair and read on.

Attempt 3 - Reindexing manually

If all else fails, you can rebuild the index manually by running the following statement:

REINDEX INDEX [CONCURRENTLY] reference_count_unique_index;

You can run this off-peak and reindex the table synchronously; however, that still takes a write lock and carries a risk of user impact. Luckily, PostgreSQL supports reindexing concurrently — it’s still quite slow, but sometimes it’s the only option.

📚 Read more on concurrent reindexing here.
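Alternatively, the approach the PostgreSQL docs recommend after a failed concurrent build is to drop the invalid index and retry. DROP INDEX also supports CONCURRENTLY, so this avoids write locks too:

```sql
DROP INDEX CONCURRENTLY reference_count_unique_index;

CREATE UNIQUE INDEX CONCURRENTLY reference_count_unique_index
ON reference_count (document_id, object_id);
```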

Wait, but if we know that building an index concurrently doesn’t guarantee success, how do we know when it fails? Enter the handy query below. Now you can make sure the index creation actually worked 🙌

SELECT relname
FROM pg_class
JOIN pg_index ON pg_index.indexrelid = pg_class.oid
WHERE pg_index.indisvalid = false;

-- Returns the list of invalid indexes, in this case:
-- 'reference_count_unique_index'

Recap

Alright, so what did we learn beyond the fact that creating unique indexes on existing tables can be a pain in the backside? Hopefully, you’re walking away with a solid understanding of how creating indexes works and what to do when things go wrong:

  • Creating a unique index synchronously takes heavy table locks and comes with a risk of degrading user experience.
  • Creating an index concurrently avoids user impact but success is not guaranteed. You can minimize risks by:
    – Avoiding migration deadlocks
    – Cleaning up your data and any code paths that may create duplicates before creating the index.
  • Make sure to verify that the index was created successfully!
  • If the index creation fails, go again — you can choose between the synchronous and concurrent options.

That’s all folks!
