PostgreSQL schema-change gotchas

Viktoriia Romanchenko
Preply Engineering Blog
5 min read · Feb 15, 2021

This article describes a few cases where a migration might look good, but it can cause downtime anyway. I’ve included a “what to do” section for each of these cases.

A gorgeous African Elephant. Photo by me.

My name is Viktoriia Romanchenko and I’m a Software Engineer at Preply.com. I wrote this article because I made one of the mistakes described below, and of course production was down. Let’s get started!

Oopsies covered

  • Add column with default value
  • Add column to heavily used table
  • Change column type
  • Create index
  • Add check constraint
  • Add not-null constraint

Add column with default value

ALTER TABLE table_name ADD COLUMN "new_column" BOOLEAN DEFAULT FALSE;

ALTER TABLE .. ADD COLUMN .. DEFAULT can cause downtime for PostgreSQL versions lower than 11.

If your PostgreSQL version is 11 or higher, you can add columns with a constant default safely, but not with a volatile one.

From PostgreSQL docs:

From PostgreSQL 11, adding a column with a constant default value no longer means that each row of the table needs to be updated when the ALTER TABLE statement is executed. Instead, the default value will be returned the next time the row is accessed, and applied when the table is rewritten, making the ALTER TABLE very fast even on large tables.

If your version is lower than 11 or the default value is volatile (e.g., clock_timestamp()), PostgreSQL actually rewrites the whole table, adding the column to each row and filling it with the default value. That might be okay for small tables, but bear in mind that the whole table stays locked until the operation completes. And it doesn’t matter whether the new column is nullable.

What to do instead

  1. Add the new column (without the default value).
  2. Set the default value on the column. This operation locks the table too, but that’s only a problem if the statement takes a long time, and SET DEFAULT on an existing column is fast because it doesn’t scan the table.
  3. Backfill all existing rows separately, ideally in batches (a batched sketch follows the statements below).
ALTER TABLE table_name ADD COLUMN "new_column" BOOLEAN;
ALTER TABLE table_name ALTER COLUMN new_column SET DEFAULT FALSE;
-- backfill the existing rows
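
For step 3, backfilling every row in a single UPDATE can itself hold row locks for a long time on a big table. Here is a minimal batched sketch, assuming an id primary key and a batch size of 1000 (both of which are assumptions):

-- Hypothetical batched backfill; repeat until it reports 0 rows updated.
UPDATE table_name
SET new_column = FALSE
WHERE id IN (
    SELECT id FROM table_name
    WHERE new_column IS NULL
    LIMIT 1000
);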

Add column to heavily used table

ALTER TABLE table_name ADD COLUMN "new_column" INT;

Believe it or not, a simple ALTER TABLE... ADD COLUMN can cause downtime. (Even without a DEFAULT or NOT NULL constraint!)

This happens because:

  1. ALTER TABLE needs an ACCESS EXCLUSIVE LOCK on the table, so it must first wait for every currently running query on that table (even a plain long-running SELECT) to finish.
  2. While ALTER TABLE waits in the lock queue, every new query on the table queues up behind it, because ACCESS EXCLUSIVE conflicts with all other lock modes.
  3. The result: a single long-running query in front of the ALTER TABLE blocks all reads and writes on the table, which causes downtime.

What to do instead

It seems that for really busy systems, the only safe way to add a column is to retry in a loop with an explicitly short lock_timeout value. Take a look at the explanation and the solution by depesz.
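
A minimal sketch of that approach (the 50ms value is an assumption, and the retry loop itself belongs in your migration tooling):

-- Give up quickly instead of queueing behind long-running queries.
SET lock_timeout = '50ms';
ALTER TABLE table_name ADD COLUMN "new_column" INT;
-- On "canceling statement due to lock timeout", wait a bit and retry.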

Change column type

ALTER TABLE table_name ALTER COLUMN column_name TYPE INT;

Changing a column’s type can cause downtime because, in the general case, it requires holding an ACCESS EXCLUSIVE LOCK on the table while the entire table is rewritten with the new type. That blocks all usage of the table.

What to do #1

The first solution is suggested by an article on the same topic by Braintree:

  1. Add a new column new_<column>.
  2. Dual write to both columns (e.g., with a BEFORE INSERT/UPDATE trigger).
  3. Backfill the new column with a copy of the old column’s values.
  4. Rename <column> to old_<column> and new_<column> to <column> inside a single transaction, with an explicit LOCK <table> statement (steps 2 and 4 are sketched after this list).
  5. Drop the old column.
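
Here is a rough sketch of steps 2 and 4, assuming we are migrating a column price to INT via new_price (all names here are hypothetical):

-- Step 2: a dual-write trigger keeps new_price in sync with price.
CREATE FUNCTION sync_new_price() RETURNS trigger AS $$
BEGIN
    NEW.new_price := NEW.price::INT;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_new_price_trigger
    BEFORE INSERT OR UPDATE ON table_name
    FOR EACH ROW EXECUTE PROCEDURE sync_new_price();

-- Step 4: swap the columns in one transaction under an explicit lock.
BEGIN;
LOCK TABLE table_name IN ACCESS EXCLUSIVE MODE;
ALTER TABLE table_name RENAME COLUMN price TO old_price;
ALTER TABLE table_name RENAME COLUMN new_price TO price;
COMMIT;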

What to do #2

The second solution is less sweaty, which is why I prefer it. Also, there is a good chance that the column-renaming flow will cause downtime during a rollback. So… how about not changing column types at all?

  1. Create a new column with the desired type.
  2. Deprecate the old column.
  3. Start using the new column, but keep the old one around for some time to stay backward compatible (a sketch follows this list).
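
In code, the whole flow is just (column names are hypothetical):

-- Step 1: add the replacement column with the desired type.
ALTER TABLE table_name ADD COLUMN price_v2 INT;
-- Steps 2-3: switch the application to price_v2, backfill it, and only
-- after the compatibility window drop the deprecated column:
-- ALTER TABLE table_name DROP COLUMN price;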

Create index

CREATE INDEX idx_name ON table_name (column_name);

This statement can cause downtime because CREATE INDEX blocks writes (but not reads) on the table until it’s done.

With the CREATE INDEX CONCURRENTLY option, the build takes significantly longer to complete. But since it allows normal operations (both reads and writes) to continue while the index is built, it is the method to use for adding new indexes in a production environment.

What to do

CREATE INDEX CONCURRENTLY idx_name ON table_name (column_name);

From PostgreSQL docs:

A regular CREATE INDEX command can be performed within a transaction block, but CREATE INDEX CONCURRENTLY cannot.
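
One caveat worth knowing: if a concurrent build fails or is cancelled, it leaves behind an index marked INVALID, which must be dropped and rebuilt. Leftovers can be spotted like this:

-- List indexes left invalid by a failed concurrent build.
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;
-- Drop the leftover, then retry:
-- DROP INDEX CONCURRENTLY idx_name;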

Add check constraint

ALTER TABLE table_name ADD CONSTRAINT constraint_name CHECK (something);

Adding a check constraint can cause downtime because it requires an ACCESS EXCLUSIVE LOCK, which allows neither reads nor writes on the table. However, it can be broken down into two ALTER TABLE statements.

What to do

Divide et impera.

  1. Add the CHECK constraint with the NOT VALID option. NOT VALID tells PostgreSQL that it doesn’t need to scan the entire table to verify that all existing rows satisfy the condition; the constraint is still enforced for future inserts and updates. This operation still grabs an ACCESS EXCLUSIVE LOCK, but since it doesn’t scan the table, it is quick.
  2. Validate the constraint. Validation acquires only a SHARE UPDATE EXCLUSIVE LOCK on the altered table, so it can run concurrently with reads and writes.
ALTER TABLE table_name ADD CONSTRAINT constraint_name CHECK (something) NOT VALID;
ALTER TABLE table_name VALIDATE CONSTRAINT constraint_name;

Add not-null constraint

ALTER TABLE table_name ALTER COLUMN column_name SET NOT NULL;

Adding a not-null constraint to an existing column can cause downtime because it requires an ACCESS EXCLUSIVE LOCK on the table while PostgreSQL fully scans it to check that the constraint holds for every row.

What to do instead

From PostgreSQL docs:

A not-null constraint is always written as a column constraint. A not-null constraint is functionally equivalent to creating a check constraint CHECK (column_name IS NOT NULL).

So a NOT NULL constraint can be replaced with a CHECK constraint, and we proceed exactly as in the case above:

  1. Add a CHECK constraint with the NOT VALID option.
  2. Manually verify that all rows have non-null values in the column.
  3. Validate the constraint.
ALTER TABLE table_name ADD CONSTRAINT constraint_name CHECK (column_name IS NOT NULL) NOT VALID;
-- verify that all rows have non-null values
ALTER TABLE table_name VALIDATE CONSTRAINT constraint_name;
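
On PostgreSQL 12 and higher there is a nice follow-up: once the CHECK constraint is validated, SET NOT NULL can use it to skip the full table scan, and the now-redundant constraint can be dropped:

-- PostgreSQL 12+: the validated CHECK lets SET NOT NULL skip the scan.
ALTER TABLE table_name ALTER COLUMN column_name SET NOT NULL;
ALTER TABLE table_name DROP CONSTRAINT constraint_name;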

You can find more about this case in an article from Doctolib.
