PostgreSQL at Scale: Database Schema Changes Without Downtime
Braintree Payments uses PostgreSQL as its primary datastore. We rely heavily on the data safety and consistency guarantees a traditional relational database offers us, but these guarantees come with certain operational difficulties. To make things even more interesting, we allow zero scheduled functional downtime for our main payments processing services.
Several years ago we published a blog post detailing some of the things we had learned about how to safely run DDL (data definition language) operations without interrupting our production API traffic.
Since that time PostgreSQL has gone through quite a few major upgrade cycles — several of which have added improved support for concurrent DDL. We’ve also further refined our processes. Given how much has changed, we figured it was time for a blog post redux.
In this post we’ll address the following topics:
- Transactional DDL
- Table Operations
- Column Operations
- Index Operations
- Enum Types
- Bonus: Library for Ruby on Rails
First, some basics
For all code and database changes, we require that:
- Live code and schemas be forward-compatible with updated code and schemas: this allows us to roll out deploys gradually across a fleet of application servers and database clusters.
- New code and schemas be backward-compatible with live code and schemas: this allows us to roll back any change to the previous version in the event of unexpected errors.
For all DDL operations we require that:
- Any exclusive locks acquired on tables or indexes be held for at most ~2 seconds.
- Rollback strategies do not involve reverting the database schema to its previous version.
PostgreSQL supports transactional DDL. In most cases, you can execute multiple DDL statements inside an explicit database transaction and take an “all or nothing” approach to a set of changes. However, running multiple DDL statements inside a transaction has one serious downside: if you alter multiple objects, you’ll need to acquire exclusive locks on all of those objects in a single transaction. Because locks on multiple tables create the possibility of deadlock and increase exposure to long waits, we do not combine multiple DDL statements into a single transaction. PostgreSQL will still execute each separate DDL statement transactionally; each statement will either be cleanly applied or fail and be rolled back.
Note: Concurrent index creation is a special case. Postgres disallows executing CREATE INDEX CONCURRENTLY inside an explicit transaction; instead Postgres itself manages the transactions. If for some reason the index build fails before completion, you may need to drop the index before retrying, though the index will still never be used for regular queries if it did not finish building successfully.
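As a rough sketch of that cleanup, a failed concurrent build leaves an index marked invalid in the pg_index catalog. The index name index_foos_on_bar below is a hypothetical placeholder, not something from our schema:

```sql
-- List indexes left INVALID, e.g. by a failed CREATE INDEX CONCURRENTLY.
SELECT n.nspname AS schema_name,
       c.relname AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;

-- Hypothetical index name: remove the leftover before retrying the build.
DROP INDEX CONCURRENTLY IF EXISTS index_foos_on_bar;
```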
PostgreSQL has many different levels of locking. We’re concerned primarily with the following table-level locks since DDL generally operates at these levels:
ACCESS EXCLUSIVE: blocks all usage of the locked table.
SHARE ROW EXCLUSIVE: blocks concurrent DDL against and row modification (allowing reads) in the locked table.
SHARE UPDATE EXCLUSIVE: blocks concurrent DDL against the locked table.
Note: “Concurrent DDL” for these purposes includes VACUUM and ANALYZE operations.
All DDL operations generally necessitate acquiring one of these locks on the object being manipulated. For example, when you run an ALTER TABLE ... ADD COLUMN ... statement against a table named foos, PostgreSQL attempts to acquire an ACCESS EXCLUSIVE lock on the table foos. Attempting to acquire this lock causes all subsequent queries on this table to queue until the lock is released. In practice your DDL operations can cause other queries to back up for as long as your longest running query takes to execute. Because arbitrarily long queueing of incoming queries is indistinguishable from an outage, we try to avoid any long-running queries in databases supporting our payments processing applications.
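As a hypothetical instance of such a statement (the column bar is an assumption made for illustration), adding a column requests the lock, and a second session can observe any lock requests still waiting on foos:

```sql
-- Session 1: requests ACCESS EXCLUSIVE on foos and waits behind running queries.
ALTER TABLE foos ADD COLUMN bar integer;

-- Session 2: lock requests on foos that have not been granted yet.
-- While session 1 is waiting, its own request shows up here, followed by
-- the requests of every query queued behind it.
SELECT pid, mode, granted
FROM pg_locks
WHERE relation = 'foos'::regclass
  AND NOT granted;
```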
But sometimes a query takes longer than you expect. Or maybe you have a few special case queries that you already know will take a long time. PostgreSQL offers some additional runtime configuration options that allow us to guarantee query queueing backpressure doesn’t result in downtime.
Instead of relying on Postgres to lock an object when executing a DDL statement, we acquire the lock explicitly ourselves. This allows us to carefully control the time the queries may be queued. Additionally when we fail to acquire a lock within several seconds, we pause before trying again so that any queued queries can be executed without significantly increasing load. Finally, before we attempt lock acquisition, we query
pg_locks¹ for any currently long running queries to avoid unnecessarily queueing queries for several seconds when it is unlikely that lock acquisition is going to succeed.
Starting with Postgres 9.3, you can adjust the lock_timeout parameter to control how long Postgres will allow for lock acquisition before returning without acquiring the lock. If you happen to be using 9.2 or earlier (and those are unsupported; you should upgrade!), then you can simulate this behavior by using the statement_timeout parameter around an explicit LOCK <table> statement.
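A minimal sketch of explicit lock acquisition with lock_timeout might look like the following. The table foos, the column bar, and the two-second timeout are illustrative assumptions:

```sql
BEGIN;
-- Give up on lock acquisition after 2 seconds instead of queueing indefinitely.
-- SET LOCAL limits the setting to this transaction.
SET LOCAL lock_timeout = '2s';
LOCK TABLE foos IN ACCESS EXCLUSIVE MODE;
-- The DDL runs only after the lock is already held.
ALTER TABLE foos ADD COLUMN bar integer;
COMMIT;
```

If the LOCK statement times out, the transaction fails with an error; the caller can then issue ROLLBACK, pause so queued queries can drain, and try again.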
In many cases an ACCESS EXCLUSIVE lock need only be held for a very short period of time, i.e., the amount of time it takes Postgres to update its "catalog" (think metadata) tables. Below we'll discuss the cases where a lower lock level is sufficient or alternative approaches for avoiding long-held locks that block reads and writes.
Note: Sometimes holding even an
ACCESS EXCLUSIVE lock for something more than a catalog update (e.g., a full table scan or even rewrite) can be functionally acceptable when the table size is relatively small. We recommend testing your specific use case against realistic data sizes and hardware to see if a particular operation will be "fast enough". On good hardware with a table easily loaded into memory, a full table scan or rewrite for thousands (possibly even 100s of thousands) of rows may be "fast enough".
In general, adding a table is one of the few operations we don’t have to think too hard about since, by definition, the object we’re “modifying” can’t possibly be in use yet. :D
While most of the attributes involved in creating a table do not involve other database objects, including a foreign key in your initial table definition will cause Postgres to acquire a
SHARE ROW EXCLUSIVE lock against the referenced table blocking any concurrent DDL or row modifications. While this lock should be short-lived, it nonetheless requires the same caution as any other operation acquiring such a lock. We prefer to split these into two separate operations: create the table and then add the foreign key.
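Splitting the two operations could be sketched as follows. The tables foos and bars, their columns, and the constraint name are hypothetical:

```sql
-- Step 1: create the table with no reference to other tables.
CREATE TABLE bars (
  id     bigserial PRIMARY KEY,
  foo_id bigint NOT NULL
);

-- Step 2: add the foreign key on its own, with a bounded wait for the
-- SHARE ROW EXCLUSIVE lock on the referenced table foos.
BEGIN;
SET LOCAL lock_timeout = '2s';
ALTER TABLE bars
  ADD CONSTRAINT bars_foo_id_fkey
  FOREIGN KEY (foo_id) REFERENCES foos (id);
COMMIT;
```

Because bars is still empty at this point, there are no existing rows to validate against foos.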
Dropping a table requires an exclusive lock on that table. As long as the table isn’t in current use you can safely drop the table. Before allowing a
DROP TABLE ... to make its way into our production environments we require documentation showing when all references to the table were removed from the codebase. To double check that this is the case you can query PostgreSQL's table statistics view
pg_stat_user_tables² confirming that the returned statistics don't change over the course of a reasonable length of time.
While it’s unsurprising that a table rename requires acquiring an
ACCESS EXCLUSIVE lock on the table, that's far from our biggest concern. Unless the table is not being read from or written to, it's very unlikely that your application code could safely handle a table being renamed underneath it.
We avoid table renames almost entirely. But if a rename is an absolute must, then a safe approach might look something like the following:
- Create a new table with the same schema as the old one.
- Backfill the new table with a copy of the data in the old table.
- Use INSERT and UPDATE triggers on the old table to maintain parity in the new table.
- Begin using the new table.
Other approaches involving views and/or RULEs may also be viable depending on the performance characteristics required.
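One way the trigger-based approach could be sketched is shown below. It assumes PostgreSQL 9.5 or later (for ON CONFLICT), a hypothetical table foos with a primary key id and a single data column bar, and it installs the trigger before backfilling so that writes arriving during the backfill are not missed. Deletes are not mirrored in this sketch.

```sql
-- New table with the same columns, defaults, constraints, and indexes.
CREATE TABLE new_foos (LIKE foos INCLUDING ALL);

-- Mirror every insert and update on foos into new_foos.
CREATE FUNCTION copy_foos_to_new_foos() RETURNS trigger AS $$
BEGIN
  INSERT INTO new_foos (id, bar)
  VALUES (NEW.id, NEW.bar)
  ON CONFLICT (id) DO UPDATE SET bar = EXCLUDED.bar;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foos_copy_to_new_foos
AFTER INSERT OR UPDATE ON foos
FOR EACH ROW EXECUTE PROCEDURE copy_foos_to_new_foos();

-- Backfill rows that predate the trigger; rows the trigger has already
-- copied are left alone.
INSERT INTO new_foos (id, bar)
SELECT id, bar FROM foos
ON CONFLICT (id) DO NOTHING;
```

On a large table the backfill would more likely be run in batches than as a single statement.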
Note: For column constraints (e.g., NOT NULL) or other constraints (e.g., EXCLUDE), see Constraints.
Adding a column to an existing table generally requires holding a short
ACCESS EXCLUSIVE lock on the table while catalog tables are updated. But there are several potential gotchas:
Default values: Introducing a default value at the same time as adding the column will cause the table to be locked while the default value is propagated to all rows in the table. Instead, you should:
- Add the new column (without the default value).
- Set the default value on the column.
- Backfill all existing rows separately.
Note: In the recently released PostgreSQL 11, this is no longer the case for non-volatile default values. Instead, adding a new column with a default value only requires updating catalog tables, and any reads of rows without a value for the new column will magically have it “filled in” on the fly.
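On versions before PostgreSQL 11, the three steps for adding a column with a default can be sketched roughly as follows. The table foos, its primary key id, the column bar, the default of 0, and the batch size are all illustrative assumptions:

```sql
-- 1. Catalog-only change: no default, so existing rows are not touched.
ALTER TABLE foos ADD COLUMN bar integer;

-- 2. Applies only to rows inserted from now on.
ALTER TABLE foos ALTER COLUMN bar SET DEFAULT 0;

-- 3. Backfill existing rows in small batches; repeat this statement
--    until it reports zero updated rows.
UPDATE foos
SET bar = 0
WHERE id IN (
  SELECT id
  FROM foos
  WHERE bar IS NULL
  LIMIT 1000
);
```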
Not-null constraints: Adding a column with a
NOT NULL constraint is only possible if there are no existing rows or a
DEFAULT is also provided. If there are no existing rows, then the change is effectively equivalent to a catalog only change. If there are existing rows and you are also specifying a default value, then the same caveats apply as above with respect to default values.
Note: Adding a column will cause all
SELECT * FROM ... style queries referencing the table to begin returning the new column. It is important to ensure that all currently running code safely handles new columns. To avoid this gotcha in our applications we require queries to avoid
* expansion in favor of explicit column references.
Change column type
In the general case changing a column’s type requires holding an exclusive lock on a table while the entire table is rewritten with the new type.
There are a few exceptions:
- Changing a VARCHAR column to TEXT [9.1+] (more specifically: "when the old type is binary coercible to the new type and the using clause does not change the column contents").
- “When the new type is an unconstrained domain over the old type” [9.1+].
- When increasing or removing a length or precision limit, e.g., changing VARCHAR(5) to VARCHAR(10) [9.2+].
Note: Even though one of the exceptions above was added in 9.1, changing the type of an indexed column would always rewrite the index even if a table rewrite was avoided. In 9.2 any column data type that avoids a table rewrite also avoids rewriting the associated indexes. If you’d like to confirm that your change won’t rewrite the table or any indexes, you can query
pg_class³ and verify the
relfilenode column doesn't change.
If you need to change the type of a column and one of the above exceptions doesn’t apply, then the safe alternative is:
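- Add a new column new_<column>.
- Dual write to both columns (e.g., with a BEFORE INSERT/UPDATE trigger).
- Backfill the new column with a copy of the old column’s values.
- Rename <column> to old_<column> and new_<column> to <column> inside a single transaction and explicit LOCK <table> statement.
- Drop the old column.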
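The sequence above might be sketched like this for a hypothetical integer column bar on a table foos (with primary key id) that is being widened to bigint. All object names, the batch size, and the timeout are assumptions:

```sql
-- Add the new column (catalog-only change).
ALTER TABLE foos ADD COLUMN new_bar bigint;

-- Dual write: keep new_bar in step with bar on every insert and update.
CREATE FUNCTION foos_sync_new_bar() RETURNS trigger AS $$
BEGIN
  NEW.new_bar := NEW.bar;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foos_sync_new_bar
BEFORE INSERT OR UPDATE ON foos
FOR EACH ROW EXECUTE PROCEDURE foos_sync_new_bar();

-- Backfill in batches; repeat until zero rows are updated.
UPDATE foos
SET new_bar = bar
WHERE id IN (
  SELECT id FROM foos
  WHERE new_bar IS NULL AND bar IS NOT NULL
  LIMIT 1000
);

-- Swap the columns under one explicit, time-bounded lock.
BEGIN;
SET LOCAL lock_timeout = '2s';
LOCK TABLE foos IN ACCESS EXCLUSIVE MODE;
DROP TRIGGER foos_sync_new_bar ON foos;
ALTER TABLE foos RENAME COLUMN bar TO old_bar;
ALTER TABLE foos RENAME COLUMN new_bar TO bar;
COMMIT;

-- Later, once nothing references the old column any more.
DROP FUNCTION foos_sync_new_bar();
ALTER TABLE foos DROP COLUMN old_bar;
```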
It goes without saying that dropping a column is something that should be done with great care. Dropping a column requires an exclusive lock on the table to update the catalog but does not rewrite the table. As long as the column isn’t in current use you can safely drop the column. It’s also important to confirm that the column is not referenced by any dependent objects that could be unsafe to drop. In particular, any indexes using the column should be dropped separately and safely with
DROP INDEX CONCURRENTLY since otherwise they will be automatically dropped along with the column under an
ACCESS EXCLUSIVE lock. You can query
pg_depend⁴ for any dependent objects.
Before allowing an ALTER TABLE ... DROP COLUMN ... to make its way into our production environments we require documentation showing when all references to the column were removed from the codebase. This process allows us to safely roll back to the release prior to the one that dropped the column.
Note: Dropping a column will require that you update all views, triggers, functions, etc. that rely on that column.
The standard form of CREATE INDEX ... acquires a SHARE lock against the table being indexed, blocking writes (but not reads) while it builds the index using a single table scan. In contrast, the form CREATE INDEX CONCURRENTLY ... acquires a SHARE UPDATE EXCLUSIVE lock but must complete two table scans (and hence is somewhat slower). This lower lock level allows reads and writes to continue against the table while the index is built.
- Multiple concurrent index creations on a single table will not return from either CREATE INDEX CONCURRENTLY ... statement until the slowest one completes.
- CREATE INDEX CONCURRENTLY ... may not be executed inside of a transaction but does maintain transactions internally. Holding a transaction open in this way means that no auto-vacuums (against any table in the system) will be able to clean up dead tuples introduced after the index build begins until it finishes. If you have a table with a large volume of updates (particularly bad if to a very small table) this could result in extremely sub-optimal query execution.
- CREATE INDEX CONCURRENTLY ... must wait for all transactions using the table to complete before returning.
The standard form of
DROP INDEX ... acquires an
ACCESS EXCLUSIVE lock against the table with the index while removing the index. For small indexes this may be a short operation. For large indexes, however, file system unlinking and disk flushing can take a significant amount of time. In contrast, the form
DROP INDEX CONCURRENTLY ... acquires a
SHARE UPDATE EXCLUSIVE lock to perform these operations allowing reads and writes to continue against the table while the index is dropped.
- DROP INDEX CONCURRENTLY ... cannot be used to drop any index that supports a constraint (e.g., PRIMARY KEY or UNIQUE).
- DROP INDEX CONCURRENTLY ... may not be executed inside of a transaction but does maintain transactions internally. Holding a transaction open in this way means that no auto-vacuums (against any table in the system) will be able to clean up dead tuples introduced after the index drop begins until it finishes. If you have a table with a large volume of updates (particularly bad if to a very small table) this could result in extremely sub-optimal query execution.
- DROP INDEX CONCURRENTLY ... must wait for all transactions using the table to complete before returning.
DROP INDEX CONCURRENTLY ... was added in Postgres 9.2. If you're still running 9.1 or prior, you can achieve somewhat similar results by marking the index as invalid and not ready for writes, flushing buffers with the pgfincore extension, and then dropping the index.
ALTER INDEX ... RENAME TO ... requires an
ACCESS EXCLUSIVE lock on the index blocking reads from and writes to the underlying table. However a recent commit expected to be a part of Postgres 12 lowers that requirement to
SHARE UPDATE EXCLUSIVE.
REINDEX INDEX ... requires an
ACCESS EXCLUSIVE lock on the index blocking reads from and writes to the underlying table. Instead we use the following procedure:
- Create a new index concurrently that duplicates the existing index definition.
- Drop the old index concurrently.
- Rename the new index to match the original index’s name.
Note: If the index you need to rebuild backs a constraint, remember to re-add the constraint as well (subject to all of the caveats we’ve documented).
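The rebuild procedure can be sketched as follows for a hypothetical index index_foos_on_bar on foos (bar). The names and the timeout are assumptions, and the new index definition must match the original one:

```sql
-- Neither CONCURRENTLY statement may run inside a transaction block.
CREATE INDEX CONCURRENTLY index_foos_on_bar_new ON foos (bar);
DROP INDEX CONCURRENTLY index_foos_on_bar;

-- The rename is a quick catalog update, so bound the wait for its lock.
SET lock_timeout = '2s';
ALTER INDEX index_foos_on_bar_new RENAME TO index_foos_on_bar;
```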
NOT NULL Constraints
Removing an existing not-null constraint from a column requires an exclusive lock on the table while a simple catalog update is performed.
In contrast, adding a not-null constraint to an existing column requires an exclusive lock on the table while a full table scan verifies that no
null values exist. Instead you should:
- Add a CHECK constraint requiring the column be not-null with ALTER TABLE <table> ADD CONSTRAINT <name> CHECK (<column> IS NOT NULL) NOT VALID. The NOT VALID clause tells Postgres that it doesn't need to scan the entire table to verify that all rows satisfy the condition.
- Manually verify that all rows have non-null values in your column.
- Validate the constraint with ALTER TABLE <table> VALIDATE CONSTRAINT <name>. With this statement PostgreSQL will block acquisition of other EXCLUSIVE locks for the table, but will not block reads or writes.
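For a hypothetical column bar on a table foos, these three steps might look like the following sketch (the constraint name is also an assumption):

```sql
-- 1. Enforced for new and updated rows immediately; existing rows are not scanned.
ALTER TABLE foos
  ADD CONSTRAINT foos_bar_not_null CHECK (bar IS NOT NULL) NOT VALID;

-- 2. Manual check: backfill any offending rows until this count is zero.
SELECT count(*) FROM foos WHERE bar IS NULL;

-- 3. Scans the table without blocking reads or writes.
ALTER TABLE foos VALIDATE CONSTRAINT foos_bar_not_null;
```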
Bonus: There is currently a patch in the works (and possibly it will make it into Postgres 12) that will allow you to create a
NOT NULL constraint without a full table scan if a CHECK constraint (like we created above) already exists.
ALTER TABLE ... ADD FOREIGN KEY requires a
SHARE ROW EXCLUSIVE lock (as of 9.5) on both the altered and referenced tables. While this won't block
SELECT queries, blocking row modification operations for a long period of time is equally unacceptable for our transaction processing applications.
To avoid that long-held lock you can use the following process:
- ALTER TABLE ... ADD FOREIGN KEY ... NOT VALID: Adds the foreign key and begins enforcing the constraint for all new INSERT/UPDATE statements but does not validate that all existing rows conform to the new constraint. This operation still requires SHARE ROW EXCLUSIVE locks, but the locks are only briefly held.
- ALTER TABLE ... VALIDATE CONSTRAINT <constraint>: This operation checks all existing rows to verify they conform to the specified constraint. Validation requires a SHARE UPDATE EXCLUSIVE lock on the altered table so may run concurrently with row reading and modification queries. A ROW SHARE lock is also held on the referenced table, which will block any operations requiring exclusive locks while validating the constraint.
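A sketch of the two-step foreign key process, using hypothetical tables bars and foos and an assumed constraint name:

```sql
-- Bound the wait for the brief SHARE ROW EXCLUSIVE locks on both tables.
SET lock_timeout = '2s';

-- Step 1: enforce the key for new writes only; existing rows are not checked.
ALTER TABLE bars
  ADD CONSTRAINT bars_foo_id_fkey
  FOREIGN KEY (foo_id) REFERENCES foos (id)
  NOT VALID;

-- Step 2: check existing rows while reads and writes continue.
ALTER TABLE bars VALIDATE CONSTRAINT bars_foo_id_fkey;
```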
ALTER TABLE ... ADD CONSTRAINT ... CHECK (...) requires an
ACCESS EXCLUSIVE lock. However, as with foreign keys, Postgres supports breaking the operation into two steps:
- ALTER TABLE ... ADD CONSTRAINT ... CHECK (...) NOT VALID: Adds the check constraint and begins enforcing it for all new INSERT/UPDATE statements but does not validate that all existing rows conform to the new constraint. This operation still requires an ACCESS EXCLUSIVE lock, but the lock is only briefly held.
- ALTER TABLE ... VALIDATE CONSTRAINT <constraint>: This operation checks all existing rows to verify they conform to the specified constraint. Validation requires a SHARE UPDATE EXCLUSIVE lock on the altered table so may run concurrently with row reading and modification queries.
ALTER TABLE ... ADD CONSTRAINT ... UNIQUE (...) requires an
ACCESS EXCLUSIVE lock. However, Postgres supports breaking the operation into two steps:
- Create a unique index concurrently. This step will immediately enforce uniqueness, but if you need a declared constraint (or a primary key), then continue to add the constraint separately.
- Add the constraint using the already existing index with ALTER TABLE ... ADD CONSTRAINT ... UNIQUE USING INDEX <index>. Adding the constraint still requires an ACCESS EXCLUSIVE lock, but the lock will only be held for fast catalog operations.
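As a sketch with hypothetical names (table foos, column bar, and the index and constraint names are all assumptions):

```sql
-- Step 1: build the unique index without blocking reads or writes.
CREATE UNIQUE INDEX CONCURRENTLY index_foos_on_bar_unique ON foos (bar);

-- Step 2: attach the existing index as a declared constraint.
-- Only a brief ACCESS EXCLUSIVE lock is needed for the catalog update.
SET lock_timeout = '2s';
ALTER TABLE foos
  ADD CONSTRAINT foos_bar_key UNIQUE USING INDEX index_foos_on_bar_unique;
```

Note that Postgres renames the index to match the constraint name when the two differ.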
Note: If you specify
PRIMARY KEY instead of
UNIQUE then any nullable columns in the index will be made
NOT NULL. This requires a full table scan which currently can't be avoided. See NOT NULL Constraints for more details.
ALTER TABLE ... ADD CONSTRAINT ... EXCLUDE USING ... requires an
ACCESS EXCLUSIVE lock. Adding an exclusion constraint builds the supporting index, and, unfortunately, there is currently no support for using an existing index (as you can do with a unique constraint).
CREATE TYPE <name> AS (...) and
DROP TYPE <name> (after verifying there are no existing usages in the database) can both be done safely without unexpected locking.
Modifying enum values
ALTER TYPE <enum> RENAME VALUE <old> TO <new> was added in Postgres 10. This statement does not require locking tables which use the enum type.
Deleting enum values
Because enums are stored internally as integers and there is no support for gaps in the valid range, removing a value would currently require shifting values and rewriting all rows using those values. PostgreSQL does not currently support removing values from an existing enum type.
Announcing Pg_ha_migrations for Ruby on Rails
We’re also excited to announce that we have open-sourced our internal library pg_ha_migrations. This Ruby gem enforces DDL safety in projects using Ruby on Rails and/or ActiveRecord with an emphasis on explicitly choosing trade-offs and avoiding unnecessary magic (and the corresponding surprises). You can read more in the project’s README.
¹ You can find active long-running queries and the tables they lock with the following query:
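One possible form of such a query is sketched below. It assumes PostgreSQL 9.2 or later (for the pid, state, and query columns of pg_stat_activity), and the five-second threshold is an arbitrary choice:

```sql
SELECT a.pid,
       now() - a.query_start AS duration,
       l.relation::regclass  AS locked_relation,
       l.mode,
       a.query
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
WHERE a.state <> 'idle'
  AND a.pid <> pg_backend_pid()          -- skip this session
  AND l.locktype = 'relation'
  AND l.granted
  AND l.database = (SELECT oid FROM pg_database
                    WHERE datname = current_database())
  AND now() - a.query_start > interval '5 seconds'
ORDER BY duration DESC;
```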
² You can see PostgreSQL’s internal statistics about table accesses with the following query:
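A sketch of such a query for a hypothetical table foos; run it twice, some time apart, and compare the counters:

```sql
SELECT relname,
       seq_scan,      -- sequential scans started on the table
       seq_tup_read,
       idx_scan,      -- index scans started on the table
       idx_tup_fetch,
       n_tup_ins,
       n_tup_upd,
       n_tup_del
FROM pg_stat_user_tables
WHERE relname = 'foos';
```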
³ You can see if DDL causes a relation to be rewritten by seeing if the relfilenode value changes after running the statement:
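For example, with a hypothetical table foos and index index_foos_on_bar, run this before and after the DDL statement and compare the results:

```sql
SELECT relname, relfilenode
FROM pg_class
WHERE relname IN ('foos', 'index_foos_on_bar');
```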
⁴ You can find objects (e.g., indexes) that depend on a specific column by running the statement:
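A sketch of such a statement, assuming a hypothetical column bar on a table foos:

```sql
SELECT pg_describe_object(d.classid, d.objid, d.objsubid) AS dependent_object,
       d.deptype
FROM pg_depend d
JOIN pg_attribute a
  ON a.attrelid = d.refobjid
 AND a.attnum   = d.refobjsubid
WHERE d.refclassid = 'pg_class'::regclass  -- the referenced object is a relation...
  AND d.refobjid   = 'foos'::regclass      -- ...namely foos...
  AND a.attname    = 'bar';                -- ...and specifically its column bar
```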