Adding a NOT NULL CONSTRAINT on PG Faster with Minimal Locking
Doctolib is a continuous web service available 24/7 for practitioners and patients. As we have complete responsibility in the handling of all practitioners’ agendas, it is crucial that there be zero downtime. Practitioners must be able to access their agendas and incoming events at anytime. Yet, as a service, we migrate our database schema from time to time to deliver new features; these migrations can be risky and must be handled with care to avoid any disruption.
The process is particularly delicate when executed on large-scale, frequently modified database tables. At Doctolib we migrate data every few days; unfortunately, on tables with a significant amount of data (>30M rows) standard migration best practices may not be enough to prevent service downtime. We currently use a tool to prevent dangerous operations but sometimes certain migrations require additional safety measures.
Using constraints is crucial to avoid data corruption, however, with the addition of a constraint comes the danger of locking your table should the operation be performed incorrectly. As you can see below, we decided to look at the issue from a different perspective. Thanks to PostgreSQL, we are now capable of migrating a large table by adding a not-null constraint on a new column and all of this without locking read and write during the operation.
Touchy migration with PostgreSQL
Let’s look more closely at the particular case that inspired the need for a safer migration: adding a
NOT NULL constraint onto a column on a table with 30 million rows. Not really big data per se, but it could have led to some service downtime if the migration had been performed carelessly.
When you try to add a
NOT NULL constraint onto a column, it will be executed on PostgreSQL as an atomic operation like:
ALTER TABLE table_name ALTER COLUMN column_name SET NOT NULL;
As a consequence, PostgreSQL will:
- fully scan the table to check that the constraint is valid on all the rows
- get an
ACCESS EXCLUSIVE LOCKwhich locks the whole table for writing (cf. https://www.postgresql.org/docs/current/static/explicit-locking.html)
This can be dangerous if:
- the table is intensively modified
- the full scan takes some time, especially relevant when dealing with a large table
It may put your application workers in a waiting state and freeze your application, which will bubble up quickly, ending with a full service outage during the operation.
We tried in our staging environment, for 30 million rows, the operation took 1.7 seconds.
Let’s say that you write in the table 100 times per second, given that, it would have locked more than 100 connections on your database during the operation.
The Standard Solution: Just Do It
A typical response to this particular situation would be to add the
NOT NULL constraint on the column and to do so carefully, planning the migration at the most appropriate time when service usage is low and switching to maintenance mode during this costly migration.
Why we did not want to take this direction:
As the table we were looking to migrate is among one of the most important in our application, we would have had no choice but to migrate it during the night when practitioners and patient traffic is low. This would unfortunately still prove problematic for people working night-shifts. There had to be a smarter way that could also avoid any risk of hassle for Doctolib users.
PostgreSQL CHECK CONSTRAINT to the Rescue
Looking at the PostgreSQL documentation:
A not-null constraint is always written as a column constraint. A not-null constraint is functionally equivalent to creating a check constraint
CHECK (column_name IS NOT NULL).
To put it simply, the CHECK CONSTRAINT is somewhat like a column constraint, but it belongs to the table. For instance a CHECK CONSTRAINT can check that a price column must have a value greater than 100 :
CREATE TABLE products (
CONSTRAINT check_price_value CHECK (price > 100)
In the case we are looking at, a CHECK CONSTRAINT can enforce non NULL values.
The trick here is that you can issue a NOT VALID option when adding a check constraint. This will tell PostgreSQL that you are aware that the constraint may not be valid on existing data and that it does not have to check it. Subsequent inserts or updates, however, will be enforced.
Essentially this removes the potentially lengthy initial check on your table. This operation still gets an ‘EXCLUSIVE LOCK’, impeding writes on the table, but since it will not validate all the rows, it will be very quick: 6 ms (still on our demo env).
How can we use this feature?
- Add the CHECK CONSTRAINT without forgetting to define it as NOT VALID:
ALTER TABLE table_name ADD CONSTRAINT constraint_name CHECK (column IS NOT NULL) NOT VALID;
2. You can just tell PostgreSQL to validate your constraint with another statement:
ALTER TABLE table_name VALIDATE CONSTRAINT constraint_name;
The VALIDATE command will:
- fully scan the table
- get a ‘SHARE UPDATE EXCLUSIVE’ lock, which only locks schema changes like other ALTER TABLE commands. We can still read and write in the table.
- PostgreSQL assumes that new data is already enforced, and checks existing data on the table to render the constraint as valid which is why no writing lock is needed at all.
Differences between a column NOT NULL constraint and a CHECK CONSTRAINT not null
The end result may be the same, but there are still a few differences:
- check constraints have to be named and belong to the table, while a NOT NULL column is just an option of the latter
- from a performance point of view, PostgreSQL documentation says that
in PostgreSQL creating an explicit not-null constraint is more efficient.
A benchmark on Stackexchange says that there’s a 0.5%- performance penalty difference for writes. We saw around 1% in our own tests, negligible in our case but something that could be a consideration depending on the parameters of your situation.
- you have to remove check constraints before removing the associated columns
- a NOT NULL is written beside the column name when issuing “\d your_table” on psql, while check constraints are described below a specific session
In any case, do not forget to backfill all the data in your table with a default value ;-)
A New Standard?
This kind of migration can be very tricky when modifying core tables used throughout the entire service and with lots of data that changes frequently. Fortunately, as time passes, it is less necessary to modify core tables.
Since this particular migration we have not been in another situation where we have had to add NOT NULL CONSTRAINTS on core tables. Despite the success of this one migration, we have yet to set a standard at Doctolib. First of all the solution does not align with our philosophy of the Principle of Least Surprise. Secondly, the performance degradation, while acceptable in our case, may not be in the future. When the time comes and we are faced with this situation again we will have to consider which solution will best suit our needs.