Uniqueness in PostgreSQL: Constraints versus Indexes
While doing a bit of database cleaning, I noticed many tables with more than a few indexes and constraints. In particular, a few tables had both a unique index and a unique constraint for the same column. Constraints and indexes come at a cost: they add write overhead with each
UPDATE and they can take up significant amounts of space. Since a unique constraint and a unique index both enforce uniqueness, which one should we prefer? And are they different?
lauren=# \d examples
Column | Type | Modifiers
a_constraint | integer |
b_unique_idx | integer |
c_non_unique | integer |
"examples_a_unique_constraint" UNIQUE CONSTRAINT, btree (a_constraint)
"examples_b_unique_index" UNIQUE, btree (b_unique_idx)
A unique constraint enforces that the specified columns are unique across all rows in the table. Importantly, a unique constraint allows nulls, and null values compared against each other are not considered equal. A unique index and a unique constraint return the same error when violated “duplicate key value violates unique constraint.”
lauren=# INSERT INTO examples VALUES (0,0,0);
INSERT 0 1
lauren=# INSERT INTO examples VALUES (null,1,0);
INSERT 0 1
lauren=# INSERT INTO examples VALUES (null,2,0);
INSERT 0 1
lauren=# INSERT INTO examples VALUES (0,3,0);
ERROR: duplicate key value violates unique constraint "examples_a_unique_constraint"
DETAIL: Key (a_constraint)=(0) already exists.
lauren=# INSERT INTO examples VALUES (1,4,0);
INSERT 0 1
lauren=# INSERT INTO examples VALUES (2,null,0);
INSERT 0 1
lauren=# INSERT INTO examples VALUES (3,null,0);
INSERT 0 1
lauren=# INSERT INTO examples VALUES (4,4,0);
ERROR: duplicate key value violates unique constraint "examples_b_unique_index"
DETAIL: Key (b_unique_idx)=(4) already exists.
Note that we were able to add multiple null values for column
a_constraint and column
b_unique_idx, and we have validated that uniqueness is sufficiently enforced in both cases.
“Adding a unique constraint will automatically create a unique B-tree index on the column or group of columns listed in the constraint.”
“Note: The preferred way to add a unique constraint to a table is ALTER TABLE … ADD CONSTRAINT. The use of indexes to enforce unique constraints could be considered an implementation detail that should not be accessed directly. One should, however, be aware that there’s no need to manually create indexes on unique columns; doing so would just duplicate the automatically-created index.”
The documentation for unique constraint specifies that a unique index is created under the hood upon creation of a unique constraint. This means we’re already getting a unique index with each unique constraint, and adding another unique index is simply duplicating the one underneath our unique constraint. If you think about it, this makes perfect sense from a performance perspective; a constraint needs to quickly look up all values in a column upon insert or update in order to verify duplicates have not been introduced. The fastest way to do this lookup is to have these values in an index, ergo an index is automatically created upon creation of a constraint.
The documentation for unique index specifies that we should be using a constraint to enforce uniqueness. The fact that a unique index will also be constrained in the same way is an “implementation detail,” with the recommendation being to directly create a constraint.
The difference between unique constraints and unique indexes causes quite a bit of confusion. My personal suspicion is that while most developers understand indexes, constraints are often enforced in the code base rather than in the database (through a Rails
:validates_uniqueness_of or other methods). Often times these code level constraints can’t actually guarantee uniqueness because multiple threads may be attempting to insert into the database at the same time. Database level enforcement is preferred, but for many web developers it is not the natural tendency. Plus, most constraints don’t create indexes, which helps add to the confusion around what’s happening underneath the hood. Anecdotally, I find indexes to be more of an accessible concept (speed!), and many people err on the side of indexes rather than constraints.
But in this case, the clear preferred method is a unique constraint, with which you still get an automatic unique index. If you are creating a table from scratch, this is absolutely the way to go. If you are adding a uniqueness constraint to an existing table, you may still find yourself erring on the side of a unique index, if only because an index can be created concurrently while a constraint cannot. Should you choose a concurrent index in this case, you can add a unique constraint that depends on that index, effectively doing manually what PostgreSQL would have done automatically on a new table:
CREATE UNIQUE INDEX CONCURRENTLY examples_new_col_idx ON examples (new_col);
ALTER TABLE examples
ADD CONSTRAINT examples_unique_constraint USING INDEX examples_new_col_idx;
This method will in effect create a unique constraint while capitalizing on the ability to add an index concurrently.
In the meantime, I’ll be dropping our extra unique indexes, as both a unique index and a unique constraint are taking up extra space and extra write overhead with no extra gain. In a quick test of one million integer inserts, time to insert was 60% higher into a column with a unique constraint and a unique index, and having both a unique constraint and a unique index consumed twice as much space.
Thank you to contributors, reviewers, and editors: Rohit Parulkar, James Dura, Ann Jaskiw and Dan Eisenberg