PostgreSQL: Foreign keys with condition ON UPDATE CASCADE
Foreign keys are a key feature in Relational Databases, ensuring integrity and coherence of data. They allow doing transactions ON CASCADE, which means that changes on the primary key/unique constraint they reference is also applied. This has many advantages as the complexity of the database grows.
However, there might be cases when using ON CASCADE is risky because you can lose the track of what’s actually being changed (specially when deleting). So in general is a good practice for updates, but one must be careful to use it in some cases, especially for deletes. Maybe we created a demo account and we don’t want to allow non-expert users to delete the user account, due to the lose of all relative data.
In this post we are going to compare different alternatives to the ON CASCADE constraint and their performances.
Imagine we’re working in a social network app, where people can share their pictures and comments and so on. Let’s say we have a table with the users, and other with the pictures. We have chosen as primary key for the users table the email. This field will also exist in pictures table and will have a foreign key to the users table.
CREATE TABLE public.users (
user_email varchar PRIMARY KEY,
CREATE TABLE public.pictures (
id_picture serial PRIMARY KEY,
user_email varchar REFERENCES public.users(user_email) ON UPDATE CASCADE,
Let’s suppose we have already 10 users, and each user has uploaded 200K pictures. Then, one user wants to change the email (from firstname.lastname@example.org to email@example.com). Since user_email is a PRIMARY KEY being referenced, to update the email we need to do simultaneous updates, and we cannot simply do:
UPDATE users -- or pictures
SET user_email = 'firstname.lastname@example.org'
WHERE user_email = 'email@example.com'
We have several alternatives:
- Use WITH queries.
- Use an auxiliary new fake email.
- ON UPDATE CASCADE and just run the above query.
There are other ways to do the updates, which is strongly not recommended. It would be to drop the foreign key, update the tables, and then add again the foreign key. It’s a faster option, but if every time you want to change tables you drop the constraints, you might be losing consistency. Plus, you are losing the point of the foreign keys.
One might think that they’re just two tables, but imagine you have other tables with the same foreign key (like comments, reactions, connections, …). You’ll need to do the updates in every single table. Moreover, for every new table, you’ll need to add it in the updates. In the other hand, declaring ON UPDATE CASCADE in the foreign key, you can forget about it and just do the update in the users table.
But how about performance? We made some tests with different number of pictures (200K, 1M, 5M):
200K 1M 5M WITH 2.5s 18.2s 93s AUXILIARY MAIL 6.1s 31.4s 187s ON CASCADE 2.6s 16s 86s
As expected, using the auxiliary mail takes too much time, because we apply twice the update, so this option is discarded. We can say that there are not significant differences between the with queries and the ON CASCADE, but using the last one it’s simpler and it saves to adapt code every time we add new tables.
Check the code here!
By Andreu Jiménez