Using Unique Database Fields with Soft Deletes
If you’ve ever tried using soft deletes with a unique index, you might have noticed that they don’t always play well together. A user might delete a record, expecting that they can add a new one using the same unique key. Let’s look at why this is important, and what we can do to fix it!
I’ve always said that your database should be your final point of validation. If you want your data to look a certain way, you’d better make sure your schema reflects that. This includes column types, auto-incrementing IDs, default values, foreign key constraints, unique indexes, etc.
Generally, all systems have data that requires some field to be unique across a data set: for example, the id, email, or username of a user. But is it really necessary to add the constraint in your database? How would you deal with it once a record has been soft deleted? Surely you can simply do a check in the software!
To me, validating database constraints in software is similar to front-end validation in web systems. Checking that the email field contains a valid email on the client side is great for user experience, but easy to bypass. This is why we repeat the check in our back-end — to make sure the user didn’t disable any front-end checks.
Running these database checks in our application allows us to make some important assumptions about what to do next, as well as provide more descriptive feedback, but these checks aren’t always reliable.
I’ve seen it many a time: a user “double clicks” a button, creating two processes running at exactly the same time, both checking whether a field is unique in the dataset, both finding that it is, and both inserting the same duplicate data at the same time.
Or perhaps there is a bug in the software that allows un-deleting records whose unique value has already been re-used by a new record. Whatever the case, database constraints are a great way to protect you from yourself!
If you need a column to be unique, or you need it to match up with a foreign key, make sure the database shows that! It’s much better to get an error from your database and let the entire application bomb out than to keep inserting corrupt data only to find out a year after it started.
Ok ok, I get it. Unique indexes are important, but they are impractical because of soft deletes
Assume you have a bunch of products, and because of previous sales, you don’t want to delete the products. You decide to implement soft deletes by adding a column that indicates whether the record is still active or deleted. This means that if a product has been deleted, a user should be able to add another product with the same SKU code as the deleted one. Since the column is unique, this will fail.
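This clash is easy to reproduce. Here is a minimal sketch using SQLite’s in-memory database and a hypothetical products table (SQLite stands in for whichever database you actually use; the behaviour is the same wherever sku carries a plain unique index):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        sku TEXT NOT NULL UNIQUE,  -- unique across ALL rows, deleted or not
        deleted_at TEXT NULL       -- soft-delete timestamp
    )
""")

conn.execute("INSERT INTO products (sku) VALUES ('ABC-123')")
# Soft-delete the product instead of removing the row.
conn.execute(
    "UPDATE products SET deleted_at = datetime('now') WHERE sku = 'ABC-123'"
)

# Re-adding a product with the same SKU now fails, even though the
# original is "deleted" as far as the application is concerned.
blocked = False
try:
    conn.execute("INSERT INTO products (sku) VALUES ('ABC-123')")
except sqlite3.IntegrityError:
    blocked = True

print(blocked)  # True
```

The unique index has no idea that deleted_at makes one of the rows “invisible” to the application, so the insert is rejected.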
The solution? We currently use nullable timestamps as a soft-delete flag. At first I thought: “Simply create a composite index across both the unique column AND the deleted_at column”.
A composite index is an index that gets applied to multiple columns. For example, you might have a users table with both name and surname columns. A user can have the same name as another user, as long as they have different surnames. In that case you would use a composite index that requires both the name and the surname together to be unique:
ALTER TABLE users ADD UNIQUE KEY fullname (name, surname);
Unfortunately, in the majority of databases (with some exceptions, such as SQL Server), NULL values are treated as distinct from one another, so a NULL never violates a unique constraint. In other words, you can have two or more records with a NULL in the indexed column in the same dataset.
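The loophole is easy to see in practice. Another sketch against SQLite’s in-memory database (a hypothetical products table again, with the composite unique index from the idea above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        sku TEXT NOT NULL,
        deleted_at TEXT NULL,       -- NULL = active
        UNIQUE (sku, deleted_at)    -- the composite unique index
    )
""")

# Two ACTIVE rows with the same SKU: both have deleted_at = NULL, and
# NULL is never equal to NULL, so the unique index lets both through.
conn.execute("INSERT INTO products (sku, deleted_at) VALUES ('ABC-123', NULL)")
conn.execute("INSERT INTO products (sku, deleted_at) VALUES ('ABC-123', NULL)")

active = conn.execute(
    "SELECT COUNT(*) FROM products WHERE deleted_at IS NULL"
).fetchone()[0]
print(active)  # 2 -- exactly the duplicate the index was supposed to prevent
```

So with nullable timestamps, the composite index protects the deleted rows but not the active ones, which is backwards from what we want.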
My proposed solution
After realizing this, I had to go back to the drawing board. The best solution I could come up with is using a unix timestamp for the deleted_at column: deleted_at int(11) NOT NULL DEFAULT 0. If it’s 0, the record is not deleted; any other value is the time of deletion.

The downside is that it’s a bit ambiguous, as 0 can also mean 1 January 1970. But since your application probably didn’t exist back then, and most of the records in your table have a deleted_at value of 0, it’s safe to assume that 0 means the record is not deleted.
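The proposed scheme can be sketched in a few lines, again using SQLite’s in-memory database and a hypothetical products table (SQLite stands in for your real database; int(11) becomes a plain INTEGER here):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        sku TEXT NOT NULL,
        deleted_at INTEGER NOT NULL DEFAULT 0,  -- 0 = active
        UNIQUE (sku, deleted_at)                -- composite unique index
    )
""")

conn.execute("INSERT INTO products (sku) VALUES ('ABC-123')")

# A second ACTIVE row with the same SKU is rejected: both rows would be
# ('ABC-123', 0), and 0 does equal 0, so the unique index fires.
duplicate_blocked = False
try:
    conn.execute("INSERT INTO products (sku) VALUES ('ABC-123')")
except sqlite3.IntegrityError:
    duplicate_blocked = True

# Soft-delete the original by stamping it with the deletion time...
conn.execute(
    "UPDATE products SET deleted_at = ? WHERE sku = 'ABC-123' AND deleted_at = 0",
    (int(time.time()),),
)
# ...and now re-adding the same SKU succeeds, because the new row is
# ('ABC-123', 0) while the deleted one is ('ABC-123', <timestamp>).
conn.execute("INSERT INTO products (sku) VALUES ('ABC-123')")

print(duplicate_blocked)  # True
```

Unlike the nullable-timestamp version, the 0 default gives every active row the same deleted_at value, so the composite index now protects exactly the rows we care about.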
I haven’t actually implemented this anywhere, but I’m strongly considering it. Would love to hear your thoughts, opinions and ways you’ve tried to solve this problem.
PS: Composite keys are inherently slower, and should be used with caution on large datasets (“Large” depends mostly on your database server, and could be anything between 5 million records and 500 million records). There are other (much more complex) solutions for big data sets.
- Make sure your database schema represents your data accurately!
- Use unique indexes if data should be unique!
- If you’re using soft deletes, make the deleted column a unix timestamp with a default of 0: 0 means active, and any other number is the time of deletion.
- Set up a composite unique index between the column that should be unique and the deleted column.
- This will allow you to have multiple deleted records with the same duplicate data (at most one per second, since deleted_at holds the time of deletion), but only a single un-deleted record can exist.