The “Nullbuster” workaround for the unique index in SQL
The SQL community is fairly divided on what a
“null” means for a column that is part of a unique index key.
MySQL treats nulls as distinct and allows any number of them, while others such as
SQL Server treat nulls as equal, so a second null row violates the constraint. It’s no surprise: the SQL standard doesn’t settle whether nulls are even comparable for uniqueness.
This misalignment, however, does not preclude situations in which nulls in a unique index are actually useful. Let’s take a real-life use case to demonstrate why one might need a workaround.
Imagine you are a developer in ad-tech at an e-commerce firm, building a system that lets users create ad banners that can go live. Once live, a banner is clickable and targets a “content”: the page that people who click it are redirected to.
The primary users of your system are internal teams who create banners day in and day out. As a rule of thumb, several banners may be created that target the same content.
There can be only one LIVE banner for a given content at any given moment. A live banner is one where both its “banner” and “ingestion” status are set to “LIVE”.
This restriction avoids multiple live banners for the same content, which could trigger an opportunity loss. There is, however, no such restriction on draft banners or other statuses.
So, in the figure above, B4 can be added, but B5 cannot: it would create a second live banner for the same content C1, which is exactly what we want to prevent.
How would you solve this?
As always, there are multiple approaches with trade-offs here. You could:
Option I: Do a “check before set” in your API, i.e. before writing a new banner, check whether any live banner already exists for the same content. If it does, discard the new banner; else allow it.
Pros: Easy to implement, leaning towards optimistic locking. No schema changes or new dependencies.
Cons: Functionally incorrect. A race condition exists where two banners might be written at the same instant. This might be tolerable for some systems, but for us it is a no-go.
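To see the race concretely, here is a minimal sketch that simulates the bad interleaving explicitly, using a plain Python list as a hypothetical stand-in for the banner table (names are illustrative, not from the real system):

```python
# In-memory stand-in for the live-banner table; hypothetical structure.
live_banners = []

def exists_live(content_id):
    """The 'check' half of check-before-set."""
    return any(b["content_id"] == content_id for b in live_banners)

def write_banner(banner):
    """The 'set' half: blindly appends."""
    live_banners.append(banner)

# Two concurrent requests interleave so that BOTH checks run before
# EITHER write happens.
check_a = exists_live("C1")   # False
check_b = exists_live("C1")   # False
if not check_a:
    write_banner({"content_id": "C1", "id": "B3"})
if not check_b:
    write_banner({"content_id": "C1", "id": "B5"})

# Two live banners for C1 now exist: the invariant is broken.
print(len(live_banners))
```

The check and the write are not atomic, so no amount of application-side care removes the window between them.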
Option II: Query a central coordination service for a lock against a banner on a content. If a new banner finds no “locked” banner for its content, we give it the lock; else we reject it.
Pros: Functionally correct as long as the coordinator is up. The coordinator can scale independently and can also serve other lock requests.
Cons: State leaks outside the central source of truth; service changes to talk to the coordinator; a new dependency and infra maintenance for the coordinator setup (if built from scratch). This feels like overkill for our problem.
It seems that the most non-invasive and intuitive solution is to somehow rely on a database unique index. We could create a vanilla unique index on “banner” over the three columns, meaning that no record can be inserted if all three of its column values duplicate those of another record.
alter table banner add constraint unique_live_banner_idx UNIQUE
(content_id, banner_status, ingestion_status);
However, this is not correct. Recall the business requirements: apart from
LIVE, every other status combination may be repeated for a content.
If the queries to our system were issued in the above order, our current index will fail query V (as expected), but also query IV, which is incorrect: two draft banners may exist for the same content.
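The failure is easy to reproduce. A minimal sketch using Python’s built-in sqlite3 (SQLite, like MySQL, treats nulls as distinct in unique indexes, so it serves as a stand-in here; the schema is simplified and illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE banner (
        id INTEGER PRIMARY KEY,
        content_id TEXT,
        banner_status TEXT,
        ingestion_status TEXT
    )
""")
# The naive unique index over the three columns.
con.execute("""
    CREATE UNIQUE INDEX unique_live_banner_idx
    ON banner (content_id, banner_status, ingestion_status)
""")

# First draft banner for content C1: accepted.
con.execute(
    "INSERT INTO banner (content_id, banner_status, ingestion_status) "
    "VALUES ('C1', 'DRAFT', 'DRAFT')"
)

# Second draft for C1: rejected, although drafts should be free to repeat.
try:
    con.execute(
        "INSERT INTO banner (content_id, banner_status, ingestion_status) "
        "VALUES ('C1', 'DRAFT', 'DRAFT')"
    )
except sqlite3.IntegrityError as err:
    print("second draft rejected:", err)
```

The index cannot distinguish “combinations that must be unique” from “combinations that may repeat”.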
So, what’s the trick?
It appears that we are close and heading in the right direction (database unique integrity is what we are after). However, we need a way to make the unique index apply only when the banner is
LIVE. There’s no standard SQL way of doing this (some databases offer partial or filtered indexes, but MySQL does not). So, we introduce a new nullable bitfield / bool in the entity schema and set its default value to null.
We then add it to the existing index so it now covers the new column.
alter table banner add constraint unique_live_banner_idx UNIQUE
(content_id, banner_status, ingestion_status, protected_status);
On its own,
MySQL treats null as incomparable, which means two rows that match on every other indexed column but both hold null in the new column are considered “non unique” and can both be inserted.
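This “nulls are distinct” behaviour can be demonstrated with a self-contained sqlite3 sketch (SQLite shares MySQL’s semantics here; schema and names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE banner (
        id INTEGER PRIMARY KEY,
        content_id TEXT,
        banner_status TEXT,
        ingestion_status TEXT,
        protected_status INTEGER   -- nullable, defaults to NULL
    )
""")
# The widened unique index covering the new column.
con.execute("""
    CREATE UNIQUE INDEX unique_live_banner_idx
    ON banner (content_id, banner_status, ingestion_status, protected_status)
""")

# Two otherwise identical draft banners now coexist, because each row's
# NULL protected_status is treated as distinct from every other NULL.
for _ in range(2):
    con.execute(
        "INSERT INTO banner (content_id, banner_status, ingestion_status) "
        "VALUES ('C1', 'DRAFT', 'DRAFT')"
    )
print(con.execute("SELECT COUNT(*) FROM banner").fetchone()[0])  # 2
```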
However, there’s a problem: scenario V now also passes for free while it shouldn’t. We never told the system to treat a
LIVE banner specially.
This is where the benefit of
protected_status comes in. Whenever our business logic warrants uniqueness, we simply set its value to 1 (or any fixed non-null value).
Doing this guarantees that no two duplicate live banners can be inserted: they now match on all four indexed columns, so the second insert fails the constraint.
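Putting both halves together in one runnable sqlite3 sketch (again a simplified stand-in for the MySQL schema): live banners carry a fixed non-null value and collide, while drafts keep null and repeat freely.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE banner (
        id INTEGER PRIMARY KEY,
        content_id TEXT,
        banner_status TEXT,
        ingestion_status TEXT,
        protected_status INTEGER
    )
""")
con.execute("""
    CREATE UNIQUE INDEX unique_live_banner_idx
    ON banner (content_id, banner_status, ingestion_status, protected_status)
""")

# The hook sets protected_status = 1 for a fully live banner.
con.execute(
    "INSERT INTO banner "
    "(content_id, banner_status, ingestion_status, protected_status) "
    "VALUES ('C1', 'LIVE', 'LIVE', 1)"
)

# A second live banner for C1 matches on all four columns: rejected.
try:
    con.execute(
        "INSERT INTO banner "
        "(content_id, banner_status, ingestion_status, protected_status) "
        "VALUES ('C1', 'LIVE', 'LIVE', 1)"
    )
except sqlite3.IntegrityError:
    print("duplicate live banner rejected")

# Drafts keep protected_status NULL, so they can still repeat freely.
con.execute(
    "INSERT INTO banner (content_id, banner_status, ingestion_status) "
    "VALUES ('C1', 'DRAFT', 'DRAFT')"
)
```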
How and where do I set the non-null value prior to create / update?
This depends on the methodology used to communicate with the database. ORM frameworks such as
Gorm for Go and
Hibernate for Java make it extremely easy to create hooks / interceptors that can modify the entity prior to upsert.
If you are not using an ORM (and I don’t blame you), you can plug the logic in manually. The one thing you must guarantee is that every path that writes to the “banner” entity goes through the hook. Even a single path that bypasses the logic will breach the invariant, which would then need to be reconciled separately.
The recommended way, therefore, is to create a single, well-documented hook. Let’s modify our program to use it.
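A framework-agnostic sketch of such a hook in plain Python (names are hypothetical; in Gorm this logic would live in a BeforeCreate / BeforeSave callback, in Hibernate in a @PrePersist / @PreUpdate listener):

```python
LIVE = "LIVE"

def before_save(banner: dict) -> dict:
    """Single choke point that every write path must go through.

    Sets protected_status to a fixed non-null value exactly when the
    banner is fully live, i.e. when the unique constraint must bite.
    """
    if banner["banner_status"] == LIVE and banner["ingestion_status"] == LIVE:
        banner["protected_status"] = 1
    else:
        banner["protected_status"] = None  # NULL never collides in the index
    return banner
```

Because the decision lives in one place, changing what “must be unique” means later only requires editing this function.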
The final patch to “fix” the unique constraint now looks like the following.
But, I’m not using MySQL. Nulls are not distinct in my database
Fair point; some databases (SQL Server, for instance) treat nulls as equal in unique indexes. The fix is pretty easy. Instead of
null, can you find any other value that is already unique per row?
Yes, that’s right: use the primary key. In the default block, set
protected_status to the row’s primary key, and only override it in the hook.
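A sketch of that variant, again in plain Python with hypothetical names; the sentinel value for live banners is an assumption and merely needs to be any fixed value that can never collide with a real primary key:

```python
LIVE = "LIVE"
SENTINEL = -1  # fixed value for live banners; assumes primary keys are non-negative

def before_save(banner: dict) -> dict:
    """Variant for databases where nulls compare equal in unique indexes.

    By default protected_status carries the row's own primary key, which
    is unique per row, so the widened index never fires. The hook
    overrides it with a fixed sentinel only for fully live banners,
    restoring the collision we actually want.
    """
    if banner["banner_status"] == LIVE and banner["ingestion_status"] == LIVE:
        banner["protected_status"] = SENTINEL
    else:
        banner["protected_status"] = banner["id"]
    return banner
```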
That’s a wrap.
So, that was the trick. Let’s summarise what we did.
- We identified that a unique constraint was the preferred approach here but had to be tweaked slightly.
- We introduced a new nullable field,
protected_status, in the schema. To avoid wastage, we chose a tiny bit / boolean field.
- We then modified our index to cover the new column. In theory, if the original index exists, this “update” should not be costly.
- We then set the right default value for
protected_status and created a hook to undo the “null uniqueness” whenever our logic needs it.
Let’s now think of the pros and cons of this approach. This neat trick has its place but cannot be termed as a panacea.
- protected_status can scale to multiple indexes and columns even as the business logic is updated. One source of truth, in the hook.
- Zero race conditions and works without the need of any coordinator. If no duplicates can be tolerated, this might be an excellent option.
- If using an ORM, the implementation is trivial.
- Entity schema edits may not always be possible.
- If not using an ORM, there’s a chance of going around the hook.
- The new index is not free and occupies some space, although a paltry amount.
Thank you for reading. Would appreciate any feedback.