Nerd For Tech
Published in

Nerd For Tech

The “Nullbuster ” workaround for the unique index in SQL

The SQL community is fairly divided on what a “null” means for a column that is a part of a unique index key. MySQL allows it, while some such as SQL Server outright reject it. It’s no surprise that there are no SQL standards governing whether nulls are even comparable or not.

This misalignment, however, does not preclude the situations in which a null unique index is actually useful. We’ll take a real life use case to demonstrate why one might need to find a workaround for it.

Consider yourself to be a developer in ad-tech for an e-commerce firm building a system for users to create ad banners that can go live. Once live, banners are clickable and target a “content”, which is what people who click on it will be redirected to.

The primary users of your system are internal teams who create multiple banners for contents day in and day out. As a rule of thumb, there may be multiple banners getting created that target the same content.

Entity diagram (left) shows the schema. B3 can be added as it B1 is not yet live.

Business ask

There can be only one LIVE banner for a given content at any given moment. A live banner is one where both its “banner” and “ingestion” status are set to “LIVE”.
This restriction is to avoid multiple banners for the same content that may trigger an opportunity loss. However, there’s none on draft banners or other statuses.

So, in the figure above, B4 can be added but B5 can’t be added lest there’d be two banners (B1 and B5) for the same content C1 which is a problem.

How would you solve this?

As always, there are multiple approaches with trade-offs here. You could:

  • Option I Do a “check before set” in your API i.e before writing a new banner, check if there exists any live banner for the same content. If it does, discard it, else allow it.

Pros: Easy to implement, leaning towards optimistic locking. No schema changes or new dependencies.

Cons: Functionally incorrect. A race condition exists where two banners might be simultaneously written to at the same instant. This might be tolerable for few systems, but for us, it is a no go.

  • Option II Query a central coordination service for a lock against a banner on a content. If a new banner does not find any “locked” banners for its content, we give the lock to the new banner, else we reject it.

Pros: Functionally correct as long as the coordinator is up. The coordinator can scale independently and can also serve other lock requests.

Cons: State leaks outside of central source of truth; service changes to talk to coordinator; new dependency and infra maintenance of coordinator setup (if from scratch). This feels like an overkill for our problem.

It seems that the most non invasive and intuituve solution is to somehow, rely on database unique index. We could create a vanilla unique index on the “banner” for the the three columns meaning that no duplicate record can be inserted if all its three columns have duplicate values in an another record.

alter table banner add constraint unique_live_banner_idx UNIQUE      
(content_id, banner_status, ingestion_status);
Existing index covering the three main columns
Assuming an ORM like Gorm (Go) or Hibernate (Java), we could define the index mapping as above

However, this is not correct. Recall, the business requirements. Apart from LIVE, all other combinations of banners maybe allowed for a banner.

Let’s write a simple Go program (could be anything, really) that uses Gorm (could be anything, again) to talk to MySQL to demonstrate this behaviour.

Sample test scenarios I-V. Note that scenario IV should succeed as it is in DRAFT, but V should fail.

If the queries to our system were issued in the above order, our current index will fail query V (as expected), but also query IV, which is incorrect. Two draft banners may exist for the same content.

Scenario IV is also failing as the constraint C2-DRAFT-DRAFT appears violated. Not good.
State of the system. 3/4 records inserted. Record for Scenario IV should have been here but isn’t.

So, what’s the trick ?

It appears that we are close and heading in the right direction (database unique integrity is what we are after). However, we need a way to filter the unique index when the value is LIVE.

There’s no standard way of doing this. So, we introduce a new nullable bitfield / bool in the entity schema and set its default value as null.

Notice the patch. We added a new field (and to the existing index) and set it default value as null

We then add it to the existing index so it now covers the new column.

alter table banner add constraint unique_live_banner_idx UNIQUE      
(content_id, banner_status, ingestion_status, protected_status);

On its own, MySQL interprets null as incomparable which means the below combination of values is now “non unique” and can be inserted.

Scenario IV can now pass. Notice even though everything is identical null is incomparable and hence unique
New state of the database on running through Scenarios I — V.

However, there’s a problem. Scenario V also passes for free now while it shouldn’t. We never told the system to deal with a LIVE banner specially.

Like key "C2_DRAFT_DRAFT_NULL” (scenario IV), key “C1_LIVE_LIVE_NULL” (scenario V) is non unique too

This is where, the benefit of using the protected_status comes in. Whenever, our business logic warrants us to be unique, we could simply set its value to 1 (or any non null value).

Doing this guarantees that no two duplicate banners can be inserted because they’ll now be “non unique” and fail the constraint

Since LIVE banners now carry same non null protected status, only one of them can be inserted.

How and where do I set the non null value prior to create / update ?

This depends on the methodology used to communicate with the database. ORM frameworks such as the like of Gorm for Go, Hibernate for Java make it extremely easy to create hooks / interceptors that can modify the entity prior to upsert.

The two-line “BeforeCreate” hook that fixes our use case. When we detect a live banner, we mark it protected

If you are not using an ORM (and I don’t blame you), then you could plug the logic in manually. The only thing you’d need to guard is that the event of multiple paths that may write to “banner” entity, all of them should go through the hook. Even a single path that doesn’t follow the logic will breach the invariant that would need to be reconciled separately.

The recommended way, therefore, is to create a dedicated well documented hook. Using this, let’s modify our program to use the above hooks.

The final patch to “fix” the unique constraint , now looks like the following.

But, I’m not using MySQL. Nulls are not unique in my database

Fair point. That includes most databases. The fix is pretty easy. Instead of null, can you find any other “now unique” identifier for your record ?

Yes, that’s right. Use the primary key. In the default block, set the protected_status: primary_key and only modify in the hook.

That’s a wrap.

So, that was the trick. Let’s summarise what we did.

  • We identified that a unique constraint was the preferred approach here but had to be tweaked slightly.
  • We introduced a new field protected_status in the schema that can be nullable. To avoid wastage, we chose a nullable tiny bit / boolean field.
  • We then modified our index to cover the new column. In theory, if the original index exists, this “update” should not be costly.
  • We then set the right default values for the protected_status and also create a hook to undo the “null uniqueness” if our logic needs it to be.

Let’s now think of the pros and cons of this approach. This neat trick has its place but cannot be termed as a panacea.

Pros:

  • Field protected_status can scale to multiple indexes and columns even when the business logic is updated. One source of truth, in the hook.
  • Zero race conditions and works without the need of any coordinator. If no duplicates can be tolerated, this might be an excellent option.
  • If using an ORM, the implementation is trivial.

Cons:

  • Entity schema edits may not always be possible.
  • If not using an ORM, there’s a chance of going around the hook.
  • New index is not for free and occupies some space, although paltry.

Thank you for reading. Would appreciate any feedback.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store