Of NULLs and database purists
Everyone in tech must have worked with a NULL-phobic from the database team. I am currently working in a team that has data architecture guidelines set forth by one of those. The guidelines say something to the tune of, DO NOT, I repeat, D-O N-O-T make columns NULLable.
What are the arguments for loathing NULLs in databases?
A NULL is best described as untyped and unknown value, that cannot even be equated with another NULL. So we cannot apply predicates like this in a SQL WHERE clause: NULL = NULL, or NULL <> NULL, or 0 = NULL. A NULLable column might indicate or lead to some of the flaws described below.
Need for further normalization
Consider this table.
The last two columns have to be NULLable, because CustomerId 112 and potentially many other customers haven’t specified contact information. On hindsight, ContactMethod and Contact columns are candidates for further normalization. Normalizing will alleviate the need for NULLs. A record will be added to ContactInfo child table only if there is any contact information available.
In this case allowing NULLs in Customer table exposes a design flaw known as insert anomaly. Making the last two columns NULLable was just a quick-fix solution to cover improper normalization.
Burden of 3-value logic
Application developers usually write code around two logical values: TRUE or FALSE. All control structures in programming languages — if-else, while, for, switch — works based on this 2-value logic. But when they have to write database code, they have to account for an extra value, NULL. The burden of 3-value logic also increases the potential for bugs.
Some databases treat empty strings as NULLs. I know Oracle operated this way at least in 9i and 10g. Some other databases and applications treat empty strings as just empty strings (represented by ‘’), and NULLs as, well just NULLs or unknown values. I have known some ETL tools behaving this way. This is a burden to application development. It creates dependency situations in application migration.
In schema-less databases like MongoDB, when a query with NULL-matching is issued, there are two outcomes. Records having NULL values for the given attribute will be returned. In addition to that records that do not have that attribute at all will be returned. Apps have to be cautious against this behavior.
What are the solutions to fight NULLs?
The popular solution for driving NULLs out of database is to use a default value, and enforce it using a CHECK constraint or using ETL.
My unknown birth date value will look like 1900–01–01. Or 2999–01–01. Now applications have to remember to code around 1900–01–01. This isn’t making the situation any better. In fact this is making it worse. If it was just NULL any newbie in team knows that it should be handled. Now someone has to tell the newbie, “hey we have a funky date in the database that you should know about before writing a query”.
When we become quixotic about theoretical aspects, really all we gain is the dark satisfaction of being a purist (can we call such folks sadists?). It only burdens the rest of the world that depends on the database with imaginary dates like 1900–01–01.
Are there other solutions?
One solution is to be a moderate and not a purist. Strike a balance.
The other solution, if appropriate, is when you have a specific domain of values for an attribute, you might add them to a value-classification or generic-value table in the administrative subject area, and give a foreign key reference to the table under consideration. One of the values in the domain of values will indicate Not applicable or Unknown or Undefined to represent what would otherwise be NULL.
Long live NULL!