To Null, or not to Null?!?
I’ll start this post with a question:
Who is responsible for a NullReferenceException being thrown?
Is it the dev who forgot to check for null, or is it the dev that created the null in the first place?
In this post, I’d like to challenge the use of null to model the absence of data and our general acceptance of this behaviour, while also presenting some alternative approaches that should help reduce both the need to check for null, and the possibility of null reference exceptions.
This post forms part of the 2019 C# Advent, so check it out for another 49 articles of C# goodness.
“This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.” (Tony Hoare, on his invention of the null reference)
Given that the inventor of null thinks it was a bad idea, you would expect that the rest of us might have figured out an alternative to code like this:
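As a stand-in for the kind of method I mean, here is a hypothetical sketch. The names (Account, AccountService, GetAccount) and the in-memory store are invented purely for illustration:

```csharp
#nullable disable   // keep the classic null behaviour for this sketch
using System;
using System.Collections.Generic;

public class Account
{
    public string Id { get; set; }
    public bool IsActive { get; set; }
}

public class AccountService
{
    // Hypothetical in-memory store standing in for a database.
    private readonly Dictionary<string, Account> _accounts = new Dictionary<string, Account>();

    public void Add(Account account) => _accounts[account.Id] = account;

    public Account GetAccount(string id)
    {
        try
        {
            if (id == null)
            {
                return null;                 // path 1: bad input, caller gets null
            }

            Account account;
            if (!_accounts.TryGetValue(id, out account))
            {
                return null;                 // path 2: nothing found, caller gets null
            }

            if (account.IsActive == false)
            {
                return null;                 // path 3: business rule, caller gets null
            }

            return account;
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);   // logging is a second responsibility
            return null;                     // path 4: failure, caller gets null
        }
    }
}
```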
This is not real code, but is based on real code that I’ve seen many times, and I expect you have too!
So, what’s the issue? Well, let’s ignore the fact that it has more than one responsibility and there are better ways to check for null … my main bugbear is that there are paths through this code that result in a null being returned to the caller.
This is something that has been grating on me for some time now, as every null is another exception waiting to happen. Over recent months I have come to the conclusion that this lies somewhere between a poor design decision and just a little bit lazy!
Why? Because it casually pushes work and responsibility on to each and every one of our consumers, opening the door to the dreaded NullReferenceException. To avoid that exception, each of our callers needs to test the result for null.
No hardship, right? It’s just a one-liner to check for null, or a ? here and there!
Sure, but it assumes that everyone calling our code actually knows to check for null; what about the juniors on the team? And we need to remember to write that code to check for null; what about the seniors on the team ;-)
In some scenarios we might get a warning indicating a possible null reference, and maybe even a helpful suggestion on how to fix it, but nothing is physically stopping us from building our app. Nothing is stopping us from deploying it to Production and then finding the issue at run time.
You might be reading this and thinking ‘it’s ok, our test suite would pick that up!’ Great! But tests don’t solve all of our problems!
The problems that Null creates
- Problem 1: Returning null forces work on to our consumer(s)
Not only that, but it does so covertly!
Let’s look at some code through the eyes of a potential consumer of a theoretical Accounts class:
public User LookupUserByEmail(string email)
// calling code:
User result = accounts.LookupUserByEmail("firstname.lastname@example.org");
What can we infer from this?
It’s reasonable to assume that if we supply an email address then we will get a User object back, but we can’t actually learn anything else without further investigation. In order to figure out what happens when a user isn’t found, or when something fails, we need to actually look at the implementation and read the code.
- Problem 2: Reading code takes time and delays us from our goal!
Then upon reading the code, we uncover a lie. A dirty, dirty lie!
When a user with a matching email address can’t be found, it doesn’t return the promised User object at all. It returns a null. A dirty, dirty null!
- Problem 3: The method signature is dishonest
Our method signature gives no indication that there is a possibility of us not getting a User object back, nothing explicit to flag the potential error scenario that we need to deal with, and no warnings about nulls.
This is less than ideal, and unfortunately because it requires little thought or effort, it’s easy to fall into this trap and adopt this default behaviour, reusing this pattern time and time again: littering our code bases with land mines, waiting to blow up in our faces.
But what alternatives do we have? What can we do to solve these issues?
Don’t Return Null!
There are actually a number of techniques we can use as safe alternatives to null, which remove the possibility of null reference exceptions while still modelling the absence of data, and also keeping our functions honest.
For starters, we could return a safe default value such as an empty collection. For example:
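A minimal sketch of what such a method could look like. The names (Account, AccountRepository, FindAccountsByOwner) are hypothetical, and the data source is injected as a delegate only to keep the example self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Account
{
    public string Owner { get; set; } = "";
    public decimal Balance { get; set; }
}

public class AccountRepository
{
    // Hypothetical data source; a real one would query a database or service.
    private readonly Func<IEnumerable<Account>> _loadAccounts;

    public AccountRepository(Func<IEnumerable<Account>> loadAccounts)
    {
        _loadAccounts = loadAccounts;
    }

    public List<Account> FindAccountsByOwner(string owner)
    {
        try
        {
            // Happy path: one or more matches are returned.
            // Less happy path: no matches, so ToList() yields an empty list.
            return _loadAccounts()
                .Where(account => account.Owner == owner)
                .ToList();
        }
        catch (Exception ex)
        {
            // Error path: log for diagnostics, then return an empty list.
            Console.Error.WriteLine($"Account lookup failed: {ex.Message}");
            return new List<Account>();
        }
    }
}
```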
Here we handle 3 different scenarios, or routes, through our code:
- The happy path: Everything works. We find 1 or more matching accounts and return them to the caller.
- The less happy, but not exceptional or unanticipated path: No accounts match our criteria, so we return an empty list.
- Error Path: Something hits the fan. We log the detail to aid diagnostics, then return an empty list.
In each of these scenarios, we return an object that meets the promises stated in the method signature. We keep our function honest, as it always returns a list of account objects, even in the event of failure.
We remove the need to check that the response is not null, so we are not forcing work on to our consumer, and our code is less prone to error: if our caller fails to check for null, we don’t give rise to unwanted NullReferenceExceptions.
Here’s the calling code that executes the example above:
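A sketch of what a caller can look like when the lookup never returns null. Everything here is hypothetical: FindAccountNames is a hard-coded stand-in for the real lookup, used only so the example runs on its own:

```csharp
using System;
using System.Collections.Generic;

public static class AccountReport
{
    // Hypothetical stand-in for the lookup: it never returns null.
    public static List<string> FindAccountNames(string owner)
    {
        return owner == "ann"
            ? new List<string> { "Savings", "Current" }
            : new List<string>();
    }

    // Prints each account name and returns how many were printed.
    public static int PrintAccounts(string owner)
    {
        List<string> accounts = FindAccountNames(owner);

        // No null check needed: an empty list simply skips the loop.
        foreach (string name in accounts)
        {
            Console.WriteLine(name);
        }

        return accounts.Count;
    }
}
```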
An alternative to returning an empty list, which is quite specific, is to use the Null Object Pattern, returning an object derived from the return type expressed in the signature that carries zero side effects when its members are accessed.
Consider the following example where we upgrade an existing insurance policy to a newer version:
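A sketch of the calling side, under some assumptions: the type names follow the ones used in this post, but the shape of Policy, the V1ToV2Upgrader class, and the choice to inject the lookup as a delegate are all hypothetical, there only to keep the example short and runnable:

```csharp
using System;

public class Policy
{
    public int Version { get; set; }
}

public interface IPolicyUpgrader
{
    Policy UpgradePolicy(Policy policy);
}

// Hypothetical upgrader, present only so the sketch can be exercised.
public class V1ToV2Upgrader : IPolicyUpgrader
{
    public Policy UpgradePolicy(Policy policy) => new Policy { Version = 2 };
}

public class PolicyService
{
    // The lookup is injected as a delegate here purely to keep the sketch short.
    private readonly Func<int, int, IPolicyUpgrader> _findPolicyUpgrader;

    public PolicyService(Func<int, int, IPolicyUpgrader> findPolicyUpgrader)
    {
        _findPolicyUpgrader = findPolicyUpgrader;
    }

    public Policy UpgradePolicy(Policy policy, int toVersion)
    {
        // Look up the implementation that upgrades from the current version to the target...
        IPolicyUpgrader policyUpgrader = _findPolicyUpgrader(policy.Version, toVersion);

        // ...and use it straight away. Note the absence of a null check.
        return policyUpgrader.UpgradePolicy(policy);
    }
}
```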
Here we lookup the implementation required to upgrade the Policy from version x to y, and then we use the result of that lookup to actually upgrade the Policy object, returning the updated Policy to the caller.
Notice that there is no guard clause around the policyUpgrader object to ensure that we found a matching implementation to perform the upgrade. So how do we sleep at night?
Here is the implementation of FindPolicyUpgrader:
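A sketch of how such a lookup could be written. The CanUpgrade method, the NullPolicyUpgrader and V1ToV2Upgrader classes, and the PolicyUpgraderCatalogue container are assumptions made for illustration; only the IPolicyUpgrader, UpgradePolicy and FindPolicyUpgrader names come from the discussion here:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Policy
{
    public int Version { get; set; }
}

public interface IPolicyUpgrader
{
    bool CanUpgrade(int fromVersion, int toVersion);
    Policy UpgradePolicy(Policy policy);
}

// The Null Object: satisfies the interface but deliberately does nothing.
public class NullPolicyUpgrader : IPolicyUpgrader
{
    public bool CanUpgrade(int fromVersion, int toVersion) => false;
    public Policy UpgradePolicy(Policy policy) => policy;
}

// Hypothetical real upgrader, included so there is something to find.
public class V1ToV2Upgrader : IPolicyUpgrader
{
    public bool CanUpgrade(int fromVersion, int toVersion) => fromVersion == 1 && toVersion == 2;
    public Policy UpgradePolicy(Policy policy) => new Policy { Version = 2 };
}

public class PolicyUpgraderCatalogue
{
    private readonly List<IPolicyUpgrader> _policyUpgraders;

    public PolicyUpgraderCatalogue(IEnumerable<IPolicyUpgrader> policyUpgraders)
    {
        _policyUpgraders = policyUpgraders.ToList();
    }

    public IPolicyUpgrader FindPolicyUpgrader(int fromVersion, int toVersion)
    {
        // FirstOrDefault returns null when nothing matches; the ?? operator
        // swaps that null for the do-nothing implementation.
        return _policyUpgraders.FirstOrDefault(u => u.CanUpgrade(fromVersion, toVersion))
               ?? new NullPolicyUpgrader();
    }
}
```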
In the event that we find a match for our query, then we obviously return that matching implementation. However, when no match is found, instead of returning a null we return an instance of an object that conforms to the contract in our method signature; in this case it implements the interface we are expecting to return: IPolicyUpgrader.
The key thing about this special case implementation of our interface is that it doesn’t actually do anything when we execute its UpgradePolicy method. There are no consequences or side effects to executing this code; it simply returns the policy object that it is supplied, without doing anything to it.
Again, we have kept our method honest and returned what we said we would in our method signature, and we’ve not forced any extra work on to our clients with the potential of creating errors if they forget to perform that work.
While these two approaches do work for some scenarios, we obviously can’t get away with this behaviour all the time. We need to think about how our response is going to be used and what possible implications Null Objects might have once released into our program. For example, if downstream code makes a decision based on the count of items in a collection, and some of the instances in that collection are Null Objects, then we are potentially introducing defects into the codebase that could be really difficult to diagnose.
In a similar way, if we returned an empty string instead of a null, then we avoid a potential null reference exception, but we might cause strange behaviour further downstream.
Indeed, in the example above, we’ve performed the Upgrade task, but the Policy has not actually been upgraded. Will that cause other errors elsewhere?
These techniques also fail to address the fact that we still need to look at the implementation of the function in order to figure out what happens when we deviate from the happy path, as their method signatures are not expressive enough to communicate that level of detail.
What tools do we have?
If you are lucky enough to be using C# 8, then you can benefit from a new language feature called non-nullable reference types, which allows us to tell the compiler and anyone reading our code: this object is not supposed to be null!
It’s something we need to opt into at a project level, and once enabled it magically makes all the existing reference types in that project non-nullable by default (unless suffixed with a ?, in the same way you define a nullable value type). This means that instances of these objects must be initialised to a non-null value, and the variable can never be assigned the value null.
Here’s an excerpt from the docs:
The compiler uses static analysis to determine if a nullable reference is known to be non-null. The compiler warns you if you dereference a nullable reference when it may be null.
Sounds good, right? That should prevent some null reference exceptions! For example, while this code was perfectly fine before, now we get a compiler warning against each of these assignments:
Policy policy = null;
List<IPolicyUpgrader> policyUpgraders = null;
string name = null; // [CS8600] Converting null literal or possible null value to non-nullable type.
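We can go a step further and promote these warnings to errors. A project-file fragment along these lines is one way to do both things at once; it assumes an SDK-style .csproj, and the list of warning codes is only a starting point rather than an exhaustive set:

```xml
<PropertyGroup>
  <!-- Opt the whole project in to nullable reference types (C# 8 and later). -->
  <Nullable>enable</Nullable>
  <!-- Fail the build on selected nullable warnings instead of just reporting them:
       CS8600 = converting null to a non-nullable type,
       CS8602 = dereference of a possibly null reference,
       CS8603 = possible null reference return. -->
  <WarningsAsErrors>CS8600;CS8602;CS8603</WarningsAsErrors>
</PropertyGroup>
```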
Once these warnings are treated as errors, the game changes again: these assignments now fail the build. This is obviously much better than a warning, as it is physically stopping us from initialising our objects with a null.
It’s not a quick fix though; opting for these settings on even a small project may suddenly create quite a bit of work in order to get it compiling again. So what do we get for our efforts? How do non-nullable reference types address the 3 problems discussed above?
- Returning null forces work on to our consumer: The compiler is now preventing us from returning null at design time, so that solves that issue.
- The method signature is dishonest: We are now forced to return something that matches the signature of our method, so that’s not a problem anymore.
- Reading code takes time and delays us from our goal: Let’s look at our original example again…
public User LookupUserByEmail(string email)
This looks like exactly the same code that we saw before, but now that our reference types are all non-nullable we know a little bit more information: Given an email address that should not be null, it returns a User object, which also should not be null. That’s a little more expressive than it was.
But we still can’t infer what happens when a match for our requested user can’t be found, so we still need to look at the implementation and read the code to figure this out.
There is also the slightly disappointing use of the word should. While non-nullable reference types prevent us from initialising our objects with null and returning null from our functions at design time, they do not actually prevent those objects from containing null at run time.
For example, the following test will fail even though Policy is non-nullable, because the value it ends up holding at run time is null:
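A sketch of one way this can happen, which is not necessarily the scenario the test was written around: array elements start out as null, and the compiler does not flag reading them. The class and method names are hypothetical, and a plain bool-returning method stands in for a test-framework assertion:

```csharp
#nullable enable
using System;

public class Policy
{
    public int Version { get; set; }
}

public static class NullabilityDemo
{
    // Declared to return a non-nullable Policy, yet the value is null at run time.
    // The compiler raises no warning here: the array slot was never initialised,
    // and static analysis does not track that.
    public static Policy FirstPolicyFromEmptySlots()
    {
        var policies = new Policy[1];
        return policies[0];
    }

    // Stand-in for an assertion such as "the policy should not be null".
    public static bool PolicyIsNotNull()
    {
        Policy policy = FirstPolicyFromEmptySlots();
        return policy != null;
    }
}
```

Calling NullabilityDemo.PolicyIsNotNull() returns false, so a test asserting the opposite fails despite the non-nullable declaration.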
So while this means we can’t delete all of our null checking code just yet, it’s still well worth turning on this feature in order to reduce the chances of a null reference exception, and express our intent a teeny wee bit more.
Anything else we could Try?
We could use the TryGet pattern: whereby our function returns a tuple of (bool, T) with the boolean flag indicating if the function was successful or not, and T will either contain a result or a null. Let’s have a look at how this might look if we refactor our earlier example:
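A sketch of how that refactoring could look. TryFindUpdater and IPolicyUpgrader are the names used in this post; CanUpgrade, V1ToV2Upgrader, PolicyUpgraderCatalogue, UpgradeIfPossible and the tuple element names are assumptions added for illustration:

```csharp
#nullable disable   // the second tuple element is deliberately allowed to be null
using System.Collections.Generic;
using System.Linq;

public class Policy
{
    public int Version { get; set; }
}

public interface IPolicyUpgrader
{
    bool CanUpgrade(int fromVersion, int toVersion);
    Policy UpgradePolicy(Policy policy);
}

// Hypothetical upgrader, included so there is something to find.
public class V1ToV2Upgrader : IPolicyUpgrader
{
    public bool CanUpgrade(int fromVersion, int toVersion) => fromVersion == 1 && toVersion == 2;
    public Policy UpgradePolicy(Policy policy) => new Policy { Version = 2 };
}

public class PolicyUpgraderCatalogue
{
    private readonly List<IPolicyUpgrader> _policyUpgraders;

    public PolicyUpgraderCatalogue(IEnumerable<IPolicyUpgrader> policyUpgraders)
    {
        _policyUpgraders = policyUpgraders.ToList();
    }

    public (bool Found, IPolicyUpgrader Upgrader) TryFindUpdater(int fromVersion, int toVersion)
    {
        IPolicyUpgrader match =
            _policyUpgraders.FirstOrDefault(u => u.CanUpgrade(fromVersion, toVersion));

        // The tuple itself is a struct and is never null,
        // but its second element is null when nothing matched.
        return (match != null, match);
    }

    // A well-behaved consumer: checks the flag before touching the result.
    public Policy UpgradeIfPossible(Policy policy, int toVersion)
    {
        var (found, policyUpgrader) = TryFindUpdater(policy.Version, toVersion);
        return found ? policyUpgrader.UpgradePolicy(policy) : policy;
    }
}
```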
What does this give us? Our method name has changed very subtly to indicate that this may or may not succeed with the inclusion of the word Try. This is slightly more expressive to us humans, as it provides a small hint of the possibility of failure, but it means nothing to the compiler, and therefore does nothing to force a change in the consumer’s behaviour.
The return type does go a little bit further though: as we are now returning a tuple (a value type: aka a struct) we now know that we can never receive a null response, and we have made our code less prone to error in that regard.
Our consumers can now check the boolean success flag to determine if it is safe or not to access the result. This is, however, a check that the consumer needs to opt into: they can choose to skip the safety check and access the result anyway, which may or may not be null:
var (_, policyUpgrader) = TryFindUpdater(1, 2);
policyUpgrader.UpgradePolicy(policy);
The TryGet pattern is a step in the right direction, but we can go a little further.
We could throw!
I’ve been avoiding the elephant in the room. We could obviously avoid returning null by throwing an Exception instead. But this is a post about reducing null reference exceptions, so it seems a little odd to replace one type of exception with another. That feels a little like massaging the numbers to me!
Throwing Exceptions also has the side effect of making our methods dishonest, and it forces work onto our consumers in a similar way to returning null, so this isn’t solving all the issues we set out to fix.
For the purposes of this discussion I will continue to avoid throwing exceptions.
What more can we do?
If we look to functional programming languages for a little inspiration we can actually lean on the compiler a lot more. In Haskell it’s impossible to return null from a function, because there is no null. It simply doesn’t exist! F# is nearly as strict: types defined in F# don’t accept null as a value, and null only really shows up when interoperating with other .NET code.
Instead, F# uses a built-in type of Option<T> (called ‘Maybe’ in Haskell), to represent the fact that an instance may, or may not, contain an underlying value of T.
- When we have a value to return, then we return that value wrapped inside a Some.
- When we have no value to return, then we return None. Both Some and None satisfy the compiler’s need for type safety.
- And crucially, Option<T> can never be set to null, just as a struct can never be null.
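To make those rules concrete in C# terms, here is a deliberately minimal, hand-rolled sketch. It is not the F# type, nor the API of any particular library; the member names Some, None and Match are chosen for illustration only:

```csharp
using System;

public readonly struct Option<T>
{
    private readonly T _value;
    private readonly bool _isSome;

    private Option(T value)
    {
        _value = value;
        _isSome = true;
    }

    // default(Option<T>) has _isSome == false, so the default value is None.
    public static Option<T> None => default;

    // Wrapping null is rejected, so a Some always carries a real value.
    public static Option<T> Some(T value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        return new Option<T>(value);
    }

    // The only way to reach the value is to supply a handler for both cases.
    public TResult Match<TResult>(Func<T, TResult> some, Func<TResult> none)
        => _isSome ? some(_value) : none();
}
```

Because Option<T> is a struct, a variable of this type can never hold null, and because the value is private, a caller has to go through Match and say what happens in the None case.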
With the ground rules established, let’s have a look at those method signatures from earlier and see how they might look if we could use something like Option<T> in C#, while also reflecting on those initial problems with null:
public Option<IPolicyUpgrader> FindPolicyUpgrader(int from, int to)
public Option<User> LookupUserByEmail(string email)
Is our signature honest?
Yes. Our signatures have changed to explicitly show that our response may or may not return an instance of T, and whether it succeeds or fails we always get something back that we were expecting, never a null.
Do we need to inspect the method’s implementation?
As the method signatures are a lot more expressive, we actually have a much better idea of what to expect the method to return, both on and off the happy path. We don’t really need to read the implementation to see what the method does in failure scenarios, as we can infer from the response type what we will get back if no match is found.
Are we forcing work on to our consumers?
Option<T> can’t be null, so we’ve removed the need for a null check and the possibility of a null reference exception.
What the caller needs to do now is decide what to do with Option<T>. Does it contain a value or not, and what to do if it’s empty? These are decisions that they should have been making anyway, only now they are being forced by the compiler (and aided by IntelliSense in the IDE) to consider both possibilities, Some and None, rather than assuming that all was well and forgetting about that pesky null!
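A sketch of what that decision can look like from the caller’s side. The Option<T> here is a compact hand-rolled version included only to keep the example self-contained; Register, Greeter and the message texts are hypothetical, while Accounts, User and LookupUserByEmail follow the names used earlier:

```csharp
using System;
using System.Collections.Generic;

// Minimal hand-rolled Option<T>, included only to keep this sketch self-contained.
public readonly struct Option<T>
{
    private readonly T _value;
    private readonly bool _isSome;

    private Option(T value) { _value = value; _isSome = true; }

    public static Option<T> None => default;
    public static Option<T> Some(T value) => new Option<T>(value);

    public TResult Match<TResult>(Func<T, TResult> some, Func<TResult> none)
        => _isSome ? some(_value) : none();
}

public class User
{
    public string Email { get; set; } = "";
    public string Name { get; set; } = "";
}

public class Accounts
{
    private readonly Dictionary<string, User> _usersByEmail = new Dictionary<string, User>();

    public void Register(User user) => _usersByEmail[user.Email] = user;

    // The signature now says out loud that a user may not be found.
    public Option<User> LookupUserByEmail(string email)
        => _usersByEmail.TryGetValue(email, out var user)
            ? Option<User>.Some(user)
            : Option<User>.None;
}

public static class Greeter
{
    // The caller cannot reach the User without also handling the None case.
    public static string Greet(Accounts accounts, string email)
        => accounts.LookupUserByEmail(email).Match(
            some: user => $"Welcome back, {user.Name}!",
            none: () => "Sorry, we could not find that account.");
}
```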
Cool. We’ve solved all the problems created by null, but the solution is to use a type from another programming language…so where does that leave us?
Have no fear, these types DO exist in C# and are just a NuGet package away: both the LaYumba and LanguageExt packages contain good implementations of Option<T> that we can use to achieve the behaviour we’ve just discussed. Both have excellent documentation explaining how to use them, and are based on structs so they can never be null.
What about crossing system boundaries?
This is actually a question I’m trying to answer for myself right now. Most of the code I write exists within web services: they process incoming JSON payloads, interact with a database or another web service, and respond with more JSON. How do we model the absence of data when talking to our collaborators at the edges of our system?
Option<T> does not serialise to JSON straight out of the box, so it does not work smoothly with model binding when a controller receives a request, or when a controller responds to a request with an object of Option<T>, or an object with properties of Option<T>.
Likewise, the out of the box behaviour does not serialise when saving a document (an object) to a Mongo database. If I wanted to save something to SQL Server, then I assume I would need to model the field as nullable, using the appropriate datatype, and then write either the underlying value or a null to the record. I assume I need to do a similar thing with Mongo.
So for now I am using Option<T> inside my services, mapping back and forth into nullable types at the boundaries when talking to other systems.
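A rough sketch of what that mapping can look like. All of the types here (CustomerDto, Customer, CustomerMapper) are hypothetical, and Option<T> is again a compact hand-rolled version rather than the one from either library:

```csharp
#nullable disable   // the DTO deliberately uses plain nullable references
using System;

// Minimal hand-rolled Option<T>, included only to keep this sketch self-contained.
public readonly struct Option<T>
{
    private readonly T _value;
    private readonly bool _isSome;

    private Option(T value) { _value = value; _isSome = true; }

    public static Option<T> None => default;
    public static Option<T> Some(T value) => new Option<T>(value);

    public TResult Match<TResult>(Func<T, TResult> some, Func<TResult> none)
        => _isSome ? some(_value) : none();
}

// Hypothetical wire format: plain properties that serialisers understand.
public class CustomerDto
{
    public string Name { get; set; }
    public string MiddleName { get; set; }   // may be null on the wire
}

// Hypothetical domain model used inside the service.
public class Customer
{
    public Customer(string name, Option<string> middleName)
    {
        Name = name;
        MiddleName = middleName;
    }

    public string Name { get; }
    public Option<string> MiddleName { get; }
}

public static class CustomerMapper
{
    // Inbound: null becomes None as soon as the data enters the service.
    public static Customer ToDomain(CustomerDto dto)
        => new Customer(
            dto.Name,
            dto.MiddleName == null ? Option<string>.None : Option<string>.Some(dto.MiddleName));

    // Outbound: None becomes null again only at the very edge.
    public static CustomerDto ToDto(Customer customer)
        => new CustomerDto
        {
            Name = customer.Name,
            MiddleName = customer.MiddleName.Match(some: value => value, none: () => (string)null)
        };
}
```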
The aim of this post was to try to get you to question the use of null as a way of modelling the absence of data, as well as arming you with some tools and techniques to help reduce null reference exceptions.
I’m not suggesting that any of the options presented here are the answer to all scenarios. You may wish to use a combination of these techniques in order to avoid and reduce null reference exceptions.
Either way, I don’t think we’ve seen the last of null, but hopefully we’ll see it a little less going forward.
Now back to where it all began:
Who’s responsible for that NullReferenceException?