Null references or: How I Learned to Stop Worrying and Love Scala Options
“I call it my billion-dollar mistake. It was the invention of the null reference in 1965” — Tony Hoare, 2009
For many software developers, the null reference is something they encounter very early in their careers, and it continues to haunt them for the rest of their lives. In simple terms, a null reference is what you get when an object is declared but not assigned a value. A situation like the one below could potentially throw what’s called a NullPointerException.
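The original snippet isn’t reproduced here, so what follows is only a minimal sketch of the kind of situation meant. It is written in Scala, which runs on the JVM and fails with the same exception as Java. The Employee type and its fields are assumptions made for illustration.

```scala
// Hypothetical Employee type; the field names are assumptions for illustration.
final case class Employee(name: String, hourlyRate: Double, hoursWorked: Double)

object UnsafePay {
  // No null check: if `employee` is null, reading hourlyRate throws a
  // NullPointerException at runtime.
  def computePay(employee: Employee): Double =
    employee.hourlyRate * employee.hoursWorked
}
```

Calling UnsafePay.computePay(null) compiles without complaint and only fails once the program is running.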
You might ask the question:
“Why can’t you just make sure all objects are assigned a value every time?”
and you might be right; for systems with a smaller number of moving parts, you can reduce null references by being careful. The reality of modern software development is that systems are enormously complex, and it is practically impossible to enforce invariants at runtime to make sure all objects have a value. One solution to this problem is to allow objects to be unassigned, or rather, to anticipate that objects could be unassigned and design the system accordingly. Our computePay method can be redesigned like so:
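The redesigned method isn’t reproduced here either. A minimal sketch of a null-checked computePay, again in Scala, might look like this. The Employee type and the choice to return 0.0 for a missing employee are assumptions, not the original code.

```scala
// Hypothetical Employee type; the field names are assumptions for illustration.
final case class Employee(name: String, hourlyRate: Double, hoursWorked: Double)

object GuardedPay {
  // Guard against null before touching any field.
  // Returning 0.0 for a missing employee is an assumed fallback.
  def computePay(employee: Employee): Double =
    if (employee == null) 0.0
    else employee.hourlyRate * employee.hoursWorked
}
```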
Problem solved. The developer has triumphed over the feeble NullPointerException and all is well; they can now rest easy, knowing that they’ll never experience a null reference ever again… Hold on, is this really the case? How about all the other methods which have objects as parameters? Wait, you’re saying there are some objects which are guaranteed to never be null in our codebase? We don’t need null checks for those, right? What do you mean we have over 500 methods, each of which consumes more than one object; you mean we have to write null checks in each of those? I really hope I didn’t forget a null check somewhere…
This is why the creator of the null reference refers to it as his “billion-dollar mistake”. It’s a problem which crosscuts most software development projects, and appears to have no clear-cut solution. Is there a better way?
What’s the problem, really?
I would argue that the idea of a null reference isn’t entirely problematic. There are problem domains which are modelled very naturally with the inclusion of null references. Let’s say we have a class modelling a high school student and the university they would attend in the future:
The student has a first and last name, but the university they will attend isn’t something that could be known when the object is created. After all, they’re a high school student, not a university student. It’s safe to say that the univ field makes the most sense without an assigned value, so maybe the null reference is useful after all. Of course, one could argue that the univ field should not belong to the HighSchoolStudent class, but that’s another discussion. The problem, then, may not lie with the null reference, but with how it’s represented as data:
private University univ;
This declaration communicates no context to the developer beyond the type of the univ field and its visibility. The developer is unable to tell from the declaration whether the field is nullable and, to be safe, has to introduce boilerplate null checks to guard against that possibility. In my opinion, the language has failed here because it lacks sufficient utilities to explicitly communicate that a value is nullable. Can we solve this problem?
Enter Scala Options
Scala is a language which runs on the JVM and offers strong support for functional programming. It has been described as being as easy to use as Java or other strongly, statically typed OOP languages, but without the boilerplate. Its answer to the null reference problem is the Option type. Below is the same declaration of the univ value in Scala using Option:
private val univ: Option[University]
By including the Option type, Scala enables a developer to explicitly declare a value as optional. This is valuable information that the compiler uses to perform compile-time checks which help prevent (but not completely remove, more on that later) an entire class of problems related to null references. But how does all this work?
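To make the declaration concrete, here is a minimal sketch of how the student class might look with an optional field. The University type, the field names other than univ, and the choice to leave univ public with a default of None are all assumptions for illustration.

```scala
// Hypothetical University type, assumed for illustration.
final case class University(name: String)

// univ is declared as Option[University], so its possible absence is part of the type.
final case class HighSchoolStudent(
    firstName: String,
    lastName: String,
    univ: Option[University] = None
)

object StudentDemo {
  // A student who has not been admitted anywhere yet.
  val undecided: HighSchoolStudent = HighSchoolStudent("Ada", "Lovelace")

  // The same student once a university is known.
  val admitted: HighSchoolStudent = undecided.copy(univ = Some(University("UBC")))

  // val u: University = admitted.univ
  // The line above would not compile: an Option[University] is not a University.
}
```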
Options, only Optionally Challenging
When most developers start working with Option, they often come across literature which describes it as something called a Monad. A natural follow-up question would be to ask what a Monad is, and they might come across this description:
“A monad is just a monoid in the category of endofunctors, what’s the problem?” — James Iry, 2009
And all becomes clear. The developer completely conquers the concept of the Monad, and transcends to a level of development unreached by mere mortals.
Unfortunately, most developers do not have a degree in category theory and will find the explanation above to be less than satisfactory, to say the least. This is why I’ll be taking a more practical approach to explaining Option, one which does not require the reader to understand, or even be aware of, what a Monad is, because honestly speaking, I have no idea what it is either.
I’ve previously stated that an Option represents a “nullable value.” This means that Option must provide a way to represent something existing, and a way to represent something not existing. It does this by having two subtypes extend it. The Some type in Scala represents an instance of Option where a value exists, or is assigned to the optional type. The None type represents an instance of Option where the optional type does not have a value assigned. An even simpler way of thinking about Option is to treat it as a box that can exist in two states: it either holds an item, or it does not. This is a powerful abstraction, especially when you can draw parallels to a datatype which is already familiar: a collection.
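The two states of the box can be sketched in a few lines. The names BoxDemo and describe are made up for this example.

```scala
object BoxDemo {
  // A box holding an item.
  val full: Option[Int] = Some(42)

  // An empty box.
  val empty: Option[Int] = None

  // Pattern matching handles both states explicitly.
  def describe(box: Option[Int]): String = box match {
    case Some(value) => s"holds $value"
    case None        => "holds nothing"
  }
}
```

Converting each value with toList shows the parallel to a collection: full becomes a list with one element, and empty becomes an empty list.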
If we treat Option as a unary collection, it turns out that all the powerful mechanisms of abstraction that apply to collections also apply to the Option type. This means that our beloved functional abstractions, such as flatMap and even fold, can be used perfectly with Option. This enables a developer to perform transformations on the value potentially within the Option without having to explicitly unwrap it, which is pretty powerful stuff.
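Here is a small sketch of those collection-style operations applied to Option. The values and the firstChar helper are invented for the example; map, flatMap and fold are standard methods on Option.

```scala
object CollectionOps {
  val maybeName: Option[String] = Some("scala")
  val noName: Option[String] = None

  // map transforms the value if there is one and leaves None untouched.
  val upper: Option[String] = maybeName.map(_.toUpperCase)
  val length: Option[Int] = noName.map(_.length)

  // Hypothetical helper that itself returns an Option.
  def firstChar(s: String): Option[Char] = s.headOption

  // flatMap chains Option-returning steps without nesting Option[Option[Char]].
  val initial: Option[Char] = maybeName.flatMap(firstChar)

  // fold supplies a default for None and a function for Some.
  val greeting: String = noName.fold("hello, stranger")(n => s"hello, $n")
}
```

At no point does the code check whether a value is present; each operation simply does nothing useful on None and carries on.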
Another key advantage of having an Option type is the fact that it is explicit to the programmer that a type is optional and should be treated as such. No more forgetting to do null checks and, perhaps even better, no need to perform null checks on types which are not declared as Option. The compiler can also use these type annotations to give warnings to developers when they are working with optional types, which they should ideally heed but more realistically will ignore. This brings us to the dark side of Option, because no language construct is perfect.
Options — done really, really bad
I am not claiming that the Option type will magically solve programming errors related to null references. There are definitely cases where misuse can lead a developer down a path of pain:
1. Optional everything
A common pattern that developers may fall into is declaring every field as an Option type. After all, using Option is inherently safer, right? This is not the case. Option still provides utilities which allow developers to use it in an unsafe way (more on that later). This notion of optional everything, though, isn’t so much a problem with the language as with the design of the program. There are definitely fields in the Comment class above which should not be optional, e.g. isPlaceholder, but are declared as such nonetheless.
A solution to the “optional everything” problem is just to not do it. In more concrete terms, a developer should think very carefully about what makes sense as an optional type, and what does not.
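As a rough illustration of the difference, here is a hypothetical pair of classes. Only the Comment name and the isPlaceholder field come from this post; every other field, and the LooseComment name, is an assumption.

```scala
// "Optional everything": nothing tells the reader which fields can really be absent.
final case class LooseComment(
    author: Option[String],
    body: Option[String],
    isPlaceholder: Option[Boolean]
)

// Tightened version: only the field that can legitimately be missing is an Option.
// editedBy is an assumed example of such a field.
final case class Comment(
    author: String,
    body: String,
    isPlaceholder: Boolean,
    editedBy: Option[String] = None
)
```

With the second version, a flag like isPlaceholder is always true or false, and only editedBy needs to be handled as possibly absent.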
2. “None checks”, the new null checks
Recall our previous method which computed an employee’s pay in Java. We can write code that is just as bad in Scala using Option:
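The original gist isn’t reproduced here. A minimal sketch of the pattern being criticised might look like the following, where the Employee type and the 0.0 fallback are assumptions.

```scala
// Hypothetical Employee type; the field names are assumptions for illustration.
final case class Employee(name: String, hourlyRate: Double, hoursWorked: Double)

object NoneCheckPay {
  // Anti-pattern: isEmpty followed by .get is a null check wearing a different hat.
  def computePay(employee: Option[Employee]): Double =
    if (employee.isEmpty) 0.0
    else employee.get.hourlyRate * employee.get.hoursWorked
}
```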
This is a misuse of the Option type. The developer who wrote this likely did not understand that abstractions such as flatMap are available to be called on Option and reverted to a less idiomatic, less functional way of using them. Here’s the same method which leverages the expressive power of
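The rewritten method isn’t reproduced here, so the following is only one possible sketch of a more idiomatic version. It uses map and fold; the Employee type and the 0.0 default are assumptions.

```scala
// Hypothetical Employee type; the field names are assumptions for illustration.
final case class Employee(name: String, hourlyRate: Double, hoursWorked: Double)

object IdiomaticPay {
  // map keeps the result inside the Option, so absence propagates to the caller.
  def computePay(employee: Option[Employee]): Option[Double] =
    employee.map(e => e.hourlyRate * e.hoursWorked)

  // fold collapses the Option when a default really is wanted (0.0 is assumed here).
  def computePayOrZero(employee: Option[Employee]): Double =
    employee.fold(0.0)(e => e.hourlyRate * e.hoursWorked)
}
```

Neither version contains an explicit emptiness check or a call to .get.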
3. Using .get
I previously mentioned that the use of Option does not entirely remove the class of problems related to null references, and this is where stuff can go very wrong. We discussed that Option can be thought of as a unary collection. This means that it also has support for accessors. .get is a prime example of such an accessor, and it is one that should never have to be used. The code on line 4 in the gist below will throw a NoSuchElementException.
Again, this is a developer’s misuse of Option enabled by the existence of the .get accessor.
So what now?
You’ve made it this far, so here’s a meme
The Option type opens up a new world of possibilities, and they’re not exclusive to Scala. The Maybe type in Haskell is analogous to the Option type in Scala, with Just and Nothing playing the roles of Some and None, while Java supports it with its Optional<T> type. Even Python, which is apparently every developer’s favourite language, has support for optional types with its Optional type. Hopefully this blog post has shown you something that will make programming fun again 😊
About the author
James Yoo is a 4th year computer science major at UBC and a co-op software developer on Engage.
When he is not programming, you can probably find him in one of the computer science labs at UBC as a teaching assistant, browsing Reddit more than he should be, or discovering one of Mount Pleasant’s many craft breweries.