The Case of the Null Reference

Sir Hoare’s “billion-dollar mistake” is a well-known quote among software developers. It is commonly cited in articles that argue against using the null type and value at all in languages that feature it.

PHP is such a language, and it is a language I am using on a daily basis — professionally at @trivago as well as personally. It is also the language that I want to have a closer look at in the context of null references. However, we need to clarify first what null references actually are before we can dig deeper.

The title of this article is a homage to Sergio Martino’s masterpiece “The Case of the Scorpion’s Tail”.


What Sir Hoare was referring to when he was talking about null references.

He was not referring to the type or value null as we can find it in many programming languages; he was specifically talking about null pointers, particularly about null object pointers.

“What is the difference between a null pointer and a dangling pointer?”
A very nice job interview question!

Dereferencing a null pointer, i.e. attempting to access the data stored at that location in memory (which usually is literally address 0), usually ends in some kind of error. Which kind of error depends on the programming language: in C the behavior is undefined, whereas in Java it leads to the infamous NullPointerException.

What difference does this make? Null is null!

Jain (yes and no as we say in German), a null reference can be used like nullable or option types — more about both later — and it is used as such in many programs. This is perfectly sound and fine, and not the billion-dollar mistake! Consider the following simple example program in Java:

We have a routine f with a parameter x that is constrained to the type Integer, and that returns the sum of x and 42. You surely spotted already that f is being called with null in the main routine. This compiles without any warnings or errors; executing the program, however, leads to a NullPointerException at runtime because of the add operation on x. In Java, like in most other languages that support null references, any object can be null at any point in time regardless of its type constraints, and the type checker will not complain about it.

Equivalent code in PHP.

This is an equivalent program in PHP, and it compiles without any warnings or errors. It also leads to an error at runtime just like in Java; to be specific, it leads to an uncaught TypeError. The difference is that in Java the runtime error occurs when we try to call add on x, whereas in PHP it occurs the instant we call the routine, because null is not part of the annotated types for x.
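A minimal sketch of what such a PHP program could look like, assuming the routine is called f as in the Java description and that strict types are left disabled. The try-catch block is an addition of this sketch, there only to make the error observable instead of letting the script die.

```php
<?php

// Sketch only: f accepts an int and nothing else, so null is rejected.
function f(int $x): int
{
    return $x + 42;
}

try {
    // The TypeError is raised while the argument is bound,
    // before a single statement of f's body runs.
    f(null);
} catch (TypeError $e) {
    echo 'TypeError: ', $e->getMessage(), PHP_EOL;
}
```

Declaring the parameter as ?int instead would make null a legal argument, at which point the addition itself would have to deal with it.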

Hence, in PHP it is not possible for a constrained value to be null while still passing through all type checks. You might argue now that I am creating an example in favor of PHP, since variables can always contain any value. This is part of the dynamic nature of PHP; however, we are not talking about null assignments in close proximity within the same scope.

We are talking about data from other scopes which gets passed along, and slips through type constraints. Constraints which we assume to be enforced so that we can rely on them. In Java, but also in C, and many other languages, the type system does enforce some invariants, but ignores null references. The programmer is required to check against possible null references everywhere at all times on their own.

This is the billion-dollar mistake!

That being said, we can create the same situation in PHP. Consider the following example:

Inappropriate usage of references in PHP.

Executing this code will abort with a TypeError because the called toInt routine does not return an int as required by the return type constraint; note again that strict types are not enabled. No compiler warning or error was emitted, and the value in the encapsulated property changed without notice, breaking encapsulation and the class’s invariants.
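A minimal sketch of such a situation, assuming a hypothetical class IntBox with an untyped private property. Only the toInt routine is named in the text; the class name, the constructor, and the variable names are illustrative. The try-catch block is added only to make the error observable.

```php
<?php

// Hypothetical class: the name IntBox and its layout are illustrative.
final class IntBox
{
    /** @var int supposed to always hold an int */
    private $value;

    public function __construct(int &$value)
    {
        // Binding by reference ties the private property to the caller's variable.
        $this->value = &$value;
    }

    public function toInt(): int
    {
        return $this->value;
    }
}

$n   = 42;
$box = new IntBox($n);

$n = null; // silently overwrites the private property from the outside

try {
    $box->toInt();
} catch (TypeError $e) {
    // The return type check fires because the property now holds null.
    echo 'TypeError: ', $e->getMessage(), PHP_EOL;
}
```

The int constraint on the constructor parameter is only checked at the moment of the call; the reference that survives the call is what lets the later assignment slip past it.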

This is the billion-dollar mistake!
Okay, I got it, it’s about weak type systems and mutations from afar, but isn’t null still inappropriate in return values, and shouldn’t exceptions be used at all times instead?

“Exceptions should be used in exceptional cases” is a well-known sentence, but it is too vague and fails to define when exceptions are actually appropriate. Let me rephrase it: exceptions should be used if an actual exceptional state occurred that cannot be handled by the code, because handling it is either impossible or it is unclear which action would be appropriate. A good example is code that is instructed to create a file on disk, and the creation fails due to permission issues.

The code could now attempt to gain elevated user privileges and try again, it could try to create the file elsewhere, or something else. No matter what, it is very unlikely that the chosen recovery action is the right one. Being unable both to perform the instructed action and to choose an appropriate recovery action is the perfect situation to raise an exception that can be handled by the instructor.


Now let’s consider a different example of a collection or repository routine. The code in question is instructed to retrieve the data that corresponds to the identifier 42. The value is well within legal bounds 1 ≤ x ≤ 2³² − 1 (being outside would be exceptional) but there is no data to be found for 42. Note that deletions within the collection are legal, and that we are dealing with a sparse collection.

Many will now choose to throw an exception, basically falling back to using a goto to some symbol — read try-catch block in the context of exceptions — anywhere else in the program, or — if no such symbol was defined — let the program crash. Using exceptions as well as goto for control flow is bad practice, and leads to programs that are hard to understand.

Hence, indicating the absence of something via exceptions as well as goto seems inappropriate, especially if we consider that a missing try-catch block results in the termination of the process. After all, the collection performed its duty of searching for 42 and came to a result; the result is simply nothing which nicely translates to null.

If we use null we have a nullable type ?T which is different to a null reference. Our contract clearly defines that the find routine either results in null (?) or a value of the constrained type T. Think back, in the null reference case our type would be T, and not ?T, but it would still be possible that the caller receives null, which is after all a violation of our contract.
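To make the contract concrete, here is a small sketch of such a find routine, assuming a hypothetical SparseCollection class that stores strings, so that PHP's ?string plays the role of ?T. The class name, the MAX_ID constant, and the sample data are illustrative.

```php
<?php

// Hypothetical sparse collection; names and data are illustrative.
final class SparseCollection
{
    const MAX_ID = 4294967295; // 2^32 - 1

    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    // Contract: returns the stored string, or null if nothing is stored for $id.
    public function find(int $id): ?string
    {
        if ($id < 1 || $id > self::MAX_ID) {
            // An identifier outside the legal bounds is the exceptional case.
            throw new OutOfRangeException("Illegal identifier: {$id}");
        }

        return $this->items[$id] ?? null;
    }
}

$collection = new SparseCollection([1 => 'one', 7 => 'seven']);

var_dump($collection->find(7));  // a string
var_dump($collection->find(42)); // NULL: a legal identifier without data
```

The exception is reserved for identifiers outside the legal bounds, while a legal identifier without data simply yields null, exactly as the signature announces.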

This all sounds nice but I still have to check for null everywhere, like in the null reference case!

Actually, you only have to check if a contract clearly defines that something can be null, and not everywhere. Note further that the same is true for exceptions. The only difference is that unhandled exceptions are easier to fix later because a single symbol at the outermost point of the program can take care of all the sloppy parts of the code that did not properly handle their exceptions.

Fixing all parts to account for a nullable type, on the other hand, is much more involved, and this is probably the reason why some advocate exceptions over nullable types. Then again, are we not all sloppy, and tend to forget things? We are human, are we not? Nobody produces perfect code at all times (despite some claiming to do so) or is capable of thinking of everything, always, everywhere.

At this point, those parts of the audience that got in touch with functional programming to some extent will shout “maybe monad”. And indeed, I already mentioned option types multiple times without going into any detail. An option type or maybe monad is like a checked exception in Java. This might still be unclear to the average PHP developer, so let me explain the latter first and get back to the former afterwards.

Checked exceptions are exceptions which are part of a routine’s signature. Consider the following hypothetical PHP code:

function f() throws Exception {}

Any code calling that routine is required to either enclose it in a try-catch block for that particular exception — or any of its superclasses — or extend its own signature to throw it — or any of its superclasses — as well. These constraints are enforced by the compiler, like type constraints in an inheritance chain.

An option type is similar in the sense that it cannot be ignored, like checked exceptions, and is part of a routine’s signature. However, it does not unroll the stack, and is not able to act like a goto. An option type instance encapsulates a value, and it allows access to that value only through its routines. Hence, anyone who wants to get a hold of that encapsulated value must go through those routines.

In other words, it is a safety net preventing us silly humans from forgetting that we are dealing with a possibly absent value. This sounds like the perfect solution to the dilemma we are facing here, plus, we can enrich the option type with other useful functionality. This is nice but it obviously adds overhead — there is no such thing as a free lunch.

The option type must be created, memory must be allocated, and we have to verify via method calls whether we received a meaningful value or not. In practice this overhead is negligible, but PHP comes with other obstacles that prevent us from having nice options: no support for generics.

In order to not lose type information we have to create a specific option class (a special case type implementation) for every possible type in the complete program. While this results in the best usability, it also results in the worst maintainability, and increases the overhead of these options tremendously. This is a problem that pops up all the time in PHP due to the simple type system and the aforementioned missing support for generics.
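As a rough sketch of what one such special case type could look like, assume a hypothetical IntOption class; none of the names below come from an existing library. Without generics, an equivalent class would have to be written again for strings, floats, and every other type that should keep its type information.

```php
<?php

// Hypothetical option type for int values only.
final class IntOption
{
    private $value;
    private $present;

    private function __construct(bool $present, int $value = 0)
    {
        $this->present = $present;
        $this->value   = $value;
    }

    public static function some(int $value): self
    {
        return new self(true, $value);
    }

    public static function none(): self
    {
        return new self(false);
    }

    public function isPresent(): bool
    {
        return $this->present;
    }

    // The only way to reach the value forces the caller to supply a fallback.
    public function getOrElse(int $default): int
    {
        return $this->present ? $this->value : $default;
    }

    // Transforms the value if there is one, and stays empty otherwise.
    public function map(callable $fn): self
    {
        return $this->present ? self::some($fn($this->value)) : $this;
    }
}

$add42 = function (int $x): int {
    return $x + 42;
};

echo IntOption::some(0)->map($add42)->getOrElse(-1), PHP_EOL; // 42
echo IntOption::none()->map($add42)->getOrElse(-1), PHP_EOL;  // -1
```

The encapsulated value never leaves the object without the caller stating what should happen when it is absent, which is the safety net described above.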

To null or not to null, that is the question!

This boils down to the question whether one is comfortable with using exceptions for control flow or not. Nullable types and option types are almost always the better choice. However, they are a heavy burden in PHP that should not be underestimated. This is due to the fact that right now it is possible to declare nullable types only, but not to work with them directly in a safe manner. They always require a guarding if condition that unnecessarily pollutes the code.
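The guard in question looks roughly like the following sketch, which assumes a hypothetical findName function with a nullable return type; the function names and the data are illustrative.

```php
<?php

// Hypothetical lookup with a nullable return type.
function findName(int $id): ?string
{
    $names = [1 => 'Ada', 2 => 'Grace'];

    return $names[$id] ?? null;
}

function greet(int $id): string
{
    $name = findName($id);

    // Without this guard, strtoupper would receive null.
    if ($name === null) {
        return 'Hello, stranger!';
    }

    return 'Hello, ' . strtoupper($name) . '!';
}

echo greet(1), PHP_EOL;  // Hello, ADA!
echo greet(42), PHP_EOL; // Hello, stranger!
```

Every caller of a routine with a nullable result has to repeat a check of this shape, and nothing in the language reminds them if they forget it.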

Attempts to improve the ergonomics of nullable types were made long before nullable types even landed in PHP, namely with the Nullsafe Calls RFC. Nullsafe calls combined with an intelligent type checker result in a catch-all null object implementation without the overhead of option types or actual null object implementations. On top of that, they do not require support for generics to properly retain type information. In fact, there are languages that have already perfected this:

This is the same example we had initially, but written in Ceylon, where we have to explicitly allow x to be null. The program does not compile if we call f with null without declaring x to be nullable. It also does not compile if we call plus directly on x without the nullsafe call operator ?. preceding the method call. Of course, the overhead of performing such deep introspections on the source is not feasible in an interpreted language like PHP. However, static analysis tools are already capable of adding that.
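For comparison, and assuming PHP 8.0 or later, where a nullsafe operator ?-> is available for method calls and property accesses, a similar shape can be sketched in PHP. The IntValue class is hypothetical and exists only so that there is a plus method to call; unlike in Ceylon, nothing here is verified before the program runs.

```php
<?php

// Hypothetical wrapper so that there is a method to call on x.
final class IntValue
{
    private $value;

    public function __construct(int $value)
    {
        $this->value = $value;
    }

    public function plus(int $other): self
    {
        return new self($this->value + $other);
    }

    public function toInt(): int
    {
        return $this->value;
    }
}

// x is explicitly declared nullable; ?-> short-circuits to null when x is null.
function f(?IntValue $x): ?IntValue
{
    return $x?->plus(42);
}

var_dump(f(new IntValue(0))?->toInt()); // int(42)
var_dump(f(null)?->toInt());            // NULL
```

Writing -> instead of ?-> inside f would still be accepted by PHP itself and would only fail at runtime when x is null, which is where a static analysis tool has to step in.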

Conclusion

Null references are a nasty problem. However, they are not a real-world problem in PHP software since actual references are seldom used. Experienced developers do not use them because they know the side effects, and beginners avoid them because they do not really understand them in the first place.

Nullable types, or the type and value null, are very useful in programming languages and required to indicate the absence of a meaningful value. It does not matter what they are called (null, nil, ?, nothing*, …) or whether they are directly available in userland at all; Rust and most functional languages, for example, make use of an option type instead of null.

*) Nothing is most commonly used as a bottom type and not as null; the closest PHP equivalent here would be void.


I hope that readers of this article understand that Sir Hoare was not talking about null, and know better the next time somebody wants to convince them of the evilness of null. The null reference problem is, no questions asked, a programming language design anti-pattern. However, handling the absence of a meaningful datum is required, and null/nullable types/null objects as well as option types/maybe monads/special case types are valid approaches to get a handle on this specific problem.