How to find the source of nullable data, tracing and analyzing data flow in IntelliJ IDEA for Kotlin

Published in

Javarevisited

7 min readSep 30, 2019

If you remember, in one of my posts I was talking about null checks in terms of the contract between data producer and the data consumer. There I made a point that it is much better to check the data right away upon retrieval in order to make a decision whether or not the data provided is eligible for processing.

If it’s not — there is no point in dragging it around, as it will be dropped out when it comes to processing at some point of the workflow. Or even better approach — if there is any control over data provider, it should be requested only for the data that could be processed.

It’s just that easy.

This design decision can save you a lot of null checks and headache with NullPointerExceptions if you follow it from the beginning. But, if you are dealing with some existing code, you will need some tools to track down the origin of each value under question and check whether the null check is really necessary in each concrete case.

Luckily, we are no longer coding in notepads as our grandparents used to do. We are privileged to use some of the latest tools with a countless number of helper features. Let’s appreciate that.

As you might have guessed already, I am talking about IntelliJ IDEA. And namely about “Analyze Data Flow to Here” feature:

This analysis is executed on method argument and allows you to trace back concrete parameter to the place of its definition. Pretty cool, hah?

And that’s what we need really. After all — there is a new statement for every object in your program. Let’s find all those places where the object under question is created and find out whether we need a null check really.

To make things a little bit more interesting let’s take as an example source of Kotlin (I was working with commit number 6a808ce842d157a60f8a1ebf5864a731c37076c7 in the master branch).

Also, I am pretty sure that developers of Kotlin heard something about IntelliJ IDEA.

First thing I did after cloning the project, I counted usages of org.jetbrains.annotations.Nullable:

Picture 2: There are 2382 occurrences of @Nullable in Kotlin sources

2270 usages in production files and 2382 usages overall! That’s a lot. Let’s check whether all of them are necessary.

Maybe some of those could be replaced by introducing better contracts with data producer. Or some of them are redundant because there already was a null-check somewhere along with the flow. But since there is no rule to check data as soon as it’s retrieved — we can’t be sure whether we can operate with the value provided.

First example.

There is a static method in the org.jetbrains.kotlin.codegen.CallReceiver class defined as follows:

Snippet 1: calcExtensionReceiverType static method in CallReceiver

As we can see — there are a couple of nullable method arguments, namely extensionReceiver and callableMethod.

These nullable variables concern me for a number of reasons:

in case extensionReceiver is null the call doesn’t make any sense, the method returns immediately; so why would we drag this null value for who knows how many method calls just to figure out we can’t process the data at the topmost level of the stack?
this method not only just returns in case of extensionReceive == null, it itself returns null; to return null in a project that has like 9000 nullables and null checks you should be some kind of a rebel really; additionally, to that, someone on the receiving end would have to check for that null as well.
callableMethod just affects return type, so there could be just two methods one that accepts callableMethod and other that simply doesn’t have this parameter at all; why would we drag this empty glass for no reason?

But fist things first, let’s trace down the source of this extensionReceiver parameter:

Picture 3: Data flow for `extensionReceiver`

Data flow for the value shows us that this parameter is originally received by the public static method for which it is marked as optional as well. Unfortunately, it was as far as I could get with this data analysis tool. I was not able to see all the consumers of this public static method inside the project.

But in any case the question arises: wouldn’t it be better to introduce another public method without the parameter that could be null?

But what is even more interesting — is the method itself:

Snippet 2: generateCallReceiver method

There is already a null check for extensionReceiverParameter! So the check-in method calcExtensionReceiverType is redundant. And this null return value we noticed from the beginning (and that is complete nonsense) is never used, which is a good thing. Bad thing is, it’s probably covered with some test for the project to have good condition coverage.

What about callableMethod parameter — it also originates as an input to this public method. If we check all the usages of this parameter in the method generateCallReceiver we’ll see the following:

Snippet 3: callableMethod parameter in generateCallReceiver method

So, as we can see this parameter is used in the following methods: calcExtensionReceiverType, calcDispatchReceiverType and isLocalFunCall. Well, we started from calcExtensionReceiverType, but let’s check again how it is used there:

Snippet 4: Return statement based on callableMethod value

Here is a snippet from calcDispatchReceiverType:

Snippet 5: null check for callableMethod parameter

And finally from isLocalFunCall:

Snippet 6: Another null check for callableMethod parameter

Alright, I can see a pattern here.

First of all, every method starts with a null check.

Second of all, every method needs some specific info from this callableMethod parameter:

ExtensionReceiverType and ExtensionReceiverKotlinType in the first case;
DispatchReceiverType and DispatchReceiverKotlinType in the second;
GenerateCalleeType in the third.

So, apart from the possibility of being null itself, callableMethod the object is expected to provide different information based on the use case. And I really doubt that it can contain both ExtensionReceiverType and DispatchReceiverType for instance at the same time. That to me seems like a bad API design.

The called of this public method generateCallReceiver probably knows whether it’s dealing with ExtensionReceiver, DispatchReceiver or GenerateCalle; or that it has no callable method altogether. So all those four seem like four different cases that should be handled separately for clearer code.

Here, we’ve just saved Kotlin guys like seven nullable annotations and null checks. 👏

The second example, the simpler one.

We’ll mostly use the data flow analysis tool output and will not dive into code itself so much in this example.

Let’s take a look now at the private constructor of org.jetbrains.kotlin.resolve.calls.util.ExpressionValueArgument, which is used by three public static methods to construct this object:

Snippet 7: Private constructor for ExpressionValueArgument class

The only nullable here is the parameter expression. Let’s track all possible data flows for it:

Picture 4: Data flow for nullable expression parameter

Whoa, this seems like a lot. But worry not, check all roots first. We can easily skip the last three, as the expression parameter is marked @NotNull there. So there was already a null check enforced before. So it’s only these top two we need to check on possible nullability of the value.

Both these two origins deal with a list of KtExpressions. And it’s the makeValueArgument call that deals with the concrete values from the list.

Picture 5: Data flow case for nullable expression parameter

Just from this window already we can see that the result of makeValueArgument is added to another list. So my assumption is that this whole method chain could have been one operation on a stream of KtExpressions.

But even if it’s not, an element of the list is considered to be nullable in this flow. This already seems like a code smell to me. I frankly don’t remember any data provider I had to deal with that would supply a list of elements mixed with nulls, so during processing, I would need to check every single one of them not to be null.

But even if we assume that there is such data provider, wouldn’t it be better to filter out all nulls once we received this data set and then just proceed with the flow that doesn’t require null checks?

That’s erotesis.

Frankly, I was very surprised by the approach was taken by Kotlin developers. Famous Kotlin’s null safety aims to eliminate the danger of null references from code.

But the code I’ve seen so far in Kotlin’s sources is more like paranoic null checks all over the place. Even if one was performed before, even if it’s not possible to have a null value at this point.

There is no approach and no idea how to make code they write null safe, only a postulate to check everything everywhere. For a couple of times. I guess you can’t be to cautious about null checks, right?

Even more surprising was to see constructions like:

Snippet 8: Return null expression in Kotlin sources

This makes the code somewhat of a mixture of paranoid null checks and careless return null statements. This just doesn’t seem right.

As a programmer, you are in control of your application flow. And even if you haven’t done it from the beginning — there are some great tools to trace your data flow.

And all you need from there is just a systematic approach to stick with.

Because I believe that there is a place for a null check in Java. And this place is there consumer meets producer.

How to find the source of nullable data, tracing and analyzing data flow in IntelliJ IDEA for Kotlin

First example.

The second example, the simpler one.

Written by Aleksandr Erokhin