NullReferenceException you wouldn’t expect
.NET provides many features to monitor the performance of an application. One of them is the EventListener API from System.Diagnostics.Tracing. While my colleague Christophe described how to use this API quite some time ago, we didn’t realize until last week that it had a funny flaw that could crash any application using it.
To illustrate the problem, let’s look at a simplified version of a real listener we use in production. It counts how many byte buffers the application allocates in the shared ArrayPool.
In the constructor we accept an IMetricsRegistry, a factory that creates the application metrics. Then we override the relevant EventListener virtual handlers. OnEventSourceCreated is called for each event source created in the process and lets us decide whether we want to receive events from that source. Then, in OnEventWritten, the counter is incremented when the BufferAllocatedEvent is received for ArrayPool<byte>.Shared.
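A minimal sketch of a listener along these lines could look as follows; it is not the production listener itself. IMetricsRegistry, ICounter, SingleCounterRegistry and the counter name are hypothetical stand-ins for internal types. The sketch matches the event by its EventName, BufferAllocated, and assumes that the shared pool reports its own GetHashCode() as the poolId payload field, which is how current .NET versions implement ArrayPool<byte>.Shared. Note that _metricsRegistry is assigned in the constructor body.

```csharp
using System.Buffers;
using System.Diagnostics.Tracing;
using System.Threading;

// Hypothetical stand-ins for the internal metrics API.
public interface ICounter { void Increment(); }
public interface IMetricsRegistry { ICounter GetOrRegisterCounter(string name); }

// Hypothetical registry that hands out a single thread-safe counter, for trying the sketch out.
public sealed class SingleCounterRegistry : IMetricsRegistry, ICounter
{
    private long _count;
    public long Count => Interlocked.Read(ref _count);
    public ICounter GetOrRegisterCounter(string name) => this;
    public void Increment() => Interlocked.Increment(ref _count);
}

public sealed class ArrayPoolAllocationListener : EventListener
{
    private const string ArrayPoolSourceName = "System.Buffers.ArrayPoolEventSource";
    private readonly IMetricsRegistry _metricsRegistry;

    public ArrayPoolAllocationListener(IMetricsRegistry metricsRegistry)
    {
        // This assignment runs only after the base EventListener constructor has returned.
        _metricsRegistry = metricsRegistry;
    }

    // Called for every event source in the process: subscribe only to the ArrayPool one.
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name == ArrayPoolSourceName)
        {
            EnableEvents(eventSource, EventLevel.Informational);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventName != "BufferAllocated" || eventData.Payload is null || eventData.PayloadNames is null)
        {
            return;
        }

        // Assumption: the pool identifies itself through the "poolId" payload field.
        int poolIdIndex = eventData.PayloadNames.IndexOf("poolId");
        if (poolIdIndex < 0 || eventData.Payload[poolIdIndex] is not int poolId)
        {
            return;
        }

        // Only count allocations made by the shared byte pool.
        if (poolId == ArrayPool<byte>.Shared.GetHashCode())
        {
            _metricsRegistry.GetOrRegisterCounter("arraypool_shared_byte_allocations").Increment();
        }
    }
}
```

Creating the listener and then renting a few large buffers from ArrayPool<byte>.Shared without returning them should make the counter grow.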
What could possibly go wrong?
One of our applications failed to start because of a NullReferenceException. The stack trace showed that an array was allocated in the pool, then the call went through some CLR eventing internals and ended up in our OnEventWritten, which tried to access something that was null. What could it be?
We were quite sure it wasn’t eventData or its payload: the listener had been working fine before, and we hadn’t updated the .NET runtime. The shared ArrayPool could not be null either, and GetOrRegisterCounter never returns null.
The only thing left was the _metricsRegistry field itself. We double-checked that we weren’t accidentally passing null to the constructor. So the only explanation was that something called OnEventWritten before the constructor had finished!
When you think about it, how does the API even know that we created a listener? As soon as the class is instantiated, you start receiving events; there is no method like Start or Subscribe. The base constructor comes to mind, and a picture begins to form.
Subscription in the base constructor
It’s pretty easy to examine the source code of EventListener, either decompiled in the IDE or, with all the nice comments, in the dotnet/runtime repository. The part we are interested in is the constructor. Note that in C#, the base constructor runs before the body of the derived constructor.
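The ordering rule can be shown without EventListener at all. In this sketch, the hypothetical CallbackBase plays the role of EventListener: its constructor calls a virtual method, and the override in the hypothetical CallbackDerived observes the object before the derived constructor body has run. Field initializers are the exception: C# runs them before the base constructor call, whereas assignments in the constructor body, like _metricsRegistry in the listener, run after it.

```csharp
#nullable enable

// Stand-in for EventListener: the base constructor invokes a virtual callback.
public abstract class CallbackBase
{
    protected CallbackBase()
    {
        // Dispatches to the derived override before the derived constructor body runs.
        OnCreated();
    }

    protected abstract void OnCreated();
}

public sealed class CallbackDerived : CallbackBase
{
    // A field initializer runs BEFORE the base constructor call...
    private readonly string _initializedField = "set by initializer";

    // ...but an assignment in the constructor body runs AFTER it.
    private readonly string _constructorField;

    public string? SeenInitializedField { get; private set; }
    public string? SeenConstructorField { get; private set; }

    public CallbackDerived(string value)
    {
        _constructorField = value;
    }

    protected override void OnCreated()
    {
        SeenInitializedField = _initializedField;
        SeenConstructorField = _constructorField; // still null at this point
    }
}
```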
Reading the EventListener source, we confirm that OnEventSourceCreated is indeed executed by the base constructor, once for each event source that already exists. OnEventWritten is a bit different, as the stack trace shows: it is executed on the thread that writes the event to the EventSource. So, as soon as EnableEvents has been called in OnEventSourceCreated, the listener must be ready to receive events, even though its constructor may still be running. Our code was not prepared for that.
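To see that OnEventWritten is invoked synchronously on the thread that writes the event, here is a small sketch with a hypothetical DemoEventSource (the source name Demo-ThreadCheck is made up for this example). The listener records the managed thread ID it is called on, and the event is written from a separate thread.

```csharp
using System;
using System.Diagnostics.Tracing;

// Hypothetical event source used only for this demonstration.
[EventSource(Name = "Demo-ThreadCheck")]
public sealed class DemoEventSource : EventSource
{
    public static readonly DemoEventSource Log = new DemoEventSource();

    [Event(1, Level = EventLevel.Informational)]
    public void Allocated(int size) => WriteEvent(1, size);
}

public sealed class ThreadRecordingListener : EventListener
{
    // Set on whichever thread raises the event.
    public int? CallbackThreadId { get; private set; }

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name == "Demo-ThreadCheck")
        {
            EnableEvents(eventSource, EventLevel.Informational);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventName == "Allocated")
        {
            CallbackThreadId = Environment.CurrentManagedThreadId;
        }
    }
}
```

After the writer thread is joined, CallbackThreadId should equal that thread's ManagedThreadId. Since any thread can write events, a callback can run while the listener's constructor is still executing elsewhere.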
To fix the issue, we introduced a boolean field in the class that is false by default. The constructor sets it to true once the metrics registry is assigned, and OnEventWritten returns immediately while the field is still false.
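A minimal sketch of this fix, assuming hypothetical IMetricsRegistry, ICounter and SingleCounterRegistry types in place of our internal metrics API, and assuming the shared pool reports its GetHashCode() as the poolId payload. Marking the flag volatile is a choice made in this sketch: a volatile write publishes the registry assignment that precedes it, so a thread that reads true also sees the registry.

```csharp
using System;
using System.Buffers;
using System.Diagnostics.Tracing;
using System.Threading;

// Hypothetical stand-ins for the internal metrics API.
public interface ICounter { void Increment(); }
public interface IMetricsRegistry { ICounter GetOrRegisterCounter(string name); }

public sealed class SingleCounterRegistry : IMetricsRegistry, ICounter
{
    private long _count;
    public long Count => Interlocked.Read(ref _count);
    public ICounter GetOrRegisterCounter(string name) => this;
    public void Increment() => Interlocked.Increment(ref _count);
}

public sealed class SafeArrayPoolAllocationListener : EventListener
{
    private readonly IMetricsRegistry _metricsRegistry;

    // False until the constructor body has finished; events arriving earlier are dropped.
    private volatile bool _isInitialized;

    public SafeArrayPoolAllocationListener(IMetricsRegistry metricsRegistry)
    {
        _metricsRegistry = metricsRegistry ?? throw new ArgumentNullException(nameof(metricsRegistry));
        _isInitialized = true; // set last, after every field OnEventWritten depends on
    }

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name == "System.Buffers.ArrayPoolEventSource")
        {
            EnableEvents(eventSource, EventLevel.Informational);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        // The base constructor may already have enabled events: ignore them until we are ready.
        if (!_isInitialized)
        {
            return;
        }

        if (eventData.EventName != "BufferAllocated" || eventData.Payload is null || eventData.PayloadNames is null)
        {
            return;
        }

        int poolIdIndex = eventData.PayloadNames.IndexOf("poolId");
        if (poolIdIndex >= 0
            && eventData.Payload[poolIdIndex] is int poolId
            && poolId == ArrayPool<byte>.Shared.GetHashCode())
        {
            _metricsRegistry.GetOrRegisterCounter("arraypool_shared_byte_allocations").Increment();
        }
    }
}
```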
This approach has drawbacks: the boolean has to be added to every listener class, and the listener may miss some events at application startup. It would be much better if EventListener had a separate method to start receiving events. However, I doubt that’s easy to change now that the API is public.