Debugging Wednesday at Criteo — Cancel this task!
Last Wednesday was a great day for Kevin and myself: We spent a lot of time investigating the reasons why a test was failing. Let’s share with you these frustrating but interesting minutes.
One of our colleagues came to us because an integration test would get stuck in some specific conditions. Here is the simplified code of the service that is supposed to do some background processing until it is stopped:
In the test, the service is created: Two
Task instances are created in the constructor (for background processing irrelevant for this discussion) that (1) are started when the service starts and (2) are canceled when the service is stopped. A
CancellationTokenSource is used to cancel the tasks if needed.
However, in the specific conditions of this test, the
Start method would never be called, skipping straight to the
The task is never started because the test skips the
Service.Start() method, but it should transition to “Cancelled” state as soon as the
CancellationTokenSource associated with the
CancellationToken passed to the
Task gets canceled. In that particular situation, we expect the
await _backgroundProcessing code to immediately return in the
In our colleague Visual Studio, the debugger never came back from
await _backgroundProcessing, indicating that the task never completed…
Reproduce the problem
I wanted to un-validate any side effect related to our test framework or custom Criteo libraries and confirm my understanding of task cancellation, so I wrote a small Console application with the same 4.7.2 version of .NET Framework with the following code:
And guess what?
I got the expected
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
The next step was to double-check the understanding of how tasks are working related to cancellation. So I set a breakpoint after the task is instantiated
And its status is Created.
Doing the same after
Cancel() is called on the
CancellationTokenSource, gives the expected Canceled status.
So, let’s do the same when debugging the test code:
It gives the same Created status after the task creation but after canceling the source:
…the status does not switch to Canceled like in my Console repro!
Looking for cancellation
The only thing that came to my mind was: maybe the cancellation token is not taken into account. However, a
CancellationToken is just a struct that keeps a reference to its
CancellationTokenSource. It should be easy in Visual Studio debugger to double-check that our task keeps track of the token somewhere and goes back to the cancellation source. Well… the token is not kept as a field of the task but stored inside the
m_contingentProperties field deep during the task construction code path.
Let’s look at the value in the Quick Watch after the cancellation source gets canceled:
It sounds like the cancellation token is not canceled… But if we look at the source
Token property of our canceled
we don’t have the same value for
It’s like we don’t look at the same cancellation source… To find out, we just have to compare the reference to our
CancellationTokenSource with the one we see in the
m_contingentProperties token of the task. To achieve that, we could copy the expression from QuickWatch, paste it into the Debug | Windows | Memory… pane and press ENTER to get the address where the object is stored in memory
After having done the same with
_cancellationSource, I did not get the same address:
It means that we were dealing with two different instances of
But this might be too C++ish for you… Kevin prefers leveraging the Make Object ID feature of the C# debugger. You simply right-click the Data Tip of the cancellation source and select Make Object ID:
Once this is done, a numeric identifier is displayed for this instance in Data Tip and any Watch window (#1 in this screenshot):
When we looked at the cancellation source of the token stored by the task,
we didn’t see any ID so it was not the same object.
So we decided to use Make Object ID on this
m_source that became marked as #2.
And when we looked at the
m_source of the second cleanup
Task, we realized that it was the same object but not the one we created!
We started to think that we were becoming crazy. So, let’s restart from the beginning and follow the
CancellationTokenSource from its creation because we are sure that we passed a valid cancellation token (linked to this source) to the task constructor. Or… Did we? The QuickWatch gives a different answer just after the task gets created compared to what we’ve seen already: a token with an empty source property now!
I closed the QuickWatch pane and reopened it for Kevin to confirm. And like in a nightmare, the source was not null anymore…
Visual Studio must be responsible for that weird behavior!
Kevin remembered the ”Enable property evaluation” settings in the Options dialog. If it is checked (which is the default), it means that the Debugger would fetch the value of an instance field and then call the Getter of each property in order to display its value.
However, if you uncheck it, only the fields are displayed. So in our case, we then always got a null
m_contingentProperties field (and, as expected, all property would not be displayed):
m_contingentProperties is initialized in
EnsureContingentPropertiesInitialized() when called by
AssignCancellationToken() from the
TaskContructorCore() helper used by the
Task constructor but it did not seem to be the case because it was definitively null…
Kevin decided to stop at the
CancellationTokenSource constructor with a new breakpoint (more on how to set a breakpoint on a .NET Framework method soon) to see where the one shown in the Debugger was created but the breakpoint was never hit. So the
CancellationTokenSource #2 must have been created even before our own was created by our code. In fact, a static
CancellationTokenSource is created and is set to
InitializeDefaultSource() gets called by one of the Getter. This explains why we saw the same instance #2 in both tasks token.
To sum up, we were now sure that the passed token was not “received” by the
Maybe there is a magic trick done by the .NET Framework to lazily set the token source after the creation of the task. However, we did not find such a code in the .NET Framework and this is not what we see in our repro.
Back to the basics: are we sure that we are executing the code we think is executed? We looked for mscorlib in the Debug | Windows | Modules pane,
and we opened it with a decompiler: the code of the methods called during the
Task construction was the same as the one shown in https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/Task.cs.
Next, in order to better follow the execution and the passing of parameters (including our token), we decided to set breakpoints on
Task private method responsible for its initialization.
internal void TaskConstructorCore(object action, object state, CancellationToken cancellationToken, TaskCreationOptions creationOptions, InternalTaskOptions internalOptions, TaskScheduler scheduler)
In the Debug | Windows | Breakpoints pane, click New | Function Breakpoint…
and type the full name of the method. This is working even for a method of a class defined in the .NET Framework assembly for which you do not have the source code:
We checked that the breakpoints were well set (i.e. no typo in the full name) by looking at the filled red circle:
cancellationToken parameter should contain the token that we passed at
Task creation. Unfortunately, the QuickWatch pane displayed a “cannot read memory” error that we never saw in Visual Studio before!
At that time, we thought we were doomed but we looked at the Call Stack pane and we realized that the code was calling the wrong Task constructor:
public Task(Action<object> action, object state, TaskCreationOptions creationOptions)
Its signature is compatible with our code:
_backgroundProcessing = new Task(DoStuffInTheBackground, cancellationToken, TaskCreationOptions.LongRunning);
and this is why the compiler did not complain.
CancellationToken was passed as the
state parameter is given directly to our
DoStuffInTheBackground Action<object>: the created
Task had no idea that it was supposed to be its
Note that if we had noticed the Auto Completion (Ctrl + Shift + Space) hint, we might have figured out the root cause much sooner…
The fix was straightforward; just using the right constructor:
public Task(Action<object> action, object state, CancellationToken cancellationToken, TaskCreationOptions creationOptions)
that accepts both a state for the callback and a
CancellationToken for the
Task to create:
_backgroundProcessing = new Task(DoStuffInTheBackground, cancellationToken, cancellationToken, TaskCreationOptions.LongRunning);
Under the debugger, we validated that the source was now the expected one:
and the test did not hang anymore.
If, from the beginning, we would have been able to step into .NET Framework compiled code as we do with Jetbrains Resharper integration in Visual Studio, we would have found the issue almost immediately. Thankfully, Microsoft has just announced decompilation of C# code made easy with Visual Studio.
We wish we had it last Wednesday…
Interested in reading more about Christophe’s & Kevin’s work? Check out their latest articles:
Build your own .NET memory profiler in C#
This post explains how to collect allocation details by writing your own memory profiler in C#.
If you are looking for a change and would love to work with these two, head over to our careers page and let us know if there is something that sounds like you!