Beware of C# Lambda Captures!

Jean-Philippe Durot
Criteo R&D Blog
4 min read · Jul 26, 2021

Kafka is used heavily at Criteo, and when a new version of the Kafka client is deployed, great care is taken to ensure that nothing bad happens. Well… sometimes things go wrong and we learn from it. The graph below shows an example of things not going as expected. So, let’s explore what happened and how we fixed it!

Number of Gen 0 collections per minute

The number of generation 0 garbage collections went crazy: beyond 100,000 per minute instead of the usual 40 per minute! This kind of GC pressure can degrade the performance of your application.

Based on what we know about the .NET GC behavior (watch Konrad Kokosa’s GC Internals series if you need more details), this is most likely caused by a huge number of allocations.
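To make the link between allocations and Gen 0 collections concrete, here is a minimal sketch that counts the Gen 0 collections triggered by a burst of short-lived allocations. The Gen0Demo class, the array size and the iteration count are assumptions chosen for illustration; the actual number printed depends on the machine and the GC configuration.

```csharp
using System;

public static class Gen0Demo
{
    // Storing each array in a static field keeps the allocation on the heap.
    private static byte[] _sink;

    // Allocates many short-lived arrays and returns how many Gen 0
    // collections happened in the meantime.
    public static int CountGen0Collections(int allocations)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < allocations; i++)
        {
            // The previous array becomes garbage as soon as it is replaced.
            _sink = new byte[128];
        }
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine($"Gen 0 collections: {CountGen0Collections(10_000_000)}");
    }
}
```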

Looking into the code

The next step in this kind of investigation is to pinpoint the code that triggers these allocations and which types are allocated. To do so, we will first use PerfView, as explained in a previous post.

In our case, 96.4% of the allocations were instances of <>c__DisplayClass8_1 created in the StartPollTask async method of the Confluent Kafka AdminClient type.

Memory allocation tick results

Even if you don’t have the source code, it is always possible to use a decompiler to get a C# view of these types and methods (I recommend using JetBrains dotPeek with the “Show compiler-generated code” setting enabled).

For the sake of the demo, a console application has been written with a simplified version of the StartPollTask method:

The PollMessage method returns a message if any; otherwise HasMessage is false.
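The original listing is not reproduced in this text. As a rough sketch only, the demo could look like the code below, reconstructed from the names that appear in the rest of the post. The Message type, the body of PollMessage and the one-second run in Main are assumptions, not the actual Confluent or demo code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical message type: only HasMessage and Result are named in the post.
public sealed class Message
{
    public static readonly Message None = new Message(null);

    public Message(string result) { Result = result; }

    public string Result { get; }
    public bool HasMessage => Result != null;
}

public static class Kafka
{
    // Stand-in for the real poll call: here it never returns a message.
    public static Message PollMessage() => Message.None;

    public static Task StartPollTask(CancellationToken ct)
        => Task.Factory.StartNew(state =>
        {
            // The token arrives through the state argument: no capture here.
            var token = (CancellationToken)state;
            while (!token.IsCancellationRequested)
            {
                var message = PollMessage();
                if (message.HasMessage)
                {
                    // This lambda captures 'message', so the compiler needs
                    // a closure class to carry it.
                    Task.Run(() => Console.WriteLine(message.Result));
                }
            }
        }, ct, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);

    public static void Main()
    {
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(1)))
        {
            StartPollTask(cts.Token).Wait();
        }
    }
}
```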

Here are the allocation details after profiling this demo code in PerfView:

For this 20-second trace, there are almost 430,000 allocation ticks due to <>c__DisplayClass0_0 allocations! We need to understand what on earth could be the cause.

The <StartPollTask>b__0_0 method is what the C# compiler generates for the body of the lambda expression that contains the while loop:

The Kafka.<>c__DisplayClass0_0 closure class is responsible for both storing the “captured” parameter (the Result used in WriteLine) and implementing the lambda body (<StartPollTask>b__1) calling Console.WriteLine:

Generated closure class
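Since the decompiled listing is not reproduced in this text, here is a hand-written approximation of what the compiler generates. The real names (<>c__DisplayClass0_0, <StartPollTask>b__1) are not valid C# identifiers, so DisplayClass0_0, WriteResult, LoweredLoop and the poll delegate are hypothetical stand-ins. The point of the sketch is where the closure is instantiated, assuming the captured variable is declared inside the loop body.

```csharp
using System;
using System.Threading.Tasks;

// Hand-written stand-in for the compiler-generated <>c__DisplayClass0_0.
public sealed class DisplayClass0_0
{
    // The captured value lives in a field instead of a local variable.
    public string result;

    // Stand-in for <StartPollTask>b__1, the body of the inner lambda.
    public void WriteResult() => Console.WriteLine(result);
}

public static class LoweredLoop
{
    // Runs the loop body 'iterations' times and returns how many closure
    // instances were allocated. 'poll' is a hypothetical message source
    // that returns null when there is nothing to process.
    public static int Run(int iterations, Func<string> poll)
    {
        int closures = 0;
        for (int i = 0; i < iterations; i++)
        {
            // Allocated on entry to the scope that declares the captured
            // variable, before we know whether there is a message.
            var closure = new DisplayClass0_0();
            closures++;

            closure.result = poll();
            if (closure.result != null)
            {
                Task.Run(new Action(closure.WriteResult));
            }
        }
        return closures;
    }

    public static void Main()
    {
        int allocated = Run(1000, () => null);
        Console.WriteLine($"Closures allocated for 1000 empty polls: {allocated}");
    }
}
```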

According to Eric Lippert, the DisplayClass naming choice was a bit unfortunate: this is jargon used by the debugger team to describe a class that has special behaviours when displayed in the debugger. Obviously we do not want to display “x” as a field of an impossibly-named class when you are debugging your code; rather, you want it to look like any other local variable. There is special gear in the debugger to handle doing so for this kind of display class. It probably should have been called “ClosureClass” instead, to make it easier to read disassembly.

Fixing the lambda capture

So we just saw how two lambda usages lead the C# compiler to generate different code. The first one doesn’t capture anything and doesn’t need a closure class. The second one needs a closure class to store the captured variable (read the following Stack Overflow answer for more details about the differences).
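A quick way to observe the difference is to compare the delegate instances you get back. The sketch below uses hypothetical names (CaptureDemo, NonCapturing, Capturing). It relies on the current behavior of the C# compiler, which caches the delegate of a non-capturing lambda in a static field; the language specification does not guarantee this caching.

```csharp
using System;

public static class CaptureDemo
{
    // Captures nothing: the compiler can create the delegate once and reuse it.
    public static Func<int> NonCapturing() => () => 42;

    // Captures 'value': every call allocates a closure and a new delegate.
    public static Func<int> Capturing(int value) => () => value;

    public static void Main()
    {
        Console.WriteLine(ReferenceEquals(NonCapturing(), NonCapturing()));
        Console.WriteLine(ReferenceEquals(Capturing(1), Capturing(1)));
    }
}
```

With current compilers, the first comparison should report the same instance and the second one two distinct instances.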

Let’s summarize where we are in the investigation: each iteration of the while loop allocates a <>c__DisplayClass0_0 closure instance to pass the result to WriteLine. Also note that if PollMessage does not return any message, the closure is still allocated even though it is not used. That was our case, because there was almost never a message to process…

So the next step is to avoid the capture but still be able to use the result. This is not possible with Task.Run because the expected parameter is of type Action, which does not accept any parameter. Fortunately, Task.Factory.StartNew (which does almost the same thing as Task.Run when given the right parameters) accepts a state parameter: this is exactly what is already done in StartPollTask to get the CancellationToken.

Here is the changed while loop:
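The original listing is not reproduced in this text; the following is only a sketch of what the changed loop might look like. The Message type, the PollMessage stub, the KafkaFixed name and the exact StartNew options are assumptions. The options shown are picked to stay close to what Task.Run does by default.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical message type: only HasMessage and Result are named in the post.
public sealed class Message
{
    public static readonly Message None = new Message(null);

    public Message(string result) { Result = result; }

    public string Result { get; }
    public bool HasMessage => Result != null;
}

public static class KafkaFixed
{
    // Stand-in for the real poll call: here it never returns a message.
    public static Message PollMessage() => Message.None;

    public static Task StartPollTask(CancellationToken ct)
        => Task.Factory.StartNew(state =>
        {
            var token = (CancellationToken)state;
            while (!token.IsCancellationRequested)
            {
                var message = PollMessage();
                if (message.HasMessage)
                {
                    // The result travels through the state argument, so the
                    // lambda captures nothing and no closure is allocated.
                    Task.Factory.StartNew(
                        s => Console.WriteLine((string)s),
                        message.Result,
                        CancellationToken.None,
                        TaskCreationOptions.DenyChildAttach,
                        TaskScheduler.Default);
                }
            }
        }, ct, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);

    public static void Main()
    {
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(1)))
        {
            StartPollTask(cts.Token).Wait();
        }
    }
}
```

Because Result is a string here, passing it as the object state does not box anything; a value type passed the same way would be boxed and would allocate again.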

Note that we don’t need to wait for the lambda to return (i.e. there is no await on what StartNew returns), so the replacement is harmless. However, if you need to call an async method instead of Console.WriteLine (ProcessMessageAsync, for example) and wait for its completion, you will need to call Unwrap() on what StartNew returns to be able to wait for the ProcessMessageAsync task itself to finish. Behind the scenes, Task.Run does this for us.
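As an illustration of the Unwrap() case, here is a minimal sketch. The body of ProcessMessageAsync and the UnwrapDemo and ProcessWithoutCapture names are hypothetical; only the StartNew and Unwrap calls are the point.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class UnwrapDemo
{
    // Hypothetical async handler standing in for real message processing.
    public static async Task ProcessMessageAsync(string message)
    {
        await Task.Delay(50);
        Console.WriteLine(message);
    }

    public static Task ProcessWithoutCapture(string message)
    {
        // StartNew returns a Task<Task>: the outer task completes as soon as
        // the lambda returns, which is when ProcessMessageAsync hits its
        // first await, not when the processing is over.
        Task<Task> outer = Task.Factory.StartNew<Task>(
            state => ProcessMessageAsync((string)state),
            message,
            CancellationToken.None,
            TaskCreationOptions.DenyChildAttach,
            TaskScheduler.Default);

        // Unwrap gives back a task that completes with the inner task.
        return outer.Unwrap();
    }

    public static void Main()
    {
        ProcessWithoutCapture("hello").Wait();
    }
}
```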

The differences and similarities between Task.Factory.StartNew and Task.Run deserve their own article and won’t be covered here, but you can already find many resources on this topic.

Finally, let’s check our fix with PerfView:

We can see only 4 allocation ticks and nothing related to a lambda closure: we fixed it!

Jean-Philippe Durot is a .NET developer at Criteo who likes to know how things work internally.