Building your own Java-like GC logs in .NET

This post of the series focuses on logging each GC details in a file and how to leverage it during investigations.

Part 1: Replace .NET performance counters by CLR event tracing.

Part 2: Grab ETW Session, Providers and Events.

Part 3: Monitor Finalizers, contention and threads in your application.

Part 4: Spying on .NET Garbage Collector with TraceEvent.

Introduction

I’m working in a team where we investigate issues in production: both for Java and .NET applications. This is a good opportunity to learn what are the features provided by Java that are missing in .NET. One of the features heavily discussed with my colleague Jean-Philippe is called the GC Log. It is possible to start an application with parameters that tell the GC to save tons of details about each garbage collection in a file : the GC Log. Based on this file, it is possible to extract the reason of a collection, the times of the different phases including the suspension time. This is a great source of information during investigations… when you know how GC is working or by leveraging automatic report generation.

In addition, you can also build your own UI to more easily understand what is going on and get a more visual representation of the situation.

In the short video above you can see the heap evolution during several days. Then, as this is an interactive HTML page you can zoom in an interesting period to have a more detailed view of the evolution between GCs.

Also for the pause time graph, you can follow the behavior of the GC with different kinds of pauses and associated phases. In this example, we have minor GCs happening and then an “initial mark” is triggered, followed by “final remark” and “cleanup” pauses. After an extra minor GC, we have a series of mixed GCs that is the result of what was planned by the GC after the marking phase.

In the .NET world, there is no such thing as a GC Log. However, as shown in the previous post, it is possible to use Perfview to analyze traces corresponding to collected CLR events. The GCStats view shows high level details in the “All GC Events” section. In addition to this HTML rendering, you can get access to the data itself in different formats

The more complete one is the Raw Data XML file that you could parse to extract the details you need. This is very close to a .NET GC Log but it is complicated to build an automated process to get it from a production machine.

It would be great if you could tell a .NET application to generate such a GC Log like in Java instead of relying on manual steps with Perfview (and more scripts on Linux). This post will show you how to achieve this goal!

Defining the goal: basic GcLog implementation

In Java, you have to set on or off the GC log before the application starts and you can’t change it while it runs. Since I’m working with server applications, I would prefer to enable/disable the generation of a GC log file without having to stop and restart the application.

So I’ve defined the simple IGcLog interface:

In a dedicated administration handler (i.e. http endpoint of the application), the code could just use a class that implements this interface and call Start when the log is enabled and Stop when it is no more needed.

To make the implementation easier, I’ve written the following GcLogBase abstract class:

Its main goal is to hide the file management by providing the WriteLine method that child classes would call to save the details of a garbage collection into a single line of text. The write operations are flushed when Stopis called. This combination allows asynchronous writes with low performance impact: don’t be scared if you don’t see the file size change because the StreamWriter class is caching write operations.

The next step is to implement OnStart and OnStop in a derived class to enable/disable GC details retrieval.

How to get the GC details: the easy way?

As already discussed in the previous posts of the series, the CLR is emitting traces (via ETW on Windows and LTTng on Linux) that can be collected in C#. You have already seen how TraceEvent could help collecting and parsing GC traces from any application like what Perfview is doing. With TraceEvent, the TraceGC instance received when a garbage collection ends contains tons of information: it’s mapped to the GarbageCollectionArgs structure that you get while listening to the GarbageCollection event of my ClrEventsManager helper class. The only information to provide is the ID of the .NET process I’m interested in: that way, is it easy to filter the events for this process only.

The GetUniqueFilename method builds a filename based on the process ID and the time of the day:

The GetProcessGcLog method is a factory-like helper to build an instance bound to the given process ID.

The implementation of OnStart and OnStop overrides is straightforward based on the previous post:

The created TraceEventSession is passed to the ClrEventManager with the process ID with a filter to receive only GarbageCollection event notifications. The OnGarbageCollection handler is super simple:

Each garbage collection appears as a textual line with the following columns separated by a comma:

The last twelve pieces of information require some explanation:

· xxxBefore : size of a generation before the collection; without free list

· xxxAfter : size of a generation after the collection; without free list

· xxxSize : size of a generation (including LOH) after the collection; including free list (i.e. fragmentation)

The computation of these sizes relies on inner fields of the TraceGC argument receives from TraceEvent. The xxxSize are grouped in the GenerationSize0/1/2/3 fields of the HeapStat property. It is a little bit more complicated for the Before/After sizes. The Garbage Collector keeps track of these numbers in the PerHeapHistories field: an array of GCPerHeapHistory elements; one per heap (i.e. one per core for server GC). The next level is provided by the GenData field storing an array of GCPerHeapHistoryGenData elements; one per generation with LOH as the last index 3. So, to compute the size of each generation, it is needed to iterate on each heap:

The code of the GetGenerationSizes helper method does that sum the value of either ObjSpaceBefore or ObjSizeAfter.

As you have probably noticed from the implementation, it is possible that the PerHeapHistories field is not filled up and all Before/After values are zero. This happens for a background gen2 collection. Also note that for gen0 and gen1 collection the value for gen2 and LOH is also 0 (make sense that gen2 and LOH do not change during such a collection).

A little bit of UI

Now that a .csv file containing all garbage collections details is available, it is time to provide some UI on top of it such as the following for GC pauses:

Let’s start what you can get for Excel champions:

  • Generation ratio
  • Sizes of generations including Large Object Heap
  • Top 10 pauses (including suspension time comparison)

But you can get better interaction thanks to Jean-Philippe. My colleague adapted his script for JVM to my .NET GC log .csv format: it generates some nice zoomable HTML UI.

This short video above shows the heap evolution during ~20 minutes. Then, as this is an interactive HTML page you can focus on gen2 and LOH impact on memory consumption.

For the pause time graph on the same period, it is very easy to detect long pauses (even for gen0 collection) and zoom into smaller period to figure out the impact of different collections.

The code available on Github has been updated to make the EtwGcLog class available to you.