Fixing .NET middle-age crisis with Java ReferenceQueue and Cleaner

My colleague Kevin has just described how to implement Java ReferenceQueue in C# as a follow-up to Konrad Kokosa’s article on this Java class. Among the features discussed, one is still missing. This post covers how to deal with the “middle-age crisis” scenario and how to control finalizer threading issues. I’m sure that my former Microsoft colleague Sebastien won’t be surprised by my interest in the subject.

When a class references both IDisposable instances and native resources, the usual C# pattern is to implement both IDisposable for explicit cleanup and a finalizer as a safety net for developers who forget the explicit cleanup. This pattern can have a side effect when these classes also reference a large object graph.

Let’s take a minute to describe how finalizers are managed by the CLR.

This animation shows what happens at the end of a collection. The darkened objects are no longer referenced and should be collected. B, G and H do not implement finalizers, so they can be discarded. It is different for E, I and J because their classes implement a finalizer. First, a finalization list has been holding a “weak” reference to them since they were created. Then, at the end of a collection, these references are moved to the f-reachable queue and the collection ends. Later on, the finalizer thread wakes up and calls the finalizer of every object referenced by the f-reachable queue. This is the important part of the issue: even though those objects were no longer referenced by the application, they could not be collected, nor could their memory be reclaimed, because the finalizer thread had not run yet. As they could not be reclaimed, they are promoted to the next generation just like other survivors. So if those objects were in generation 0, they now end up in generation 1, which extends their lifetime. It’s even worse if they get promoted from generation 1 to generation 2, as the next gen 2 collection might only happen far in the future. This artificially increases the memory consumption of the application.
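To make the promotion effect concrete, here is a small sketch that compares two unreferenced objects after a single collection. The class names (WithFinalizer, WithoutFinalizer, PromotionDemo) are made up for this illustration. It relies on a “long” weak reference (trackResurrection set to true), which stays valid until the memory is really reclaimed.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical types for this illustration only.
public sealed class WithFinalizer
{
    ~WithFinalizer() { /* native resources would be released here */ }
}

public sealed class WithoutFinalizer { }

public static class PromotionDemo
{
    // Allocate in a separate, non-inlined method so that no local variable
    // keeps the instance alive once the method has returned.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference Allocate(bool finalizable)
    {
        object instance = finalizable
            ? (object)new WithFinalizer()
            : new WithoutFinalizer();

        // A "long" weak reference is only cleared when the memory is reclaimed,
        // not when the object merely becomes unreachable.
        return new WeakReference(instance, trackResurrection: true);
    }

    public static void Run()
    {
        WeakReference plain = Allocate(finalizable: false);
        WeakReference finalizable = Allocate(finalizable: true);

        GC.Collect();

        Console.WriteLine($"Without finalizer, still in memory: {plain.Target != null}");

        var survivor = finalizable.Target;
        Console.WriteLine($"With finalizer, still in memory:    {survivor != null}");
        if (survivor != null)
        {
            Console.WriteLine($"Generation after one collection:    {GC.GetGeneration(survivor)}");
        }
    }
}
```

Calling PromotionDemo.Run() should show that the object without a finalizer is gone after one collection, while the finalizable one is still in memory and reported in an older generation. The exact generation number depends on the runtime and the GC settings, so treat it as an observation rather than a guarantee.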

To summarize, for business objects that hold both a large reference tree and native resources, it would be great to be able to:

  1. Allow explicit cleanup of resources with the IDisposable pattern
  2. Discard the managed memory when the objects are collected
  3. Automatically clean up native resources AFTER the objects are collected
  4. Have control over the thread that cleans up native resources

Mix a Phantom with IDisposable

Requirement #3 seems impossible to fulfill: how can you access a field of an object once its memory has been reclaimed? Maybe it is possible to cheat: what if the native resources, usually held in IntPtr fields, were copied while the object is still alive? That way, the cleanup code could be moved outside of the object itself. This is basically the Java PhantomReference idea, implemented in C# by Kevin with his PhantomObjectFinalizer.

Let’s make it generic in terms of the native payload:
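As a minimal sketch (not necessarily the exact code from Kevin’s post), a generic phantom finalizer could look like the following. The ReferenceQueue&lt;T&gt; shape, the Track and TryPoll method names and the State property are assumptions made for this illustration. The key point is that the phantom holds a copy of the payload and no reference to the tracked object.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.CompilerServices;

// Small finalizable companion object. It carries the payload (for example the
// native handles) and deliberately holds NO reference to the tracked object.
public sealed class PhantomObjectFinalizer<T>
{
    private readonly ConcurrentQueue<PhantomObjectFinalizer<T>> _queue;

    public PhantomObjectFinalizer(ConcurrentQueue<PhantomObjectFinalizer<T>> queue, T state)
    {
        _queue = queue;
        State = state;
    }

    public T State { get; }

    // Runs on the finalizer thread once the tracked object is unreachable.
    // It only enqueues itself: no cleanup code runs here.
    ~PhantomObjectFinalizer() => _queue.Enqueue(this);
}

// Assumed shape of the queue, for illustration only.
public sealed class ReferenceQueue<T>
{
    // The table keeps each phantom alive exactly as long as its tracked object.
    private readonly ConditionalWeakTable<object, PhantomObjectFinalizer<T>> _table =
        new ConditionalWeakTable<object, PhantomObjectFinalizer<T>>();

    private readonly ConcurrentQueue<PhantomObjectFinalizer<T>> _queue =
        new ConcurrentQueue<PhantomObjectFinalizer<T>>();

    public void Track(object target, T state) =>
        _table.Add(target, new PhantomObjectFinalizer<T>(_queue, state));

    // Returns true and the payload of one collected object, if any is pending.
    public bool TryPoll(out T state)
    {
        if (_queue.TryDequeue(out var phantom))
        {
            state = phantom.State;
            return true;
        }

        state = default;
        return false;
    }
}
```

With this layout the tracked object itself has no finalizer, so its memory can be reclaimed by the collection that finds it unreachable. Only the small phantom survives a little longer.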

Also note that the cleaning method has been removed because of requirement #1: the LargeObject should be responsible for cleaning up the resources because it will also implement IDisposable. The native cleanup code will obviously be shared with the Dispose method.

The LargeObject can be rewritten to use it, and the first step is to group its native resources in a state object:

The native payload is stored in a NativeState object that also contains the _disposed IDisposable status. This is required to know whether the object has been disposed explicitly when the static Cleanup method is called. This implementation fulfills requirement #1, even though the cleanup code throws an exception: we will have to see how to control it.
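A possible shape for such a class is sketched below. The native resource is simulated with Marshal.AllocHGlobal, and the member names (Handle, Disposed, IsDisposed, PayloadSize) are assumptions for this illustration. Unlike the naïve version described above, this Cleanup does not throw.

```csharp
using System;
using System.Runtime.InteropServices;

public sealed class LargeObject : IDisposable
{
    // Private nested class: the native details never leave LargeObject.
    private sealed class NativeState
    {
        public IntPtr Handle;   // stand-in for a real native resource
        public bool Disposed;   // true once the native resource has been released
    }

    private readonly NativeState _state;

    // Stand-in for the large managed graph that should be reclaimed quickly.
    private readonly byte[] _payload = new byte[1024 * 1024];

    public LargeObject()
    {
        _state = new NativeState { Handle = Marshal.AllocHGlobal(64) };
    }

    public int PayloadSize => _payload.Length;

    public bool IsDisposed => _state.Disposed;

    // Explicit cleanup shares the exact same code path as automatic cleanup.
    public void Dispose() => Cleanup(_state);

    // Static on purpose: it only needs the state, never the LargeObject itself,
    // so it can still run after the LargeObject has been collected.
    private static void Cleanup(NativeState state)
    {
        if (state.Disposed)
        {
            return; // already released, for example by an earlier Dispose call
        }

        Marshal.FreeHGlobal(state.Handle);
        state.Handle = IntPtr.Zero;
        state.Disposed = true;
    }
}
```

Because Cleanup is static and takes only the NativeState, nothing in it can accidentally keep the LargeObject alive.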

Introducing the Cleaner a la Java

The next step is to focus on requirements #2 and #3: how to ensure that the memory of our LargeObject gets reclaimed by the garbage collector while its native resources are still cleaned up automatically? This scenario is handled by the Cleaner class in Java, which is mentioned in Konrad’s article and which I have come to know better by discussing it at length with Jean-Philippe, our team’s Java internals expert.

You can register an object, a state and a callback that will be called when the object is no longer referenced. It is a kind of secondary finalization mechanism.

Let’s see how I would like to use it in C#:

There will be a unique Cleaner object for all LargeObject instances. Each one will register itself and its native state in its constructor by calling the Cleaner.Track method.

The Cleaner instance receives two static callbacks:

  • Cleanup: this method will be called by the cleaner after a tracked LargeObject instance has been collected. As you can see, there is no need to change its initial implementation. It is static and receives the NativeState that stores the native state of a LargeObject. Since the NativeState type is a private inner class, the implementation details do not leak from LargeObject as they did with Kevin’s PhantomObjectFinalizer implementation.
  • OnError: this method gets called when an exception occurs during the cleanup (like the InvalidOperationException thrown by my naïve implementation). This is a new feature compared to a .NET finalizer: you are notified if something goes wrong and you are able to log it. However, I would still recommend exiting the application, like the default CLR behavior when a finalizer throws an exception.

The LargeObject code is therefore responsible for cleaning up both IDisposable and native resources: no need for its users to know the gory details.

The high-level API of the Cleaner class has been defined; it is now time to see how to implement it. If you have read Kevin’s post, the first step should be obvious: a ReferenceQueue will keep track of the PhantomObjectFinalizer bound to each “business object” like LargeObject. When the latter is collected, the phantom finalizer gets called to enqueue itself to the ReferenceQueue.

There is one big missing step: who will call the queue’s Poll method to get the finalized PhantomObjectFinalizer that contains the native state to clean up?

Stay in control of the cleaner job

The simple implementation I’ve chosen is to create a dedicated thread that polls the queue at a period of your choice and calls the cleanup callback. I did not want to add pressure on the ThreadPool that is shared with the application. If an exception is raised, the error callback is called.

Since I’ve created the thread as a background thread, it won’t prevent .NET from exiting the process when the last foreground thread returns. However, you are free to follow the IDisposable pattern and call Dispose to explicitly stop the cleaning thread at the right time in your application lifecycle.
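The threading part can be sketched as follows. This is a simplified stand-in rather than the actual implementation: the Cleaner&lt;TState&gt; signature, the nested Phantom class and the constructor parameters are assumptions, and the queue of collected states is folded into the class to keep the example self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.CompilerServices;
using System.Threading;

public sealed class Cleaner<TState> : IDisposable
{
    // Finalizable companion attached to each tracked object. It holds only the
    // state and enqueues it once the tracked object is unreachable.
    private sealed class Phantom
    {
        private readonly ConcurrentQueue<TState> _pending;
        private readonly TState _state;

        public Phantom(ConcurrentQueue<TState> pending, TState state)
        {
            _pending = pending;
            _state = state;
        }

        ~Phantom() => _pending.Enqueue(_state);
    }

    private readonly ConditionalWeakTable<object, Phantom> _tracked =
        new ConditionalWeakTable<object, Phantom>();
    private readonly ConcurrentQueue<TState> _pending = new ConcurrentQueue<TState>();
    private readonly ManualResetEventSlim _stop = new ManualResetEventSlim(false);
    private readonly Action<TState> _cleanup;
    private readonly Action<Exception> _onError;
    private readonly TimeSpan _period;
    private readonly Thread _thread;
    private int _disposed;

    public Cleaner(Action<TState> cleanup, Action<Exception> onError, TimeSpan period)
    {
        _cleanup = cleanup ?? throw new ArgumentNullException(nameof(cleanup));
        _onError = onError ?? throw new ArgumentNullException(nameof(onError));
        _period = period;

        // Dedicated background thread: no pressure on the shared ThreadPool,
        // and it does not keep the process alive at exit.
        _thread = new Thread(Run) { IsBackground = true, Name = "Cleaner" };
        _thread.Start();
    }

    public void Track(object target, TState state) =>
        _tracked.Add(target, new Phantom(_pending, state));

    private void Run()
    {
        // Wait returns false on timeout and true once Dispose signals the event.
        while (!_stop.Wait(_period))
        {
            while (_pending.TryDequeue(out var state))
            {
                try { _cleanup(state); }
                catch (Exception e) { _onError(e); }
            }
        }
    }

    public void Dispose()
    {
        if (Interlocked.Exchange(ref _disposed, 1) != 0) return;

        _stop.Set();      // ask the polling loop to exit
        _thread.Join();   // wait for a cleanup that may be in progress
        _stop.Dispose();
    }
}
```

A LargeObject would call Track(this, _state) in its constructor on a shared Cleaner instance. Note that in this sketch, states that are still pending when Dispose is called are not cleaned up. Whether to drain the queue one last time at shutdown is a design choice.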

In the IDisposable/finalizer pattern, the GC class provides the SuppressFinalize static method to remove an object from the finalization list when it has been explicitly disposed: that way, the object will neither go to the f-reachable queue nor be promoted into the next generation after it is collected. The Cleaner class provides the Untrack method to achieve the same effect: the native payload of the object won’t be cleaned up by the cleaner. I just had to update the ReferenceQueue to remove the object from the ConditionalWeakTable and remove the PhantomReference from the FinalizationList:
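The two steps described above can be sketched like this. The class name ReferenceQueueWithUntrack&lt;TState&gt; and its members are hypothetical and only reuse the ConditionalWeakTable-plus-finalizable-phantom idea. GC.SuppressFinalize is what takes the phantom out of the finalization list.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.CompilerServices;

// Hypothetical, reduced queue that only shows the Track/Untrack interplay.
public sealed class ReferenceQueueWithUntrack<TState>
{
    // Finalizable companion: enqueues the state once the tracked object dies.
    private sealed class Phantom
    {
        private readonly ConcurrentQueue<TState> _pending;
        private readonly TState _state;

        public Phantom(ConcurrentQueue<TState> pending, TState state)
        {
            _pending = pending;
            _state = state;
        }

        ~Phantom() => _pending.Enqueue(_state);
    }

    private readonly ConditionalWeakTable<object, Phantom> _tracked =
        new ConditionalWeakTable<object, Phantom>();
    private readonly ConcurrentQueue<TState> _pending = new ConcurrentQueue<TState>();

    public void Track(object target, TState state) =>
        _tracked.Add(target, new Phantom(_pending, state));

    // Returns true if the target was tracked and is now forgotten.
    public bool Untrack(object target)
    {
        if (!_tracked.TryGetValue(target, out var phantom))
        {
            return false;
        }

        // Step 1: the phantom leaves the finalization list, so it will never
        // enqueue its state and will not be promoted by a pending finalizer.
        GC.SuppressFinalize(phantom);

        // Step 2: the table no longer ties the phantom to the target.
        return _tracked.Remove(target);
    }

    public bool TryPoll(out TState state) => _pending.TryDequeue(out state);
}
```

In this scheme, the Dispose method of a tracked object would call Untrack(this) right after releasing its native state, in the same spirit as the GC.SuppressFinalize(this) call of the classic pattern.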

Requirement #4 is now fulfilled. You are obviously free to pick another implementation that suits your needs better than a thread-based periodic cleanup. I would like to mention that if the cleanup callback never returns, the effect is almost the same as with a stuck finalizer: the native resources won’t be cleaned up anymore.

The following code shows that none of this “complicated” code leaks into a C# application:

And you get the expected output:

Maybe Konrad will integrate a smarter Java Cleaner-like feature within the CLR itself or Alexandre in his new .NET ;^)