Weak, Soft and Phantom references in Java, and why they matter

Understanding how memory references work in Java is essential for resolving memory-related issues.

Allocation problem at Sunset restaurant in Crete

Almost every Java programmer knows that Soft and Weak references exist, but they are usually not fully understood. Phantom references are even less well known.

I think this is a bit of a shame, because they are not complex to understand (compared to locks, for example) and they can be really useful if your application has memory problems.

So I have prepared a tiny GitHub project to show how to use them. It is also quite interesting to see how the different garbage collectors treat them, but that will be the topic of the next post.

Before analysing the code, let’s consider why we need memory references at all.

The problem

One problem common to all programming languages that allow dynamic memory allocation is finding a way to “collect” memory once it is no longer used.

It is a bit like in a restaurant: at the beginning you can seat customers at the empty tables, but when you run out of empty tables, you need to check whether some of the tables you already allocated have become free in the meantime.

Some languages, like C, leave the responsibility to the user: you asked for the memory, and now it is your job to free it. It’s a bit like in a fast food restaurant, where you are supposed to clean up your table after the meal.

This is very efficient… if everybody behaves correctly. But if some customers forget to clean up, it will easily become a problem. Same with memory: it’s very easy to forget to free an area of memory.

So Garbage Collectors (GC from here on) come to help. Some languages, Java among them, use a special algorithm to collect all the memory that is no longer used, which is very convenient for programmers. You may be forgiven if you think that GC is a relatively recent technique.

Garbage collection was invented by John McCarthy around 1959 to simplify manual memory management in Lisp.

Modern GCs are very sophisticated programs and they combine several techniques to quickly identify memory that can be reused. For the moment let’s assume that the Java GC works flawlessly and that it frees all objects which are no longer reachable.

This, however, introduces a new problem: what if we want to keep a reference to an object, but we don’t want to prevent the GC from freeing it when there is no other reference? It’s a bit like lingering at your table in a restaurant after you have finished, while being ready to leave if a new customer needs the table.

The solution

You may wonder why you would need such a thing. Actually there are a few use cases. Let’s introduce our protagonists, as described in the Java documentation:

SoftReference: Soft reference objects, which are cleared at the discretion of the garbage collector in response to memory demand. Soft references are most often used to implement memory-sensitive caches. [..]
All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError.

WeakReference: Weak reference objects, which do not prevent their referents from being made finalizable, finalized, and then reclaimed. Weak references are most often used to implement canonicalizing mappings. [Here “canonicalizing mappings” means mappings that only hold reachable object instances.]

PhantomReference: Phantom reference objects, which are enqueued after the collector determines that their referents may otherwise be reclaimed. Phantom references are most often used for scheduling pre-mortem cleanup actions in a more flexible way than is possible with the Java finalization mechanism. [..]
Unlike soft and weak references, phantom references are not automatically cleared by the garbage collector as they are enqueued. An object that is reachable via phantom references will remain so until all such references are cleared or themselves become unreachable. [This is the behaviour up to Java 8; since Java 9, phantom references are cleared automatically when they are enqueued.]

So in brief: a Soft reference keeps its object for as long as memory allows. A Weak reference does nothing to keep its object. A Phantom reference holds on to its object until the reference itself is cleared.

To reuse (and stretch) our restaurant metaphor one last time: a SoftReference is like a customer who says: I’ll leave my table only when there are no other tables available. A WeakReference is like someone ready to leave as soon as a new customer arrives. A PhantomReference is like someone ready to leave as soon as a new customer arrives, but who does not actually leave until the manager gives permission.

Now let’s go back to the code. 
https://github.com/uberto/memoryReferences/blob/master/src/main/java/com/gamasoft/memoryReferences/Main.java

The small program runs from the command line and does a very simple thing:

  1. allocate 500,000 1KB blocks in a linked list
  2. reference them using one of the 3 reference types.
  3. de-reference half of them
  4. remove unused references
  5. repeat all the above for 100 times and exit

To make things difficult for the GC, it removes alternate elements from the linked list, so that the list is always composed of elements of different “age”. This is important for how the GC works, as we will see in the next post.
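Steps 1 and 3 can be sketched roughly as follows. This is not the code from the project: Node is a hypothetical stand-in for HeavyList, the numbers are much smaller, and I am assuming a plain singly linked list.

```java
class AlternateRemovalSketch {

    // Hypothetical stand-in for the project's HeavyList: a list node with a 1KB payload.
    static final class Node {
        final byte[] payload = new byte[1024];
        Node next;
    }

    // Step 1: allocate `count` blocks, chaining each new one in front of the list.
    static Node allocate(int count) {
        Node head = null;
        for (int i = 0; i < count; i++) {
            Node node = new Node();
            node.next = head;
            head = node;
        }
        return head;
    }

    // Step 3: unlink every second node; the skipped ones become unreachable from the list.
    static void dropAlternate(Node head) {
        Node curr = head;
        while (curr != null && curr.next != null) {
            curr.next = curr.next.next;
            curr = curr.next;
        }
    }

    static int length(Node head) {
        int n = 0;
        for (Node c = head; c != null; c = c.next) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Node head = allocate(1_000);   // far fewer than the 500,000 of the real program
        dropAlternate(head);
        System.out.println("nodes left: " + length(head));
    }
}
```

Half of the nodes survive each round, so if new blocks are allocated again afterwards, old and young objects end up living side by side in the same list.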

Reference<HeavyList> softRef = new SoftReference<>(curr, queue); 
Reference<HeavyList> weakRef = new WeakReference<>(curr, queue);
Reference<HeavyList> phantomRef = new PhantomReference<>(curr, queue);

As you can see, it is very easy to create a reference to our object (curr in this case). All reference types take the referenced object as a constructor parameter. Soft and Weak references optionally take a queue as well, while for a PhantomReference the queue parameter is required.

We can ask for the referenced object with the get() method. For Weak and Soft references, get() returns the actual object as long as the reference has not been cleared, which is certainly the case while the object is still reachable from other objects. Once the object has been collected, get() returns null.

This opens up a possible problem if someone manages to “resurrect” the object by calling get() on the reference during finalization. For this reason a PhantomReference always returns null from get(), regardless of whether the object is still alive or not. In this way we can pass a PhantomReference to another object without the risk that it will store a new hard reference to the referent.
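The behaviour of get() for the three types can be checked with a small sketch like this one. It does not depend on the GC at all: the cleared case is simulated by calling clear() by hand, which is my shortcut and not something the project does.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceGetSketch {

    // Weak and Soft: get() hands back the referent as long as the reference is not cleared.
    static boolean weakAndSoftReturnReferent() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        SoftReference<Object> soft = new SoftReference<>(strong);
        return weak.get() == strong && soft.get() == strong;
    }

    // clear() does by hand what the collector does when it lets the referent go.
    static boolean clearedReferenceReturnsNull() {
        WeakReference<Object> weak = new WeakReference<>(new Object());
        weak.clear();
        return weak.get() == null;
    }

    // Phantom: get() is null even though the referent is still strongly reachable.
    static boolean phantomAlwaysReturnsNull() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return phantom.get() == null && strong != null;
    }

    public static void main(String[] args) {
        System.out.println(weakAndSoftReturnReferent());
        System.out.println(clearedReferenceReturnsNull());
        System.out.println(phantomAlwaysReturnsNull());
    }
}
```

All three lines print true. In real code the result of get() on a Weak or Soft reference should always be stored in a local variable and null-checked, because the referent can disappear between two calls.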

The other parameter in the constructor is the ReferenceQueue. To understand why it is important, we have to consider how we find out that the referenced object has expired.

For Soft and Weak references we could check get() on each of them, but that would be very time-consuming with a big list of references. Moreover, for Phantom references we cannot use it at all.

For this reason, if we pass a queue to the constructor of the reference, we will be notified when the referenced object expires: the GC puts the reference itself in the queue. In my simple example I poll the queue after the deallocation:

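private static int removeRefs(ReferenceQueue<?> queue, Set<Reference<HeavyList>> references) {
    int removed = 0;
    while (true) {
        // poll() never blocks: it returns null as soon as the queue is empty
        Reference<?> r = queue.poll();
        if (r == null)
            break;
        // forget the reference whose referent has expired
        references.remove(r);
        removed++;
    }
    return removed;
}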

If queue.poll() returns null then the queue is empty. A less naive approach is to create a separate thread and call queue.remove() which will block until there is something to remove.
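A minimal sketch of the separate-thread approach could look like this. The class and method names are mine, and the demo enqueues the reference by hand with enqueue() so that the outcome does not depend on GC timing; in a real program the collector does that step.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class QueueDrainerSketch {

    // Starts a daemon thread that blocks on queue.remove() and forgets each reference it gets.
    static <T> Thread startDrainer(ReferenceQueue<T> queue,
                                   Set<Reference<T>> references,
                                   Runnable onRemoved) {
        Thread drainer = new Thread(() -> {
            try {
                while (true) {
                    Reference<? extends T> r = queue.remove(); // blocks until something arrives
                    references.remove(r);
                    onRemoved.run();
                }
            } catch (InterruptedException e) {
                // interrupted: stop draining
            }
        }, "reference-drainer");
        drainer.setDaemon(true);
        drainer.start();
        return drainer;
    }

    static boolean demo() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Set<Reference<Object>> references = ConcurrentHashMap.newKeySet();
        CountDownLatch removed = new CountDownLatch(1);
        Thread drainer = startDrainer(queue, references, removed::countDown);

        Object referent = new Object();
        Reference<Object> ref = new WeakReference<>(referent, queue);
        references.add(ref);
        ref.enqueue(); // stand-in for the GC, see the note above

        try {
            return removed.await(5, TimeUnit.SECONDS) && references.isEmpty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            drainer.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("drained: " + demo());
    }
}
```

The set of references has to be thread-safe here, because the drainer thread and the code that registers references touch it concurrently.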

Just remember that Weak and Soft references are cleared before they are put in the queue, whereas Phantom references (up to Java 8) are enqueued without being cleared, so the memory of their referents is not reclaimed yet. If for any reason you don’t poll the queue and let go of those references, the objects referenced by Phantom references will not be reclaimed and you can run into an OutOfMemoryError.

Possible uses

Well, so now that we understand memory references better, what can we use them for?

Java documentation already suggests some uses for the references.

SoftReferences can be used to implement a cache that can grow without the risk of crashing your application. To do this you need to implement the Map interface so that values are stored wrapped inside a SoftReference. SoftReferences will keep the objects alive as long as there is memory available on the heap, but the GC will discard them before throwing an OutOfMemoryError.

If you are interested, there is an example in Guava to study. Keep in mind that filling almost all of the memory can slow your program down so much that a cache hardly matters. It’s easy to verify this by running the program after uncommenting the line that creates the SoftReference.
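To give an idea of the pattern, here is a minimal sketch of a cache with soft values. It is not a full Map implementation and it is not how Guava does it; SoftCacheSketch, Entry and the loader parameter are names I made up for the illustration.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class SoftCacheSketch<K, V> {

    // A SoftReference that remembers its key, so the stale map entry can be found later.
    private static final class Entry<K, V> extends SoftReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, Entry<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    public synchronized V get(K key, Function<K, V> loader) {
        expungeCleared();
        Entry<K, V> entry = map.get(key);
        V value = (entry == null) ? null : entry.get();
        if (value == null) {   // never cached, or cleared under memory pressure
            value = loader.apply(key);
            map.put(key, new Entry<>(key, value, queue));
        }
        return value;
    }

    public synchronized int size() {
        expungeCleared();
        return map.size();
    }

    // Drop the map entries whose values the collector has already cleared.
    @SuppressWarnings("unchecked")
    private void expungeCleared() {
        Entry<K, V> stale;
        while ((stale = (Entry<K, V>) queue.poll()) != null) {
            map.remove(stale.key, stale); // only if the key still maps to this stale entry
        }
    }

    public static void main(String[] args) {
        SoftCacheSketch<Integer, byte[]> cache = new SoftCacheSketch<>();
        byte[] block = cache.get(1, key -> new byte[1024]);   // loaded
        byte[] again = cache.get(1, key -> new byte[1024]);   // served from the cache
        System.out.println(block == again);
    }
}
```

The ReferenceQueue is what keeps the map itself from leaking: without it, the values would be freed but the keys and the empty SoftReference objects would pile up.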

WeakReferences can be used, for example, to store some information related to an object for as long as that object is alive. To do this you can implement a Map in which the keys are wrapped in a WeakReference. As soon as the GC reclaims the key object, you can remove the value as well.

Of course this can also be done with some notification mechanism, but relying on the GC is more robust and efficient. As an example you can look at java.util.WeakHashMap, but bear in mind that it is not thread-safe.
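A small sketch of this use, with WeakHashMap wrapped by Collections.synchronizedMap to make up for the missing thread safety. The names and the example data are invented for the illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

class WeakMetadataSketch {

    // WeakHashMap holds its keys weakly; the wrapper adds the synchronization it lacks.
    // The values must not hold a strong reference to their own key,
    // otherwise the entry can never go away.
    static final Map<Object, String> METADATA =
            Collections.synchronizedMap(new WeakHashMap<>());

    public static void main(String[] args) {
        Object session = new Object();   // hypothetical object we want to annotate
        METADATA.put(session, "created by request 42");
        System.out.println(METADATA.get(session));

        session = null;                  // drop the only strong reference to the key
        System.gc();                     // only a hint: the entry may or may not be gone yet
        System.out.println("entries now: " + METADATA.size());
    }
}
```

Note that keys are still compared with equals() and hashCode(), so this works best with key types that keep the default identity-based implementations.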

PhantomReferences can be used to be notified when an object has become unreachable, in order to clean up its resources. Remember that the object.finalize() method is not guaranteed to be called at the end of the life of an object, so if you need to close files or free resources you can rely on Phantom references instead. Since a PhantomReference gives no access to the actual object, a typical pattern is to derive your own reference type from PhantomReference and add the information needed for the final cleanup, for example the file name.
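The pattern could be sketched like this. FileCleanupRef, register and cleanUp are hypothetical names, and the actual cleanup is reduced to collecting the file names; as before, the demo enqueues the reference by hand instead of waiting for the GC.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class FileCleanupSketch {

    // Carries the data needed for the cleanup, because get() on a phantom reference is null.
    static final class FileCleanupRef extends PhantomReference<Object> {
        final String fileName;

        FileCleanupRef(Object owner, String fileName, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.fileName = fileName;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The references themselves must stay strongly reachable, or they are never enqueued.
    private final Set<FileCleanupRef> pending = ConcurrentHashMap.newKeySet();

    FileCleanupRef register(Object owner, String fileName) {
        FileCleanupRef ref = new FileCleanupRef(owner, fileName, queue);
        pending.add(ref);
        return ref;
    }

    // Polls the queue and returns the file names whose owners are gone.
    List<String> cleanUp() {
        List<String> released = new ArrayList<>();
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            FileCleanupRef ref = (FileCleanupRef) r;
            pending.remove(ref);
            ref.clear();                  // needed up to Java 8, harmless afterwards
            released.add(ref.fileName);   // a real program would close or delete the file here
        }
        return released;
    }

    public static void main(String[] args) {
        FileCleanupSketch sketch = new FileCleanupSketch();
        Object owner = new Object();
        FileCleanupRef ref = sketch.register(owner, "/tmp/example.dat");
        ref.enqueue();                    // stand-in for the GC, which normally does this
        System.out.println(sketch.cleanUp());
    }
}
```

Since Java 9 the JDK also ships java.lang.ref.Cleaner, which packages the same idea behind a simpler API.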

@simonebordet suggested another use for Phantom (or Weak) references to me: detecting memory leaks. You can look at the Jetty LeakDetector class as an example.

Playing around with this small program I also verified that WeakReferences are noticeably faster than SoftReferences. In a project I am working on, I added a WeakReference to some critical resources for each request, and then I added some monitoring to verify that they are actually freed within a reasonable time after the request has expired.
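A monitor of this kind can be very small. The sketch below is neither the code from my project nor the Jetty one; LeakMonitorSketch, track and notYetFreed are names invented for the illustration, and a real monitor would also record when each resource was registered.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LeakMonitorSketch {

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Maps each weak reference to a label; the map keeps the references themselves reachable.
    private final Map<Reference<Object>, String> tracked = new ConcurrentHashMap<>();

    Reference<Object> track(Object resource, String label) {
        Reference<Object> ref = new WeakReference<>(resource, queue);
        tracked.put(ref, label);
        return ref;
    }

    // Labels of the resources that the collector has not released so far.
    List<String> notYetFreed() {
        Reference<?> released;
        while ((released = queue.poll()) != null) {
            tracked.remove(released);
        }
        return new ArrayList<>(tracked.values());
    }

    public static void main(String[] args) {
        LeakMonitorSketch monitor = new LeakMonitorSketch();
        Object buffer = new Object();   // hypothetical per-request resource
        monitor.track(buffer, "request-1 buffer");
        System.out.println("suspects: " + monitor.notYetFreed());
    }
}
```

Anything that is still listed long after its request has finished is a good candidate for a leak.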

I learned a lot writing this and I hope this post can be useful to other people as well. The next blog post will continue the analysis of memory allocation and performance, comparing the Java CMS and G1 garbage collectors.