A small leak will sink a great ship

Prior to Android Lollipop, alert dialogs may cause memory leaks in your Android apps.

Square Engineering
Square Corner Blog
8 min readAug 20, 2015

--

Written by Pierre-Yves Ricau.

Heads up, we’ve moved! If you’d like to continue keeping up with the latest technical content from Square please visit us at our new home https://developer.squareup.com/blog

This post started as an internal email thread when I was building LeakCanary. I found a strange memory leak and started digging in order to figure out what was happening.

TL;DR: Prior to Android Lollipop, alert dialogs may cause memory leaks in your Android apps.

The Artist

I was getting memory leak reports from LeakCanary:

In plain words: a Picasso thread was holding on to a Message instance as a local variable on the stack. That Message had a reference to a DialogInterface.OnClickListener, which itself referenced a destroyed Activity.

Local variables are usually short lived since they only exist on the stack. When a method is called on a thread, a stack frame is allocated. When the method returns, that stack frame is cleared and all of its local variables are garbage collected. If a local variable is causing a leak then it normally means that a thread is either looping or blocking and keeping a reference to a Message instance while doing so.

Dimitris and I looked at the Picasso source code.

Dispatcher.DispatcherThread is a simple HandlerThread:

This thread receives messages through a Handler implemented in a very standard way:

This was a dead end. There was no obvious bug in Dispatcher.DispatcherHandler.handleMessage() that would somehow keep a reference to a Message through a local variable.

Queue Tips

Eventually, more memory leak reports came in. It wasn’t just Picasso. We would get local variable leaks from various types of thread, and there was always a dialog click listener involved. The leaking threads shared one common characteristic: they were worker threads and received work to do through some kind of blocking queue.

Let’s look at how HandlerThread works:

There’s definitely a local variable referencing a Message. However it should be very short lived and cleared as soon as the loop iterates.

We tried to reproduce by writing a bare bones worker thread with a blocking queue and sending it only one message:

Once the message was printed in the log, we expected the MyMessage instance to be garbage collected.

LeakCanary disagreed:

As soon as we sent a new message to the queue, the previous message was garbage collected, and that new message was now leaking.

In the VM, each stack frame has a set of local variables. The garbage collector is conservative: if there is a reference that might be alive, it won’t collect it.

After an iteration, the local variable is no longer reachable, however it still holds a reference to the message. The interpreter/JIT could manually null­ out the reference as soon as it is no longer reachable, but instead it just keeps the reference alive and assumes no damage will be done.

To confirm this theory, we manually set the reference to null and printed it again so that the null wouldn’t be optimized away:

When testing the above change, we saw that the MyMessage instance was garbage collected immediately after message was set to null. Our theory about the VM overlooking the local message variable seemed to be borne out.

Since this leak could be reproduced on various thread and queue implementations, we were now sure that it was a VM bug. On top of that, we could only reproduce it on a Dalvik VM, not on an ART VM or a JVM.

Message in a (recycled) bottle

We found a bug, but can it create huge memory leaks? Let’s look at our original leak again:

In the messages we send to Picasso dispatcher thread, we never set Message.objto a DialogInterface.OnClickListener. How did it end up there?

Furthermore, after the message is handled, it is immediately recycled and Message.obj is set to null. Only then does HandlerThread wait for the next message and temporarily leak that previous message:

As that point, we know that the leaking message has been recycled, and therefore doesn’t hold on to its prior content.

Once recycled, the message goes back in a static pool:

We have a leaking empty Message that might be reused and filled with different content. A Message is always used the same way: detached from the pool, filled with content, put on a MessageQueue, then handled, finally recycled and put back in the pool.

It should therefore never keep its content for long. Why do we always end up leaking DialogInterface.OnClickListener instances?

Alert Dialogs

Let’s create a simple alert dialog:

Notice that the click listener has a reference to the activity. The anonymous class gets translated to the following code:

Internally, AlertDialog delegates the work to AlertController:

So the OnClickListener is wrapped in a Message and set to AlertController.mButtonPositiveMessage. Let’s look at when that Message is used:

Note this: m = Message.obtain(mButtonPositiveMessage).

The message is cloned, and its copy is sent. This means that the original Message is never sent, and therefore never recycled. So it keeps its content forever, until garbage collected.

Now let’s assume that this message was already leaking prior to being obtained from the recycled pool, due to a HandlerThread local reference. The Dialog is eventually garbage collected and releases the reference to the message held by mButtonPositiveMessage.

However, since the message is already leaking, it won’t be garbage collected. Same goes for its content, the OnClickListener, and therefore the Activity.

Smoking Gun

Can we prove our theory?

We need to send a message to a HandlerThread, let it be consumed and recycled, and not send any other message to that thread so that it leaks the last message. Then, we need to show a dialog with a button and hope that this dialog will get the same message from the pool. It’s quite likely to happen, because once recycled, a message becomes first in the pool.

If we run the above code and then rotate the screen to destroy the activity, there are good chances that this activity will leak.

LeakCanary correctly detects the leak:

Now that we’ve properly reproduced it, let’s see what we can do to fix it.

The Startup Fix

Support only devices that use the ART VM, ie Android 5+. No more bugs! Also, no more users.

The Won’t Fix

You could also assume that these leaks will have a limited impact and that you have better things to do, or maybe simpler leaks to fix. LeakCanary ignores all Messageleaks by default. Beware though, an activity holds its entire view hierarchy in memory and can retain several megabytes.

The App fix

Make sure your DialogInterface.OnClickListener instances do not hold strong references to activity instances, for example by clearing the reference to the listener when the dialog window is detached. Here’s a wrapper class to make it easy:

Then you can just wrap all OnClickListener instances:

The Plumber’s Fix

Flush your worker threads on a regular basis: send an empty message when a HandlerThread becomes idle to make sure no message leaks for long.

This is useful for libraries, because you can’t control what developers are going to do with dialogs. We used it in Picasso, with a similar fix for other types of worker threads.

Conclusion

As we saw, a subtle and unexpected VM behavior can create a small leak that ends up holding on to huge chunks of the memory, eventually crashing your app with an OutOfMemoryError. A small leak will sink a great ship.

Many thanks to Josh Humphries, Jesse Wilson, Manik Surtani, and Wouter Coekaerts for their help in our internal email thread.

--

--