Dude, Where’s My char[]?

Looking for String.value in Android M

Square Engineering
Square Corner Blog
Jul 20, 2015


Written by Pierre-Yves Ricau.


This article started as a thread on an internal mailing list and I thought it would also be of interest to people outside of Square.

When Android M preview 2 was released, I started receiving reports of LeakCanary crashing when parsing heap dumps. LeakCanary reached into the char array of a String object to read a thread name, but in Android M that char array wasn’t there anymore.

Let’s dig

Here’s the structure of String.java prior to Android M:

Here’s String.java in M:
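
As a rough sketch of the two layouts, based only on the fields this article goes on to discuss (the char array, offset, count, and hash code): the class names PreMString and MString are hypothetical, and the field names and modifiers are assumptions rather than a copy of the AOSP source.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the pre-M layout: the characters live in a separate array.
final class PreMString {
    char[] value;   // reference to a separate char[] object
    int offset;     // index of the first character used in value
    int count;      // number of characters used
    int hashCode;   // cached hash code
}

// Hypothetical stand-in for the M layout: no char[] reference and no offset.
// The runtime stores the characters right after these fields.
final class MString {
    int count;
    int hashCode;
}

final class StringLayouts {
    // Lists the declared field names of a class, skipping compiler-generated ones.
    static List<String> fieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (!field.isSynthetic()) {
                names.add(field.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("pre-M fields: " + fieldNames(PreMString.class));
        System.out.println("M fields:     " + fieldNames(MString.class));
    }
}
```

Any tool that looks up a field named value by reflection or in a heap dump finds it in the first layout and nothing in the second, which is the failure LeakCanary ran into.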

Where did that char[] go? To learn more, let’s see what happens when we concatenate two strings:

In other words:

String.concat() is now a native method:
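
From the caller's side nothing changes; only the implementation moved into the runtime. A minimal sketch of the observable behavior, using the standard String.concat API (the class name ConcatDemo and the helper join are just for illustration):

```java
final class ConcatDemo {
    static String join(String left, String right) {
        // On Android M this call ends up in native runtime code; on a desktop
        // JVM it is implemented in Java. The result is the same either way.
        return left.concat(right);
    }

    public static void main(String[] args) {
        String joined = join("foo", "bar");
        System.out.println(joined);          // foobar
        System.out.println(joined.length()); // 6
    }
}
```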

Going native

concat() is implemented in String.cc:

The actual concatenation is done in mirror::String::AllocFromStrings in mirror::String.cc:

First, it allocates a new string of the right size using Alloc in string-inl.h:

  • This allocates an object of size header_size + data_size
  • header_size is the size of mirror::String in mirror::String.h:
      • int32_t count_
      • uint32_t hash_code_
      • uint16_t value_[0]
  • data_size is essentially the total character count times the size of one char (uint16_t).

So this means that the char array is inlined in the String object.
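
The size arithmetic can be sketched as follows. This is not ART code: the class name InlineStringSize is hypothetical, and the 16-byte header is an assumption taken from the offset this article mentions for heap dumps (object header plus count_ and hash_code_).

```java
final class InlineStringSize {
    // Assumed size in bytes of everything that precedes the characters:
    // the object header plus the count_ and hash_code_ fields.
    static final int HEADER_SIZE = 16;

    // A Java char is 2 bytes, the same width as uint16_t.
    static final int CHAR_SIZE = Character.BYTES;

    static int dataSize(int charCount) {
        return charCount * CHAR_SIZE;
    }

    static int allocationSize(int charCount) {
        return HEADER_SIZE + dataSize(charCount);
    }

    public static void main(String[] args) {
        int charCount = "foo".length() + "bar".length();
        // 16 bytes of header + 6 chars * 2 bytes = 28 bytes, before any alignment padding.
        System.out.println(allocationSize(charCount));
    }
}
```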

Also notice the zero length array: uint16_t value_[0].

Let’s continue reading mirror::String::AllocFromStrings:

GetValue() is defined in mirror::String.h and returns the address of the uint16_t value_[0] that we noticed above:

This is a fairly straightforward copy from the memory address of one array to another.
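
In Java terms, the same idea looks roughly like the sketch below: allocate a buffer of the combined length, then copy each source into place. System.arraycopy stands in for the raw memory copy; the class name AllocFromStringsSketch is hypothetical and this is not the runtime's implementation.

```java
final class AllocFromStringsSketch {
    static char[] concat(char[] first, char[] second) {
        // Allocate the destination with room for both sources.
        char[] result = new char[first.length + second.length];
        // Copy the first source to the start of the destination...
        System.arraycopy(first, 0, result, 0, first.length);
        // ...and the second source right after it.
        System.arraycopy(second, 0, result, first.length, second.length);
        return result;
    }

    public static void main(String[] args) {
        char[] joined = concat("foo".toCharArray(), "bar".toCharArray());
        System.out.println(new String(joined)); // foobar
    }
}
```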

Offset

You probably noticed that the offset field is now entirely gone. Java strings are immutable, so older versions of the JDK allowed substrings to share the char array of their parent, with a different offset and count. This meant holding onto a small substring could hold onto a larger string in memory and prevent it from being garbage collected.

The char array is now inlined in the String object, so substrings can’t share their parent char array, which is why offset isn’t needed anymore.
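
To make the trade-off concrete, here is a small sketch of both substring strategies. SharedSubstring is a hypothetical class written for this illustration, not the old String implementation.

```java
import java.util.Arrays;

final class SharedSubstring {
    final char[] value;
    final int offset;
    final int count;

    SharedSubstring(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SharedSubstring of(String text) {
        char[] chars = text.toCharArray();
        return new SharedSubstring(chars, 0, chars.length);
    }

    // Old approach: reuse the parent's array with a different offset and count.
    SharedSubstring sharingSubstring(int start, int end) {
        return new SharedSubstring(value, offset + start, end - start);
    }

    // Approach without an offset: copy only the characters that are needed.
    SharedSubstring copyingSubstring(int start, int end) {
        char[] copy = Arrays.copyOfRange(value, offset + start, offset + end);
        return new SharedSubstring(copy, 0, copy.length);
    }

    // Number of chars kept alive by this instance.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSubstring parent = of("a very long parent string");
        SharedSubstring shared = parent.sharingSubstring(2, 6);
        SharedSubstring copied = parent.copyingSubstring(2, 6);
        System.out.println(shared + " retains " + shared.retainedChars() + " chars");
        System.out.println(copied + " retains " + copied.retainedChars() + " chars");
    }
}
```

Both substrings read as "very", but the sharing one keeps the parent's whole array reachable while the copying one holds only four chars.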

Advantages

Let’s speculate on why those changes are interesting:

  • Spatial locality of reference: instead of following a reference and risking a CPU cache miss, the char array is available right next to the rest of the String data.
  • Smaller footprint: a Java char array contains a header to store its type and length, which was redundant.
  • Less padding: both objects had to be 4-byte aligned with padding; now there’s only one object to pad.

String is one of the most heavily used types in the VM, so these micro-optimizations can add up to significant savings.
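
A back-of-the-envelope comparison of the two footprints, as a sketch only. Every size here is an assumption made for illustration (8-byte object header, 4-byte references, 12-byte array header, and the 4-byte alignment mentioned above), not a measurement of a real runtime; the class name StringFootprint is hypothetical.

```java
final class StringFootprint {
    // All sizes are in bytes and are assumptions for illustration only.
    static final int OBJECT_HEADER = 8;
    static final int REFERENCE = 4;
    static final int INT = 4;
    static final int ARRAY_HEADER = OBJECT_HEADER + INT; // header plus length
    static final int ALIGNMENT = 4;

    // Rounds size up to the next multiple of ALIGNMENT.
    static int align(int size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Two objects: a String (value reference, offset, count, hash) plus a char[].
    static int separateArrayFootprint(int charCount) {
        int stringObject = align(OBJECT_HEADER + REFERENCE + 3 * INT);
        int charArray = align(ARRAY_HEADER + charCount * Character.BYTES);
        return stringObject + charArray;
    }

    // One object: header, count, hash, then the characters inline.
    static int inlineFootprint(int charCount) {
        return align(OBJECT_HEADER + 2 * INT + charCount * Character.BYTES);
    }

    public static void main(String[] args) {
        int charCount = 5;
        System.out.println("separate array: " + separateArrayFootprint(charCount)); // 48
        System.out.println("inline chars:   " + inlineFootprint(charCount));        // 28
    }
}
```

Under these assumptions a five-character string drops from 48 to 28 bytes; the exact numbers depend on the runtime, but the saving is a fixed amount per String regardless of length.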

Conclusion: Back to Heap Dumps

Because the char[] value field was removed from String.java, LeakCanary could no longer find it when parsing heap dumps. However, in Android M Preview 2 the char buffer is still serialized in the heap dump, 16 bytes after the String address (because the String structure isn’t longer than 16 bytes). This means we can get LeakCanary to work again with Android M:
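
The idea can be sketched as below. This is not LeakCanary's actual code: InlineStringReader and readString are hypothetical names, the heap is modeled as a plain ByteBuffer, the character count is assumed to have been read separately from the String's count field, and the byte order is simply whatever the buffer uses.

```java
import java.nio.ByteBuffer;

final class InlineStringReader {
    // The characters start 16 bytes after the String's address, per the article.
    static final int CHARS_OFFSET = 16;

    static String readString(ByteBuffer heap, int stringAddress, int count) {
        char[] chars = new char[count];
        int firstChar = stringAddress + CHARS_OFFSET;
        for (int i = 0; i < count; i++) {
            // Absolute read: does not move the buffer's position.
            chars[i] = heap.getChar(firstChar + i * Character.BYTES);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Build a fake heap holding one String at address 8 with the text "main".
        ByteBuffer heap = ByteBuffer.allocate(64);
        int stringAddress = 8;
        String threadName = "main";
        for (int i = 0; i < threadName.length(); i++) {
            heap.putChar(stringAddress + CHARS_OFFSET + i * Character.BYTES, threadName.charAt(i));
        }
        System.out.println(readString(heap, stringAddress, threadName.length())); // main
    }
}
```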

This hack will eventually become unnecessary: Android M will insert a virtual char[] value field in all String objects when dumping the heap.

Huge thanks to Chester Hsieh, Romain Guy, Jesse Wilson, and Jake Wharton for their help figuring this out.
