Dude, Where’s My char[]?

Looking for String.value in Android M

Square Engineering
Jul 20, 2015 · 4 min read

Written by Pierre-Yves Ricau.


This article started as a thread on an internal mailing list and I thought it would also be of interest to people outside of Square.

When Android M preview 2 was released, I started receiving reports of LeakCanary crashing when parsing heap dumps. LeakCanary reached into the char array of a String object to read a thread name, but in Android M that char array wasn’t there anymore.

Let’s dig

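Here’s the structure of String.java prior to Android M:

    public final class String {
      private final char[] value;
      private final int offset;
      private final int count;
      private int hash;
      // ...
    }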

Here’s String.java in M:

    public final class String {
      private final int count;
      private int hash;
      // ...
    }

Where did that char[] go? To learn more, let’s see what happens when we concatenate two strings:

    String baguette = flour + love;

In other words:

    String baguette = flour.concat(love);

String.concat() is now a native method:

    public final class String {
      public native String concat(String string);
      // ...
    }

Going native

concat() is implemented in String.cc:

    static jstring String_concat(JNIEnv* env, jobject java_this, jobject java_string_arg) {
      ScopedFastNativeObjectAccess soa(env);
      if (UNLIKELY(java_string_arg == nullptr)) {
        ThrowNullPointerException("string arg == null");
        return nullptr;
      }
      StackHandleScope<2> hs(soa.Self());
      Handle<mirror::String> string_this(hs.NewHandle(soa.Decode<mirror::String*>(java_this)));
      Handle<mirror::String> string_arg(hs.NewHandle(soa.Decode<mirror::String*>(java_string_arg)));
      int32_t length_this = string_this->GetLength();
      int32_t length_arg = string_arg->GetLength();
      if (length_arg > 0 && length_this > 0) {
        mirror::String* result = mirror::String::AllocFromStrings(soa.Self(), string_this, string_arg);
        return soa.AddLocalReference<jstring>(result);
      }
      jobject string_original = (length_this == 0) ? java_string_arg : java_this;
      return reinterpret_cast<jstring>(string_original);
    }

The actual concatenation is done in mirror::String::AllocFromStrings in mirror::String.cc:

    String* String::AllocFromStrings(Thread* self, Handle<String> string, Handle<String> string2) {
      int32_t length = string->GetLength();
      int32_t length2 = string2->GetLength();
      gc::AllocatorType allocator_type = Runtime::Current()->GetHeap()->GetCurrentAllocator();
      SetStringCountVisitor visitor(length + length2);
      String* new_string = Alloc<true>(self, length + length2, allocator_type, visitor);
      if (UNLIKELY(new_string == nullptr)) {
        return nullptr;
      }
      uint16_t* new_value = new_string->GetValue();
      memcpy(new_value, string->GetValue(), length * sizeof(uint16_t));
      memcpy(new_value + length, string2->GetValue(), length2 * sizeof(uint16_t));
      return new_string;
    }

First, it allocates a new string of the right size using Alloc in string-inl.h:

    template <bool kIsInstrumented, typename PreFenceVisitor>
    inline String* String::Alloc(Thread* self, int32_t utf16_length, gc::AllocatorType allocator_type,
                                 const PreFenceVisitor& pre_fence_visitor) {
      size_t header_size = sizeof(String);
      size_t data_size = sizeof(uint16_t) * utf16_length;
      size_t size = header_size + data_size;
      Class* string_class = GetJavaLangString();
      // Check for overflow and throw OutOfMemoryError if this was an unreasonable request.
      if (UNLIKELY(size < data_size)) {
        self->ThrowOutOfMemoryError(StringPrintf("%s of length %d would overflow",
                                                 PrettyDescriptor(string_class).c_str(),
                                                 utf16_length).c_str());
        return nullptr;
      }
      gc::Heap* heap = Runtime::Current()->GetHeap();
      return down_cast<String*>(
          heap->AllocObjectWithAllocator<kIsInstrumented, true>(self, string_class, size,
                                                                allocator_type, pre_fence_visitor));
    }

  • This allocates an object of size header_size + data_size.
  • header_size is the size of mirror::String in mirror::String.h:
      • int32_t count_
      • uint32_t hash_code_
      • uint16_t value_[0]
  • data_size is essentially the total character count times the size of one char (uint16_t).

So this means that the char array is inlined in the String object.

Also notice the zero length array: uint16_t value_[0].
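To make the layout concrete, here is a minimal sketch of a fixed header followed by inline UTF-16 data in a single allocation. InlineString and AllocInlineString are hypothetical names, not ART code: the sketch uses calloc instead of the ART heap, and it computes the data address as "just past the header" rather than relying on a zero-length array, which is a compiler extension in C++.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for mirror::String: a fixed header, with the UTF-16
// data stored directly after it in the same allocation.
struct InlineString {
  std::int32_t count;
  std::uint32_t hash_code;

  // The characters start right after the header, like value_[0] in the article.
  std::uint16_t* value() { return reinterpret_cast<std::uint16_t*>(this + 1); }
};

// Allocates the header and the characters as one zero-filled block.
// Returns nullptr on a negative length, size overflow, or allocation failure.
// Release the result with std::free().
InlineString* AllocInlineString(std::int32_t utf16_length) {
  if (utf16_length < 0) return nullptr;
  const std::size_t header_size = sizeof(InlineString);
  const std::size_t data_size =
      sizeof(std::uint16_t) * static_cast<std::size_t>(utf16_length);
  const std::size_t size = header_size + data_size;
  if (size < data_size) return nullptr;  // same overflow check as String::Alloc
  void* memory = std::calloc(1, size);
  if (memory == nullptr) return nullptr;
  return new (memory) InlineString{utf16_length, 0u};
}
```

There is no second object and no pointer to follow: the address returned by value() is always the string's own address plus the header size.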

Let’s continue reading mirror::String::AllocFromStrings:

    uint16_t* new_value = new_string->GetValue();
    memcpy(new_value, string->GetValue(), length * sizeof(uint16_t));
    memcpy(new_value + length, string2->GetValue(), length2 * sizeof(uint16_t));

Where GetValue() is defined in mirror::String.h and returns the address of the uint16_t value_[0] that we noticed above:

    uint16_t* GetValue() SHARED_LOCKS_REQUIRED(Locks::mutator_lock_) {
      return &value_[0];
    }

This is a straightforward copy from the memory address of one array to another.
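The two-step copy can be tried outside of ART with plain buffers. The sketch below is only an illustration: ConcatUtf16 is a hypothetical helper, and a std::vector stands in for the newly allocated string.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Concatenates two UTF-16 buffers with two memcpy calls, the way
// AllocFromStrings fills the new string. ConcatUtf16 is a hypothetical helper.
std::vector<std::uint16_t> ConcatUtf16(const std::uint16_t* first, std::size_t length,
                                       const std::uint16_t* second, std::size_t length2) {
  std::vector<std::uint16_t> result(length + length2);
  std::uint16_t* new_value = result.data();
  // memcpy counts bytes, so each length is scaled by the size of one code unit.
  if (length > 0) {
    std::memcpy(new_value, first, length * sizeof(std::uint16_t));
  }
  // Pointer arithmetic counts elements, so "+ length" skips the first string.
  if (length2 > 0) {
    std::memcpy(new_value + length, second, length2 * sizeof(std::uint16_t));
  }
  return result;
}
```

The length checks are there because this helper accepts empty inputs. In ART, String_concat already returns one of the original strings when either side is empty, so AllocFromStrings only runs when both lengths are positive.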

Offset

You probably noticed that the offset field is now entirely gone. Java strings are immutable, so older versions of the JDK allowed substrings to share the char array of their parent, with a different offset and count. This meant holding onto a small substring could hold onto a larger string in memory and prevent it from being garbage collected.

The char array is now inlined in the String object, so substrings can’t share their parent char array, which is why offset isn’t needed anymore.
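The retention problem with shared arrays is easy to reproduce in a small model. The sketch below is an assumption-laden illustration in C++, not the JDK or Android implementation: SharedString and MakeSharedString are hypothetical names, and a std::shared_ptr plays the role of the garbage-collected reference to the char array.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical model of the old layout: the characters live in a separate,
// shared buffer, and each string is a window (offset, count) into it.
struct SharedString {
  std::shared_ptr<const std::vector<std::uint16_t>> value;
  std::size_t offset;
  std::size_t count;

  // No copy: the substring points at the parent's buffer.
  // The caller must ensure begin + length <= count.
  SharedString Substring(std::size_t begin, std::size_t length) const {
    return SharedString{value, offset + begin, length};
  }

  // Number of chars in the buffer this string keeps alive.
  std::size_t RetainedChars() const { return value->size(); }
};

// Builds a string of the given length filled with 'a'.
SharedString MakeSharedString(std::size_t length) {
  auto buffer = std::make_shared<std::vector<std::uint16_t>>(length, std::uint16_t{'a'});
  return SharedString{buffer, 0, length};
}
```

Taking a 2-char substring of a 1000-char string and then dropping the parent leaves all 1000 chars reachable through the substring. With the chars inlined, a substring has to copy its own characters, so this cannot happen.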

Advantages

Let’s speculate on why those changes are interesting:

  • Spatial locality of reference: instead of following a reference to a separate array and risking a CPU cache miss, the characters are available right next to the rest of the String data.
  • Smaller footprint: a Java char array contains a header to store its type and length, which was redundant.
  • Both objects had to be 4-byte aligned with padding, now there’s only one object to pad.

String is one of the most used types in the VM, so these micro-optimizations should add up to significant improvements.
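The footprint argument can be turned into back-of-the-envelope arithmetic. In the sketch below every size is a parameter, because the article only states that the new String structure is no longer than 16 bytes; the other values you plug in are assumptions, not measurements. LayoutAssumptions, OldFootprint and NewFootprint are hypothetical names.

```cpp
#include <cstddef>

// Rounds size up to the next multiple of alignment (alignment must be > 0).
constexpr std::size_t RoundUp(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) / alignment * alignment;
}

// All sizes are illustrative assumptions supplied by the caller, in bytes.
struct LayoutAssumptions {
  std::size_t old_string_size;    // String object holding a reference to a char[]
  std::size_t array_header_size;  // char[] header: type and length
  std::size_t new_header_size;    // String header with the chars stored inline
  std::size_t alignment;          // object alignment
};

// Old layout: two objects, each padded separately.
constexpr std::size_t OldFootprint(const LayoutAssumptions& a, std::size_t chars) {
  return RoundUp(a.old_string_size, a.alignment) +
         RoundUp(a.array_header_size + 2 * chars, a.alignment);
}

// New layout: one object, padded once.
constexpr std::size_t NewFootprint(const LayoutAssumptions& a, std::size_t chars) {
  return RoundUp(a.new_header_size + 2 * chars, a.alignment);
}
```

For example, with assumed values of 24 bytes for the old String, 12 bytes for the array header, 16 bytes for the new header and 4-byte alignment, a 5-char string comes out at 48 bytes before and 28 bytes after. The exact numbers depend entirely on the assumptions, but the saving per string is roughly one array header plus one reference.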

Conclusion: Back to Heap Dumps

Because the char[] value field was removed from String.java, it no longer shows up as a field when parsing heap dumps. However, in Android M Preview 2 the char buffer is still serialized in the heap dump, 16 bytes after the String address (because the String structure isn’t longer than 16 bytes). This means we can get LeakCanary to work again with Android M:

    Object value = fieldValue(values, "value");
    ArrayInstance charArray;
    if (isCharArray(value)) {
      charArray = (ArrayInstance) value;
    } else {
      charArray = (ArrayInstance) heap.getInstance(instance.getId() + 16);
    }

This hack will eventually be fixed in Android M by inserting a virtual char[] value field in all String objects when dumping the heap.

Huge thanks to Chester Hsieh, Romain Guy, Jesse Wilson, and Jake Wharton for their help figuring this out.

