

String Deduplication in Java

This article aims to explain and demonstrate what String Deduplication in Java is.

String Deduplication lets multiple String objects with the same content share a single underlying value array. It is a JVM feature that you activate with the command-line flag -XX:+UseStringDeduplication.
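
To confirm from inside a running program that -XX:+UseStringDeduplication actually took effect, HotSpot's HotSpotDiagnosticMXBean can read VM flags by name. This is a minimal sketch assuming a HotSpot-based JDK such as OpenJDK; the class DedupFlagCheck and its flagValue helper are hypothetical names.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class: reads HotSpot VM flags at runtime (HotSpot-based JDKs only)
public class DedupFlagCheck {

    // Returns the current value of a VM flag as a string, e.g. "true" or "false"
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // "true" only when the JVM was started with -XX:+UseStringDeduplication
        System.out.println("UseStringDeduplication = " + flagValue("UseStringDeduplication"));
        // In JDK 13 deduplication is tied to the collector, so check for G1 as well
        System.out.println("UseG1GC = " + flagValue("UseG1GC"));
    }
}
```

Compile it and start it with java -XX:+UseStringDeduplication DedupFlagCheck; on Java 11 and later, java -XX:+UseStringDeduplication DedupFlagCheck.java also runs the source file directly.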

I’m using OpenJDK 13, but as long as you’re using Java 9 or above, you will be able to follow along and reproduce the results presented here.

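First, let’s cover the basics of the String type. A String has a private field called value that holds its actual content (the characters). Since Java 9 this field is a byte[] rather than a char[], which is why the heap dumps below show byte arrays.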

You probably know that it is bad practice to create new String objects with the String constructor instead of using a “String literal”.
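
The original listing isn’t included here, so the following is a hypothetical reconstruction based on how the rest of the article describes it: an ArrayList called strings holding six elements, where the first two (“worst”) are built from separate char arrays, the next two (“bad”) pass a literal to the String constructor, and the last two (“good”) are plain literals. The class name, the string contents and the optional sleep argument are assumptions for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; the names and contents are illustrative only
public class StringDedupDemo {

    static List<String> buildStrings() {
        List<String> strings = new ArrayList<>();
        // "worst": each constructor call copies its own char[] into a fresh byte[] value
        strings.add(new String(new char[] {'w', 'o', 'r', 's', 't'}));
        strings.add(new String(new char[] {'w', 'o', 'r', 's', 't'}));
        // "bad": two new String objects, both reusing the value array of the pooled literal
        strings.add(new String("bad"));
        strings.add(new String("bad"));
        // "good": both elements are the same interned literal
        strings.add("good");
        strings.add("good");
        return strings;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> strings = buildStrings();
        System.out.println(strings.get(0) == strings.get(1)); // false: distinct objects
        System.out.println(strings.get(2) == strings.get(3)); // false: distinct objects
        System.out.println(strings.get(4) == strings.get(5)); // true: same pooled literal
        System.out.println(strings.get(0).equals(strings.get(1))); // true: same content

        // Optionally stay alive for the given number of seconds so a heap dump can be taken
        if (args.length > 0) {
            TimeUnit.SECONDS.sleep(Long.parseLong(args[0]));
        }
        // Using the list afterwards keeps it reachable while the program sleeps
        System.out.println(strings.size());
    }
}
```

Started as java -XX:-UseStringDeduplication StringDedupDemo 600, it prints the comparisons and then stays alive for ten minutes, which leaves time to take a heap dump.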

What’s the problem with the code above? Let’s analyze it. I’m going to use VisualVM, but other tools are available. Start the application with the flag -XX:-UseStringDeduplication (I’m using the “-” sign to deactivate String Deduplication for clarity; it is deactivated by default anyway).

Then open another console window and extract a heap dump, for example with jmap -dump:format=b,file=heap.hprof {PID} (replace {PID} with your pid).
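
If you would rather not switch consoles, a program can also write its own heap dump through HotSpotDiagnosticMXBean.dumpHeap. This is a sketch assuming a HotSpot-based JDK; the class HeapDumper, the helper dumpTo and the default file name are hypothetical. The target file must not already exist, and recent JDKs also require the name to end in .hprof.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes a heap dump of the current JVM (HotSpot-based JDKs only)
public class HeapDumper {

    // live = true dumps only reachable objects, which requires a GC first;
    // live = false dumps everything currently on the heap
    static Path dumpTo(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // The file must not exist yet and should end in .hprof
        bean.dumpHeap(file.toString(), live);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "strings.hprof");
        dumpTo(file, false);
        System.out.println("Heap dump written to " + file.toAbsolutePath()
                + " (" + Files.size(file) + " bytes)");
    }
}
```

Passing false writes the heap as it is, without collecting first, which is what this step of the walkthrough needs.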

Open this dump in VisualVM, select “Objects”, then “GC Roots”, and navigate to the ArrayList. You will find that it contains six elements, as shown below.

The ArrayList “strings” as shown in VisualVM.

Notice that the first four strings are different references, while the last two are the same reference. Expanding the elements further, you can see that the first two strings also point to two different byte arrays (the value field in String).

The “worst” strings, expanded.

The “bad” strings share the same value field because the String constructor is given a String literal, which comes from the String pool, and the new String simply reuses the literal’s value array. In this case the String constructor is unnecessary; there are some special cases where it can be useful, but they are not discussed here.
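
VisualVM shows the shared value field directly. To check the same thing in code, one option is to read the private value field through reflection and compare the two arrays by identity. Treat this as a diagnostic hack rather than an API: value is a JDK implementation detail, and on JDK 16 and later the JVM must be started with --add-opens java.base/java.lang=ALL-UNNAMED or the access is refused. The class ValueArrayCheck and its method are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InaccessibleObjectException;

// Hypothetical diagnostic: do two Strings share the same internal value array?
// Relies on the private String.value field, an implementation detail of the JDK.
public class ValueArrayCheck {

    // Returns TRUE or FALSE, or null if reflective access to java.lang is not permitted
    static Boolean sharesValueArray(String a, String b) {
        try {
            Field value = String.class.getDeclaredField("value");
            // Needs --add-opens java.base/java.lang=ALL-UNNAMED on JDK 16 and later
            value.setAccessible(true);
            // Identity comparison: true only if both Strings point at the same array
            return value.get(a) == value.get(b);
        } catch (InaccessibleObjectException | ReflectiveOperationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String literal = "bad";
        // new String(String) reuses the argument's value array
        System.out.println(sharesValueArray(new String(literal), literal));
        // new String(char[]) allocates a fresh value array for each String
        System.out.println(sharesValueArray(new String(new char[] {'x'}), new String(new char[] {'x'})));
    }
}
```

With reflective access allowed, the first line should print true and the second false; when access is refused, both print null.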

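What can we do about it? This is where String Deduplication comes into the picture. (Note that in JDK 13, String Deduplication is supported by the G1 garbage collector, the default collector since Java 9, but not by collectors such as Serial or Parallel; JDK 18 later extended it to those as well.) Start the application again, this time with -XX:+UseStringDeduplication.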

Then open another console window, force a GC (for example with jcmd {PID} GC.run), and extract the heap dump again (replace {PID} with your pid).

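We force a GC first because String Deduplication is driven by the garbage collector: during a collection, the collector identifies candidate Strings and hands them to a background thread that performs the actual deduplication. This can make pause times slightly longer, but in return it should make later phases more efficient, since fewer objects need to be moved around.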

Now open the new dump in VisualVM and note the difference in the underlying value field of the first two strings.

The “worst” strings, expanded, with String Deduplication enabled.

Voila, they now share the same underlying value field.



