Java Compiler Optimization for String Concatenation
String concatenation was a costly affair in early Sun Java versions (up to JDK 1.4, to be precise). Even though later JDKs brought a compiler optimization that implements String concatenation with StringBuilder, the String class (not just concatenation) is still one of the most discussed topics in interviews and among Java developers.
The StringBuilder class was introduced with JDK 1.5 and, along with it, the compiler was changed to use StringBuilder in place of StringBuffer behind the scenes for String concatenation (you can check the bytecode using javap -c). So a concatenation of strings using the + operator:
String a = b + c + d;
was converted into
String a = new StringBuilder().append(b).append(c).append(d).toString();
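As a quick check of this rewrite, the sketch below compares the + form with a hand-written StringBuilder chain. The class and method names (ConcatEquivalence, withPlus, withBuilder) are made up for illustration, and the chain is only an assumption about the general shape of the pre-Java 9 output, not a dump of real bytecode; javap -c on your own class shows the exact instructions. It uses the no-argument constructor so that a null first operand behaves the same way as with +.

```java
public class ConcatEquivalence
{
    // Concatenation written with the + operator; the compiler decides how to implement it.
    public static String withPlus(String b, String c, String d)
    {
        return b + c + d;
    }

    // Hand-written builder chain (hypothetical helper) mirroring the shape described above.
    public static String withBuilder(String b, String c, String d)
    {
        return new StringBuilder().append(b).append(c).append(d).toString();
    }

    public static void main(String[] args)
    {
        System.out.println(withPlus("x", "y", "z"));
        System.out.println(withBuilder("x", "y", "z"));
        // A null operand is rendered as the text "null" in both forms.
        System.out.println(withPlus("x", null, "z").equals(withBuilder("x", null, "z")));
    }
}
```

Both methods should return the same String for any combination of operands, including null ones.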
If the + operator is automatically converted to StringBuilder, why is explicit StringBuilder still around?
Let's take an example. Consider the class below:
public class StringConcatenation
{
    public static void main(String[] args)
    {
        String result = "";
        for (int i = 0; i < 1e6; i++)
        {
            result += "some more data";
        }
        System.out.println(result);
    }
}
Compilers up to JDK 8 effectively convert this into the following:
public class StringConcatenation
{
    public static void main(String[] args)
    {
        String result = "";
        for (int i = 0; i < 1e6; i++)
        {
            StringBuilder tmp = new StringBuilder();
            tmp.append(result);
            tmp.append("some more data");
            result = tmp.toString();
        }
        System.out.println(result);
    }
}
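Each iteration creates a new StringBuilder (plus a new intermediate String) and copies everything accumulated so far, so the loop allocates 1 million StringBuilder objects, does more copying on every pass, and is likely to trigger GC. This is why it is recommended not to rely on the compiler optimization inside loops and to use the StringBuilder class directly, like this: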
public class StringConcatenation
{
    public static void main(String[] args)
    {
        StringBuilder result = new StringBuilder((int)1e6);
        for (int i = 0; i < 1e6; i++)
        {
            result.append("some more data");
        }
        System.out.println(result.toString());
    }
}
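To see the difference between the two loops, here is a minimal sketch that runs both and prints rough timings. The class name, the method names and the loop size of 10,000 are assumptions chosen to keep the run short, and a single timed run like this is only an indication, not a rigorous benchmark (a harness such as JMH is better suited for that).

```java
public class ConcatLoopComparison
{
    // Repeated += in a loop: every pass copies the whole accumulated string.
    public static String concatWithPlus(int n, String piece)
    {
        String result = "";
        for (int i = 0; i < n; i++)
        {
            result += piece;
        }
        return result;
    }

    // One builder reused across the loop, pre-sized to the final length.
    public static String concatWithBuilder(int n, String piece)
    {
        StringBuilder result = new StringBuilder(n * piece.length());
        for (int i = 0; i < n; i++)
        {
            result.append(piece);
        }
        return result.toString();
    }

    public static void main(String[] args)
    {
        int n = 10_000; // assumed size, far smaller than the 1e6 used in the article
        String piece = "some more data";

        long t0 = System.nanoTime();
        String a = concatWithPlus(n, piece);
        long t1 = System.nanoTime();
        String b = concatWithBuilder(n, piece);
        long t2 = System.nanoTime();

        System.out.println("same result: " + a.equals(b));
        System.out.println("+= loop (ms): " + (t1 - t0) / 1_000_000);
        System.out.println("builder (ms): " + (t2 - t1) / 1_000_000);
    }
}
```

Both methods produce the same String; only the amount of intermediate allocation and copying differs.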
Java 9 brings another optimization
From Java 9 (JDK Enhancement Proposal 280, or JEP 280), the entire StringBuilder append sequence has been replaced with a single invokedynamic call (more on this in a later section) to java.lang.invoke.StringConcatFactory, which accepts the values to be concatenated.
Before Java 7, the JVM only had four method invocation types: invokevirtual to call normal class methods, invokestatic to call static methods, invokeinterface to call interface methods, and invokespecial to call constructors or private methods.
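As an illustration of these invocation types, the sketch below annotates a few ordinary calls with the instruction javac would typically emit for them. The class InvocationKinds and its demo method are hypothetical, and the comments are expectations rather than captured output; running javap -c on the compiled class is the way to confirm them.

```java
import java.util.ArrayList;
import java.util.List;

public class InvocationKinds
{
    public static int demo()
    {
        List<String> names = new ArrayList<>(); // constructor call: invokespecial
        names.add("Groot");                     // call through the List interface: invokeinterface
        String first = names.get(0);            // also through the interface: invokeinterface
        int length = first.length();            // instance method on a class: invokevirtual
        return Math.max(length, names.size());  // static method: invokestatic
    }

    public static void main(String[] args)
    {
        System.out.println(demo());
    }
}
```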
Why do we need a dynamic implementation in place of StringBuilder?
The primary motivation behind the change is to be able to change the concatenation strategy without changing the bytecode, so that already compiled code can pick up a better strategy without recompilation.
What is InvokeDynamic (also called Indy)?
The invokedynamic opcode was added as part of JSR 292 (first released in Java 7) to support efficient and flexible execution of method invocations in the absence of static type information.
The JDK basically fixes the bytecode specification at compile time, and the implementation of that specification is chosen at runtime. Let's take an example: suppose we want to concatenate “I am ” and “Groot”.
- A function signature is created, i.e. concat(String, String) -> String.
- The arguments to this function are “I am ” and “Groot”.
- The makeConcatWithConstants bootstrap method is called with the above function signature and a few other parameters required for the dynamic linkage (a lookup object, a recipe string describing the concatenation, and any constant parts), and returns a CallSite object. The strategy used to build it is covered in the next section.
- This CallSite object encapsulates a MethodHandle (possibly a chain of them) that points to the actual target implementation for that function signature.
- The generated function is then invoked with the arguments and returns the concatenated String “I am Groot”.
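The steps above can be sketched by calling the bootstrap method by hand, which is roughly what the JVM does the first time it links an invokedynamic concatenation site. This is a minimal sketch assuming Java 9 or later; the class name IndyConcatSketch and the helper concat are made up for illustration, and real code never needs to do this because javac emits the call site for you.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

public class IndyConcatSketch
{
    public static String concat(String left, String right)
    {
        try
        {
            // Step 1: the function signature concat(String, String) -> String.
            MethodType signature =
                MethodType.methodType(String.class, String.class, String.class);

            // Step 3: call the bootstrap method directly. In the recipe, the
            // character U+0001 is the placeholder for one dynamic argument.
            CallSite callSite = StringConcatFactory.makeConcatWithConstants(
                MethodHandles.lookup(), "concat", signature, "\u0001\u0001");

            // Step 4: the CallSite exposes its target as a MethodHandle.
            MethodHandle target = callSite.getTarget();

            // Steps 2 and 5: pass the arguments and get the joined String back.
            return (String) target.invoke(left, right);
        }
        catch (Throwable t)
        {
            // Wrapped so that callers do not have to declare checked exceptions.
            throw new IllegalStateException("concatenation call site failed", t);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(concat("I am ", "Groot")); // prints "I am Groot"
    }
}
```

In compiled code the bootstrap call happens once per call site, and later executions reuse the linked target.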
Java 9+ String Concatenation Strategies
StringConcatFactory offers different strategies to generate the CallSite, divided into bytecode generators using ASM and MethodHandle-based ones.
- BC_SB: generates the bytecode equivalent to what javac generates in Java 8.
- BC_SB_SIZED: generates the bytecode equivalent to what javac generates, but tries to estimate the initial size of the StringBuilder.
- BC_SB_SIZED_EXACT: generates the bytecode equivalent to what javac generates, but computes the exact size of the StringBuilder.
- MH_SB_SIZED: combines MethodHandles that end up calling the StringBuilder with an estimated initial size.
- MH_SB_SIZED_EXACT: combines MethodHandles that end up calling the StringBuilder with an exact size.
- MH_INLINE_SIZED_EXACT: combines MethodHandles that directly create the String with an exact-size byte[] and no copy.
The default and most performant one is MH_INLINE_SIZED_EXACT, which can lead to a 3 to 4 times performance improvement. You can override the strategy on the command line by defining the property java.lang.invoke.stringConcat, for example -Djava.lang.invoke.stringConcat=BC_SB.
It’s worth having a look at the MH_INLINE_SIZED_EXACT implementation to see how MethodHandles can now be combined to efficiently replace code generation.
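The idea behind the "sized exact" strategies can be approximated in plain Java: measure every part first, allocate the result buffer once at exactly the right size, then copy each part into place. The sketch below is only an analogy under that assumption. It is not the JDK's implementation, which is built from MethodHandles and JDK-internal helpers working on a byte[]; the class ExactSizeConcatSketch and its concat method are hypothetical, and the public String constructor used here still copies the buffer once.

```java
public class ExactSizeConcatSketch
{
    public static String concat(String... parts)
    {
        // Pass 1: compute the exact length of the result.
        int total = 0;
        for (String part : parts)
        {
            total += String.valueOf(part).length(); // a null part counts as "null"
        }

        // Single allocation of exactly the right size, so no resizing is needed.
        char[] buffer = new char[total];

        // Pass 2: copy each part to its final position.
        int offset = 0;
        for (String part : parts)
        {
            String text = String.valueOf(part);
            text.getChars(0, text.length(), buffer, offset);
            offset += text.length();
        }

        // Unlike the JDK-internal strategy, this constructor copies the array.
        return new String(buffer);
    }

    public static void main(String[] args)
    {
        System.out.println(concat("I am ", "Groot")); // prints "I am Groot"
    }
}
```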