Java Strings

An internal view 🔬

Vivek T S
Jul 24, 2022

Creation

A String can be created in two ways in Java: implicitly, as a string literal, or explicitly, using the new keyword. A string literal is a sequence of characters enclosed in double quotes.

String literal = "Michael Jordan";
String object = new String("Michael Jordan");

Though both declarations create a String object, there’s a difference in how the two objects reside in heap memory.

Internal Representation

Historically, strings were stored as a char[], with each character a separate element of the array, encoded as UTF-16. Every char therefore took 2 bytes of memory (and characters outside the Basic Multilingual Plane took two chars). This was wasteful, as usage statistics suggested that most string objects consisted only of Latin-1 characters. A Latin-1 character fits in 1 byte, so storing such strings with one byte per character can cut the memory used by their character data by up to 50%.

Hence, a new internal feature called Compact Strings was implemented in JDK 9 under JEP 254. First, the char[] was replaced by a byte[], and an encoding-flag field (coder) was added to record the encoding used (Latin-1 or UTF-16). The encoding is then chosen based on the content of the string: if the value contains only Latin-1 characters, Latin-1 encoding is used (StringLatin1 class); otherwise, UTF-16 encoding is applied (StringUTF16 class).
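The coder field is private to the String class, so application code cannot observe the chosen encoding directly. The sketch below only mimics the rule described above: a string qualifies for Latin-1 when every char fits in one byte. The names CompactStringSketch, fitsLatin1 and estimatedValueBytes are made up for illustration and are not part of the JDK.

```java
// Illustrative sketch only: mimics the Latin-1 vs UTF-16 decision described above.
// These names are hypothetical; the real logic lives inside java.lang.String.
class CompactStringSketch {

    // A string can use Latin-1 when every char value fits in one byte (0x00 to 0xFF).
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Estimated size of the backing byte[]: 1 byte per char for Latin-1, 2 for UTF-16.
    static int estimatedValueBytes(String s) {
        return fitsLatin1(s) ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        String latin = "Michael Jordan";   // 14 chars, all Latin-1
        String mixed = "Price: \u20AC23";  // the euro sign (U+20AC) is outside Latin-1

        System.out.println(estimatedValueBytes(latin)); // 14
        System.out.println(estimatedValueBytes(mixed)); // 20
    }
}
```

Note that a single character outside Latin-1 is enough to push the whole string to two bytes per char in this model, which matches the all-or-nothing choice described above.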

Memory Allocation

As stated earlier, there’s a difference in the way memory is allocated for these objects in the heap. Using the explicit new keyword is straightforward: the JVM always allocates a new String object in the heap. A string literal, by contrast, goes through a process called interning.

String interning is a method of storing only one copy of each distinct string value, which must be immutable. The distinct values are stored in a string intern pool (Wikipedia). Internally, the string intern pool is a hash table that stores a reference to every string object created through a literal, keyed by its hash. Though the string value itself resides in the heap, its reference can be found in the intern pool.
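To make the idea concrete, here is a toy model of an intern pool built on a HashMap. It is a sketch of the concept only: the real pool is managed inside the JVM, and the class name TinyInternPool and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of an intern pool: keeps one canonical instance per distinct value.
// Hypothetical class for illustration; not how the JVM implements its pool.
class TinyInternPool {

    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance for this value, registering s if it is the first.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        TinyInternPool pool = new TinyInternPool();
        String a = new String("Michael"); // two distinct objects
        String b = new String("Michael"); // with equal values

        System.out.println(pool.intern(a) == a); // true: a becomes the canonical copy
        System.out.println(pool.intern(b) == a); // true: the lookup returns a, not b
        System.out.println(pool.size());         // 1
    }
}
```

The lookup relies on hashCode and equals, which is safe only because String is immutable: a key whose value could change after insertion would break the map.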

This is easy to demonstrate with the experiment below. Let’s declare two variables with the same value.

String firstName1 = "Michael";
String firstName2 = "Michael";
System.out.println(firstName1 == firstName2); //true

During execution, when the JVM encounters the literal assigned to firstName1, it looks up the string intern pool for the value Michael. If it can’t find one, a new entry is created in the pool for the object. When execution reaches firstName2, the process is repeated, and this time the value is found in the pool thanks to the entry created for firstName1. So, instead of creating a duplicate object and a new entry, the same reference is returned. This is why the equality check passes.

On the other hand, if a variable with the value Michael is created using the new keyword, interning doesn’t happen and the equality check fails.

String firstName3 = new String("Michael");
System.out.println(firstName3 == firstName2); //false

Interning can be forced for firstName3 using the intern() method, though this is not generally preferred.

firstName3 = firstName3.intern(); //Interning
System.out.println(firstName3 == firstName2); //true
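The three experiments above can be collected into one runnable class. This is a minimal sketch; the class name InternDemo and its method names are chosen here for illustration.

```java
// Gathers the literal, new String(...) and intern() experiments in one place.
class InternDemo {

    // Two identical literals resolve to the same pooled object.
    static boolean literalsShareReference() {
        String firstName1 = "Michael";
        String firstName2 = "Michael";
        return firstName1 == firstName2;
    }

    // new String(...) always allocates a fresh object outside the pool.
    static boolean newStringSharesReference() {
        String firstName2 = "Michael";
        String firstName3 = new String("Michael");
        return firstName3 == firstName2;
    }

    // intern() returns the pooled instance that has the same value.
    static boolean internedSharesReference() {
        String firstName2 = "Michael";
        String firstName3 = new String("Michael").intern();
        return firstName3 == firstName2;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareReference());   // true
        System.out.println(newStringSharesReference()); // false
        System.out.println(internedSharesReference());  // true
    }
}
```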

Interning may also appear to happen when two string literals are concatenated with the + operator.

String fullName = "Michael Jordan";
System.out.println(fullName == "Michael " + "Jordan"); //true

But what really happens is that, during compilation, the compiler joins the two literals and removes the + operator from the expression, forming a single string literal as shown below. During execution, both the value of fullName and the joined literal are interned, so the equality check passes.

//After Compilation
System.out.println(fullName == "Michael Jordan");
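This folding applies only to compile-time constant expressions. The sketch below contrasts three cases: two literals, a final local variable initialized with a literal (also a constant), and an ordinary local variable, whose concatenation happens at run time and produces a new object. The class and method names are hypothetical.

```java
// Contrasts compile-time folding with run-time concatenation.
class ConstantFoldingDemo {

    // Both operands are literals, so the compiler folds them into one literal.
    static boolean literalsAreFolded() {
        String fullName = "Michael Jordan";
        return fullName == "Michael " + "Jordan";
    }

    // A final variable initialized with a literal is also a compile-time constant.
    static boolean finalVariableIsFolded() {
        final String first = "Michael ";
        String fullName = "Michael Jordan";
        return fullName == first + "Jordan";
    }

    // A non-final variable forces run-time concatenation, which creates a new String.
    static boolean nonFinalVariableIsFolded() {
        String first = "Michael ";
        String fullName = "Michael Jordan";
        return fullName == first + "Jordan";
    }

    public static void main(String[] args) {
        System.out.println(literalsAreFolded());        // true
        System.out.println(finalVariableIsFolded());    // true
        System.out.println(nonFinalVariableIsFolded()); // false
    }
}
```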

Equality

It is evident from the experiments above that only string literals are interned by default. But a Java-based application is not expected to contain only string literals, as it can obtain strings from many different sources. So, comparing strings with the == operator is not advisable and can produce unexpected results. Equality checks should be done with the equals method, which compares the values of the strings rather than the memory addresses they are stored at.

System.out.println(firstName1.equals(firstName2)); //true
System.out.println(firstName3.equals(firstName2)); //true

There’s also a variant of the equals method called equalsIgnoreCase, which is useful for case-insensitive comparisons.

String firstName4 = "miCHAEL";
System.out.println(firstName4.equalsIgnoreCase(firstName1)); //true
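As a sketch of the "different sources" case, the example below builds a string at run time with StringBuilder, which is one stand-in for input that does not come from a literal. It also shows two common ways to keep the comparison safe when a value may be null. The class name EqualityDemo and the helper fromBuilder are hypothetical.

```java
import java.util.Objects;

// Compares a literal with an equal string produced at run time.
class EqualityDemo {

    // Stands in for a string obtained from outside the source code.
    static String fromBuilder() {
        return new StringBuilder("Mich").append("ael").toString();
    }

    public static void main(String[] args) {
        String literal = "Michael";
        String built = fromBuilder();

        System.out.println(literal == built);                     // false: different objects
        System.out.println(literal.equals(built));                // true: same value
        System.out.println("MICHAEL".equalsIgnoreCase(built));    // true: case is ignored

        String missing = null;
        System.out.println("Michael".equals(missing));            // false, and no NullPointerException
        System.out.println(Objects.equals(missing, literal));     // false, null-safe on both sides
    }
}
```

Calling equals on the literal, or going through Objects.equals, avoids a NullPointerException when the other value is absent.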

Immutability

Strings are immutable, i.e., their internal state cannot be modified once they are created. A variable can be reassigned to refer to a different string, but the string object itself never changes. The String class follows all the principles of immutability as provided here. Every method on the String class that appears to manipulate the object (e.g., concat, substring) returns a new object holding the result rather than updating the existing value.

String firstName = "Michael";
String lastName = "Jordan";
firstName.concat(lastName); //the returned string is discarded

System.out.println(firstName); //Michael
System.out.println(lastName); //Jordan

As can be seen, neither firstName nor lastName changes. Methods of the String class don’t modify the internal state; they create a new object holding the result and return it, as can be seen below.

firstName = firstName.concat(lastName);

System.out.println(firstName); //MichaelJordan
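One more way to see this is to keep a second reference to the original object before reassigning the variable. In the sketch below (the class name ImmutabilityDemo and the variable alias are hypothetical), the second reference still sees the old value, which shows that concat produced a new object instead of changing the existing one.

```java
// Shows that reassigning a variable does not alter the original String object.
class ImmutabilityDemo {

    // Returns { value after reassignment, value still seen through the alias }.
    static String[] reassign() {
        String firstName = "Michael";
        String alias = firstName;                 // second reference to the same object
        firstName = firstName.concat("Jordan");   // firstName now refers to a new object
        return new String[] { firstName, alias };
    }

    public static void main(String[] args) {
        String[] result = reassign();
        System.out.println(result[0]); // MichaelJordan
        System.out.println(result[1]); // Michael
    }
}
```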

More on Immutability in Java can be found here.

Happy Learning!!! 🗞️
