Java Strings
Creation
A String can be created in Java in two ways: implicitly, as a string literal, or explicitly, using the new keyword. String literals are sequences of characters enclosed in double quotes.
String literal = "Michael Jordan";
String object = new String("Michael Jordan");
Although both declarations create a String object, there's a difference in how these objects are placed in heap memory.
Internal Representation
Historically, strings were stored as a char[], i.e., each character was a separate element in a character array, encoded in UTF-16. This means that every character took 2 bytes of memory. This was wasteful, because usage statistics suggested that most String objects contained only Latin-1 characters. Latin-1 characters can be represented using 1 byte each, which halves the memory needed for the character data of such strings.
Hence, a new internal feature called Compact Strings was implemented as part of the JDK 9 release, based on JEP 254. First, the char[] was changed to a byte[], and a flag field (named coder in the OpenJDK source) was added to record the encoding used (Latin-1 or UTF-16). The encoding is then chosen based on the content of the string: if the value contains only Latin-1 characters, Latin-1 encoding is used (the StringLatin1 class); otherwise, UTF-16 encoding is applied (the StringUTF16 class).
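To make the Latin-1/UTF-16 decision concrete, here is a minimal sketch of the idea. It is not the JDK's code: the class CompactStringsSketch and its methods are hypothetical, and the byte counts cover only the backing array, ignoring object headers and other fields. A string qualifies for Latin-1 only if every character fits in one byte (0xFF or below); a single character outside that range makes the whole string use UTF-16.

```java
// Hypothetical illustration of the Compact Strings idea (JEP 254).
// This is NOT the JDK's implementation; it only mimics the choice between
// a 1-byte-per-char (Latin-1) and a 2-bytes-per-char (UTF-16) layout.
public class CompactStringsSketch {

    // A string fits in Latin-1 if every UTF-16 char is in the range 0x00..0xFF.
    static boolean canEncodeLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Size of the backing byte[] in this sketch: 1 byte per char for Latin-1,
    // 2 bytes per char for UTF-16. Object headers and other fields are ignored.
    static int backingArrayBytes(String s) {
        return canEncodeLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        String[] samples = {"Michael Jordan", "Zo\u00EB", "\u65E5\u672C"};
        for (String s : samples) {
            System.out.println(s + " -> latin1=" + canEncodeLatin1(s)
                    + ", bytes=" + backingArrayBytes(s));
        }
    }
}
```

In this model, "Michael Jordan" needs 14 bytes instead of 28, while a string containing even one character above 0xFF falls back to 2 bytes per character.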
Memory Allocation
As stated earlier, there's a difference in the way memory is allocated for these objects in the heap. Using the explicit new keyword is straightforward: the JVM creates a new object and allocates memory for it in the heap. Using a string literal, by contrast, follows a process called interning.
String interning is a method of storing only one copy of each distinct string value, which must be immutable. The distinct values are stored in a string intern pool (Wikipedia). Internally, the string intern pool is a hash table that stores a reference to every string object created through literals, looked up by the string's hash. Although the string value itself resides in the heap, its reference can be found in the intern pool.
This is easy to demonstrate with the experiment below. Let's declare two variables with the same value.
String firstName1 = "Michael";
String firstName2 = "Michael";
System.out.println(firstName1 == firstName2); //true
During execution, when the JVM encounters firstName1, it looks up the value Michael in the string intern pool. If it can't find one, it creates a new entry in the pool for the object. When execution reaches firstName2, the lookup is repeated, and this time the value is found, because it was added for firstName1. So, instead of creating a duplicate entry, the JVM returns the same reference. This is why the equality check passes.
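The lookup described above behaves like a map from each distinct value to one canonical instance. The sketch below models that with a HashMap. ToyInternPool is a hypothetical class for illustration only; HotSpot's real pool is a native hash table inside the JVM, not Java code like this.

```java
import java.util.HashMap;
import java.util.Map;

// A toy model of a string intern pool (hypothetical; not how the JVM implements it).
public class ToyInternPool {

    // Maps each distinct string value to the single canonical instance for that value.
    // HashMap uses String.hashCode to locate the entry, then equals to confirm the match.
    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance for s, registering s itself if the value is new.
    public String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        ToyInternPool pool = new ToyInternPool();
        String a = pool.intern(new String("Michael"));
        String b = pool.intern(new String("Michael"));
        System.out.println(a == b);        // true: both calls return the first instance
        System.out.println(pool.size());   // 1: only one entry for the value "Michael"
    }
}
```

The second call finds the existing entry and hands back the first object, which mirrors why firstName1 and firstName2 end up sharing a reference.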
On the other hand, if a variable with the value Michael is created using the new keyword, the new object is not taken from the intern pool, and the equality check fails.
String firstName3 = new String("Michael");
System.out.println(firstName3 == firstName2); //false
Interning can be forced for firstName3 using the intern() method, though this is not generally preferred.
firstName3 = firstName3.intern(); //Interning
System.out.println(firstName3 == firstName2); //true
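Note that intern() does not alter the object it is called on; it returns the pooled reference, which is why the line above reassigns firstName3. This small sketch (the class name InternReturnValue is illustrative) shows what happens when the returned value is discarded.

```java
public class InternReturnValue {
    public static void main(String[] args) {
        String literal = "Michael";               // interned automatically
        String created = new String("Michael");   // a distinct heap object

        created.intern();                          // returned reference is discarded
        System.out.println(created == literal);   // false: 'created' still points to its own object

        String canonical = created.intern();       // keep the returned reference
        System.out.println(canonical == literal);  // true: intern() returned the pooled instance
    }
}
```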
Interning may also appear to happen when two string literals are concatenated with the + operator.
String fullName = "Michael Jordan";
System.out.println(fullName == "Michael " + "Jordan"); //true
But what really happens is that, during compilation, the compiler joins both literals and removes the + operator from the expression, forming a single string as shown below. At runtime, fullName and the “appended literal” refer to the same interned string, so the equality check passes.
//After Compilation
System.out.println(fullName == "Michael Jordan");
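This folding only applies to compile-time constant expressions. The sketch below (class and method names are illustrative) contrasts three cases. Per the Java Language Specification, a local declared final and initialized with a literal is a constant variable, so concatenating it is still folded; a non-final variable makes the concatenation happen at run time, which creates a new, non-interned String.

```java
// Compares compile-time constant concatenation with run-time concatenation.
public class ConcatInterning {

    // Both operands are literals, so javac folds them into the single literal "Michael Jordan".
    static boolean literalsOnly() {
        String fullName = "Michael Jordan";
        return fullName == "Michael " + "Jordan";
    }

    // A final local initialized with a literal is a constant variable,
    // so this concatenation is still a compile-time constant expression.
    static boolean finalVariable() {
        String fullName = "Michael Jordan";
        final String first = "Michael ";
        return fullName == first + "Jordan";
    }

    // A non-final variable forces the concatenation to happen at run time,
    // which produces a new String object that is not interned.
    static boolean nonFinalVariable() {
        String fullName = "Michael Jordan";
        String first = "Michael ";
        return fullName == first + "Jordan";
    }

    public static void main(String[] args) {
        System.out.println(literalsOnly());      // true
        System.out.println(finalVariable());     // true
        System.out.println(nonFinalVariable());  // false
    }
}
```

Comparing with equals instead of == would return true in all three cases.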
Equality
The experiments above show that only string literals (and other compile-time constant strings) are interned by default. But a Java application is not expected to work only with string literals; it obtains strings from many sources, such as user input, files, or the network. So, comparing strings with the equality operator (==), which compares references, is not advisable and can produce unexpected results. Equality checks should be done with the equals method, which compares strings by value rather than by the memory address they are stored at.
System.out.println(firstName1.equals(firstName2)); //true
System.out.println(firstName3.equals(firstName2)); //true
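To see why == is unreliable for strings that don't come from literals, the sketch below builds Michael at run time with a StringBuilder, standing in for a value read from input, a file, or the network. RuntimeStringEquality and buildAtRuntime are hypothetical names used only for this illustration.

```java
public class RuntimeStringEquality {

    // Builds "Michael" at run time; StringBuilder.toString allocates a new,
    // non-interned String, much like a value arriving from outside the program.
    static String buildAtRuntime() {
        return new StringBuilder("Mich").append("ael").toString();
    }

    public static void main(String[] args) {
        String literal = "Michael";
        String runtime = buildAtRuntime();
        System.out.println(literal == runtime);        // false: different objects
        System.out.println(literal.equals(runtime));   // true: same characters
    }
}
```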
There's also a variant of the equals method called equalsIgnoreCase, which is useful for case-insensitive comparisons.
String firstName4 = "miCHAEL";
System.out.println(firstName4.equalsIgnoreCase(firstName1)); //true
Immutability
Strings are immutable, i.e., their internal state cannot be modified once they are created. A variable can be reassigned to a different String, but the String value itself cannot change. String follows all the principles of immutability described here. Every method of the String class that appears to manipulate the object (e.g., concat, substring) returns the result as a new String rather than updating the existing value.
String firstName = "Michael";
String lastName = "Jordan";
firstName.concat(lastName); //result is discarded
System.out.println(firstName); //Michael
System.out.println(lastName); //Jordan
As can be seen, neither firstName nor lastName changes. Methods of the String class don't modify the internal state; they create a new String holding the result and return it, as shown below.
firstName = firstName.concat(lastName);
System.out.println(firstName); //MichaelJordan
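The same holds for other String methods. In this sketch (the class name ImmutabilityDemo is illustrative), the results of toUpperCase, replace, and substring are first ignored and then assigned. Passing Locale.ROOT to toUpperCase is a choice made here, not in the original, so the output doesn't depend on the machine's default locale.

```java
import java.util.Locale;

// Shows that String methods return new values instead of modifying the original.
public class ImmutabilityDemo {
    public static void main(String[] args) {
        String name = "Michael Jordan";

        // Results are deliberately ignored: none of these calls alters 'name'.
        name.toUpperCase(Locale.ROOT);
        name.replace('a', 'o');
        name.substring(8);
        System.out.println(name);        // Michael Jordan

        // To keep a result, assign the returned String to a variable.
        String upper = name.toUpperCase(Locale.ROOT);
        String lastOnly = name.substring(8);
        System.out.println(upper);       // MICHAEL JORDAN
        System.out.println(lastOnly);    // Jordan
    }
}
```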
More on Immutability in Java can be found here.
Happy Learning!!!