Chapter 2. The Java Language

Understand the basics before building real-world apps!

Larry | Peng Yang
Mastering Java
9 min read · Oct 22, 2022



Text Encoding

Java source code can be written using Unicode and stored in any number of character encodings, ranging from a full binary form to ASCII-encoded Unicode character values. The Java char type and String class natively support Unicode values. One of the most common file encodings for Unicode, UTF-8, preserves ASCII values in their single-byte form. Compiled Java class files store string constants in a slightly modified form of UTF-8, so storage remains compact for English text.

In Java, a Unicode character can be represented with the escape sequence \uxxxx, where xxxx is exactly four hexadecimal digits. Both of the following lines print the same character:

jshell> System.out.println("\u3042")
jshell> System.out.println("あ")

A charset (character set) is the set of characters you can use, e.g. ASCII and Unicode. An encoding is the way these characters are stored as bytes, e.g. UTF-8 and UTF-16.

ASCII defines 128 characters, which map to the numbers 0–127. Unicode defines fewer than 2^21 characters, which similarly map to numbers below 2^21 (valid code points run from 0 to 0x10FFFF, though not all numbers are currently assigned, and some are reserved).

Unicode is a superset of ASCII, and the numbers 0–127 have the same meaning in ASCII as they have in Unicode. Because Unicode characters don't generally fit into one 8-bit byte, there are numerous ways of storing Unicode characters in byte sequences, such as UTF-32 and UTF-8. In UTF-8, every code point from 0–127 is stored in a single byte. Code points above 127 (e.g. U+3042) are stored using 2–4 bytes.
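To make the byte counts concrete, here is a minimal sketch that encodes three sample strings and reports how many bytes each one needs. The class and method names (EncodingDemo, utf8Length, utf16Length) are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Number of bytes needed to store the text in UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed in UTF-16 (big-endian, so no byte-order mark is added).
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "A";            // code point U+0041
        String hiragana = "\u3042";    // code point U+3042
        String emoji = "\uD83D\uDE00"; // code point U+1F600, written as a surrogate pair

        System.out.println(utf8Length(ascii) + " / " + utf16Length(ascii));       // 1 / 2
        System.out.println(utf8Length(hiragana) + " / " + utf16Length(hiragana)); // 3 / 2
        System.out.println(utf8Length(emoji) + " / " + utf16Length(emoji));       // 4 / 4
        System.out.println(Integer.toHexString(hiragana.codePointAt(0)));         // 3042
        System.out.println(emoji.length());                                       // 2
    }
}
```

Note that length() counts UTF-16 code units, not characters, which is why the single emoji reports a length of 2.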

Data Types, Variables, and Constants

Data types specify the sizes and kinds of values that can be stored in a variable. There are two kinds of data types in Java: primitive and non-primitive. Variables store information that you plan to change and reuse over time (or information that you don't know ahead of time, such as a user's email address). Constants, declared with the keyword final (e.g. final double Pi = 3.1415926536;), store information that is, well, constant.

Primitive data types

The primitive data types include boolean, char, byte, short, int, long, float, and double.

Figure: primitive data types

Default values
Variables that are declared as members of a class are set to default values if they aren't initialized. In this case, numeric types default to the appropriate flavor of zero, characters are set to the null character (\0), and boolean variables have the value false. (Reference types also get a default value, null.) Local variables, on the other hand, must be explicitly initialized before they can be used. The compiler enforces this rule, so there is no danger of forgetting.
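The rule is easy to check with a small sketch. The class and field names below are invented for this example; the fields are deliberately left uninitialized.

```java
class DefaultValuesDemo {
    // Fields without an initializer receive default values.
    int count;
    double ratio;
    char letter;
    boolean flag;
    String name;

    public static void main(String[] args) {
        DefaultValuesDemo d = new DefaultValuesDemo();
        System.out.println(d.count);        // 0
        System.out.println(d.ratio);        // 0.0
        System.out.println((int) d.letter); // 0, the null character
        System.out.println(d.flag);         // false
        System.out.println(d.name);         // null

        int local;
        // System.out.println(local); // would not compile: local is not initialized yet
        local = 42;
        System.out.println(local);          // 42
    }
}
```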

Character literals
Literals of types char and String (String is non-primitive and will be covered in more detail in a future post) may contain any Unicode (UTF-16) characters. If your editor and file system allow it, you can use such characters directly in your code. If not, you can use a "Unicode escape" such as '\u0108' (capital C with circumflex), or "S\u00ED Se\u00F1or" (Sí Señor in Spanish). Always use 'single quotes' for char literals and "double quotes" for String literals.

char myChar = 'D'; // or '\u0044'
String myString = "my string";
String myOtherString = "DD"; // or "\u0044\u0044"

Integer literals
Integer literals (byte, short, int, long) can be specified in binary (base 2), octal (base 8), decimal (base 10), or hexadecimal (base 16). Integer literals are of type int unless they are suffixed with an L, denoting that they are to be produced as a long value:

int i = 1230; // decimal
int i = 0b01001011; // leading 0b or 0B, binary, i = 75 decimal
int i = 01230; // leading 0, octal, i = 664 decimal
int i = 0xFFFF; // leading 0x or 0X, hexadecimal, i = 65535 decimal
long l = 13L;
long l = 13; // equivalent: 13 is converted from type int
long l = 40123456789L;
long l = 40123456789; // error: too big for an int without conversion
// other examples
byte myByte = 12;
short myShort = 23;
int myNum = 5;
long myLong = 14L;
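As a quick check of the prefixes above, the following sketch prints the decimal value of each literal and converts the values back to the other bases. The class and constant names are illustrative only.

```java
class IntegerLiteralDemo {
    static final int DECIMAL = 1230;
    static final int BINARY = 0b01001011;
    static final int OCTAL = 01230;
    static final int HEX = 0xFFFF;
    static final long BIG = 40123456789L; // needs the L suffix: too large for an int

    public static void main(String[] args) {
        System.out.println(DECIMAL); // 1230
        System.out.println(BINARY);  // 75
        System.out.println(OCTAL);   // 664
        System.out.println(HEX);     // 65535
        System.out.println(BIG);     // 40123456789

        // Converting back shows the same digits that were used in the literals.
        System.out.println(Integer.toBinaryString(BINARY)); // 1001011
        System.out.println(Integer.toOctalString(OCTAL));   // 1230
        System.out.println(Integer.toHexString(HEX));       // ffff
    }
}
```

The leading zero of an octal literal is easy to miss: 01230 and 1230 are different numbers.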

Floating-Point literals
A floating-point (float, double) literal is of type float if it ends with the letter F or f; otherwise, its type is double and it can optionally end with the letter D or d. The floating point types can also be expressed using E or e (for scientific notation).

float f1 = 123.4f; // or 123.4F, cannot be 123.4
float f2 = 1.234e2f; // same value as f1, or 1.234e2F, cannot be 1.234e2
double d1 = 123.4; // or 123.4d or 123.4D
double d2 = 1.234e2; // same value as d1, or 1.234e2d or 1.234e2D

Underscore characters for numeric literals
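You can use the underscore character (_) between digits to break up a long string of digits. Underscores are allowed only between digits, not at the start or end of a literal or next to a decimal point.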

int RICHARD_NIXONS_SSN = 567_68_0515; 
int for_no_reason = 1___2___3;
int JAVA_ID = 0xCAFE_BABE;
long grandTotal = 40_123_456_789L;
float pi = 3.14_159_265_358f; // 3.1415927
double pi = 3.14_159_265_358; // 3.14159265358
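The underscores are purely visual, which a short sketch can confirm. The class and constant names here are made up for the example; the literals are taken from the listing above.

```java
class NumericLiteralDemo {
    static final long GRAND_TOTAL = 40_123_456_789L;
    static final int JAVA_ID = 0xCAFE_BABE;
    static final float PI_FLOAT = 3.14_159_265_358f;
    static final double PI_DOUBLE = 3.14_159_265_358;

    public static void main(String[] args) {
        // The compiler ignores the underscores entirely.
        System.out.println(GRAND_TOTAL == 40123456789L);   // true
        System.out.println(Integer.toHexString(JAVA_ID));   // cafebabe

        // 0xCAFEBABE has its top bit set, so as an int it is a negative number.
        System.out.println(JAVA_ID < 0);                    // true

        // A float keeps only about 7 significant decimal digits.
        System.out.println(PI_FLOAT);                       // 3.1415927
        System.out.println((double) PI_FLOAT == PI_DOUBLE); // false
    }
}
```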

Non-primitive data types

The non-primitive data types include Class, Interface, Array, Enumeration, and Annotation. They are also known as reference types.

Figure: non-primitive data types

Class
A class is a user-defined data type that is used to create objects. A class contains a set of properties and methods that are common and exhibited by all the objects of the class. An example of a class definition is as below.

public class ClassExample {
    // fields (variables) of the class
    int a = 20;
    int b = 10;
    int c;

    // methods of the class
    public void add() {
        c = a + b; // store the result in the field c instead of shadowing it with a local variable
        System.out.println("Addition of numbers is: " + c);
    }
}

Interface
An interface is similar to a class and can declare both methods and variables. The main differences are that methods declared in an interface are implicitly public and abstract (unless they are marked default, static, or private), and its variables are implicitly public, static, and final. An example of an interface definition is below.

interface CalcInterface {
    void multiply();
    void divide();
}
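An interface only becomes useful once a class implements it. The sketch below uses a hypothetical Calculator interface (a variant of CalcInterface with parameters and return values, not the one defined above) together with a made-up SimpleCalculator implementation.

```java
class InterfaceDemo {
    public static void main(String[] args) {
        // The variable has the interface type; the object is the implementing class.
        Calculator calc = new SimpleCalculator();
        System.out.println(calc.multiply(3, 4)); // 12
        System.out.println(calc.doubleOf(5));    // 10
    }
}

interface Calculator {
    int FACTOR = 2;             // implicitly public static final

    int multiply(int a, int b); // implicitly public abstract

    // A default method supplies a body that implementing classes inherit.
    default int doubleOf(int a) {
        return multiply(a, FACTOR);
    }
}

class SimpleCalculator implements Calculator {
    @Override
    public int multiply(int a, int b) {
        return a * b;
    }
}
```

Because interface methods are implicitly public, the implementing method must be declared public as well.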

Array
Arrays are used to store elements of the same data type (can be either primitive or non-primitive) in a contiguous manner. The basic usage of arrays is shown below.

int[] num; // declare an array
int[] num = {4, 10, 7}; // declare and initialize an array with literals
int[] num = new int[]{1, 2, 3}; // declare and initialize with an array creation expression (no size is given when values are listed)
int[] num = new int[3]; // create an array of 3 elements and assign values below
num[0] = 4;
num[1] = 10;
num[2] = 7;
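Putting these pieces together, here is a small runnable sketch. The class name ArrayDemo and the helper method sum are assumptions made for this example.

```java
import java.util.Arrays;

class ArrayDemo {
    // Adds up all elements of the array.
    static int sum(int[] values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] fresh = new int[3];                   // elements start at the default value 0
        System.out.println(Arrays.toString(fresh)); // [0, 0, 0]

        int[] num = {4, 10, 7};
        System.out.println(num.length);             // 3
        System.out.println(num[1]);                 // 10
        System.out.println(sum(num));               // 21

        String[] names = new String[2];             // reference elements start as null
        System.out.println(Arrays.toString(names)); // [null, null]
    }
}
```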

Enumeration
Java enums can be thought of as classes that have a fixed set of constants. Enum constants are implicitly static and final. We can define an enum either inside or outside a class. Every enum implicitly extends the java.lang.Enum class, so it cannot extend any other class, but it can implement any number of interfaces. An enum can have fields, constructors, methods, and even a main method. An example of an enum definition:

enum Seasons { WINTER, SPRING, SUMMER, AUTUMN };
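The following sketch shows an enum with a field, a constructor, and a method, and that also implements an interface. The names Season, Describable, and the weather field are invented for the illustration.

```java
class EnumDemo {
    interface Describable {
        String describe();
    }

    enum Season implements Describable {
        WINTER("cold"), SPRING("mild"), SUMMER("hot"), AUTUMN("cool");

        private final String weather;

        // Enum constructors are implicitly private and run once per constant.
        Season(String weather) {
            this.weather = weather;
        }

        @Override
        public String describe() {
            return name() + " is " + weather;
        }
    }

    public static void main(String[] args) {
        System.out.println(Season.SUMMER.describe());                 // SUMMER is hot
        System.out.println(Season.values().length);                   // 4
        System.out.println(Season.SPRING.ordinal());                  // 1
        System.out.println(Season.valueOf("WINTER") == Season.WINTER); // true
    }
}
```

Since each constant exists exactly once, comparing enum values with == is safe.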

Annotation
A Java annotation is a tag that attaches metadata to a class, interface, method, or field. It provides additional information that can be used by the Java compiler and the JVM, which makes annotations an alternative to XML configuration and marker interfaces. There are built-in annotations (@Override, @Deprecated, etc.), and we can also create our own.

class Dog extends Animal {
    @Override
    void eatsomething() {
        System.out.println("eating foods");
    }
}
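Creating our own annotation takes only a few lines. This is a minimal sketch under the assumption that we want to read the metadata at runtime via reflection; the annotation Author, the class Report, and the helper authorOf are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class AnnotationDemo {
    // RUNTIME retention keeps the annotation available to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Author {
        String name();
    }

    @Author(name = "Larry")
    static class Report {
    }

    // Reads the annotation from a class, or returns "unknown" if it is absent.
    static String authorOf(Class<?> type) {
        Author author = type.getAnnotation(Author.class);
        return author == null ? "unknown" : author.name();
    }

    public static void main(String[] args) {
        System.out.println(authorOf(Report.class)); // Larry
        System.out.println(authorOf(String.class)); // unknown
    }
}
```

Without RetentionPolicy.RUNTIME the annotation would not be visible through reflection, and getAnnotation would return null.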

Special notes on reference types

Memory Allocation and Garbage Collection
In Java, the new keyword is used to create an instance of a class. In other words, it instantiates a class by allocating memory for a new object and returning a reference to that memory. Objects occupy memory in the Java heap space. We can also use the new keyword to create an array object. If there are no remaining references to an object, the memory used by that object can be reclaimed during garbage collection.

Conversion Between Primitive Type and Reference Type
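Java converts automatically between primitive types and their corresponding wrapper classes (for example, int and Integer). Converting a primitive value to its wrapper object is called autoboxing, and converting a wrapper object back to a primitive value is called unboxing.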
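Both conversions usually happen silently, as this sketch illustrates. The class name BoxingDemo and the helper sum are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingDemo {
    // Each Integer element is unboxed to an int when it is read in the loop.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        Integer boxed = 42;    // autoboxing: int to Integer
        int primitive = boxed; // unboxing: Integer to int
        System.out.println(primitive);    // 42

        List<Integer> numbers = new ArrayList<>();
        numbers.add(1);        // autoboxing: collections can only hold objects
        numbers.add(2);
        numbers.add(3);
        System.out.println(sum(numbers)); // 6

        Integer missing = null;
        // int bad = missing;  // compiles, but unboxing null throws NullPointerException
    }
}
```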

Comparing Reference Type
We can also compare reference types in Java. Java provides two ways to compare them:
1. Use the == operator to check whether two references point to the same object.
2. Use the equals() method. This method is defined in the Object class, so every Java object inherits it. By default, its implementation compares references, so it works the same as the == operator. However, we can override this method in order to define what equality means for our objects.
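The difference shows up as soon as a class overrides equals(). Below is a minimal sketch with a hypothetical Point class; hashCode() is overridden too, since the two methods should always stay consistent.

```java
import java.util.Objects;

class EqualityDemo {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Two points are equal when their coordinates match.
        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Point)) {
                return false;
            }
            Point p = (Point) other;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        Point c = a;

        System.out.println(a == b);      // false: two distinct objects
        System.out.println(a.equals(b)); // true: same coordinates
        System.out.println(a == c);      // true: same object
    }
}
```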

Copying Reference Type
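There are three ways to copy a reference type: a reference copy (a second variable that points to the same object), a shallow copy (a new object whose fields still refer to the same nested objects), and a deep copy (a new object whose nested objects are copied as well).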
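A two-dimensional array is a convenient way to see all three kinds of copy side by side, because the outer array holds references to the inner arrays. The class CopyDemo and the helper deepCopy are made up for this sketch.

```java
class CopyDemo {
    // Copies the outer array and every inner array.
    static int[][] deepCopy(int[][] source) {
        int[][] copy = new int[source.length][];
        for (int i = 0; i < source.length; i++) {
            copy[i] = source[i].clone();
        }
        return copy;
    }

    public static void main(String[] args) {
        int[][] original = {{1, 2}, {3, 4}};

        int[][] referenceCopy = original;       // same outer array
        int[][] shallowCopy = original.clone(); // new outer array, shared inner arrays
        int[][] deepCopy = deepCopy(original);  // nothing is shared

        original[0][0] = 99;

        System.out.println(referenceCopy[0][0]); // 99
        System.out.println(shallowCopy[0][0]);   // 99
        System.out.println(deepCopy[0][0]);      // 1

        System.out.println(referenceCopy == original);     // true
        System.out.println(shallowCopy == original);       // false
        System.out.println(shallowCopy[0] == original[0]); // true
        System.out.println(deepCopy[0] == original[0]);    // false
    }
}
```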

Primitive vs non-primitive data types

Figure: primitive vs non-primitive data types

String and String pool

String is a special class in Java: we can create String objects either with the new operator or by writing a value in double quotes. The Java String Pool is a special memory region in the heap where the JVM stores String literals.

Thanks to the immutability of Strings in Java, the JVM can optimize the amount of memory allocated for them by storing only one copy of each literal String in the pool. This process is called interning. Here is a diagram that explains how the String Pool is maintained in the Java heap space and what happens when we use different ways to create Strings.

String pool: by digitalocean.com
  • When we use double quotes to create a String, the JVM first looks for a String with the same value in the String pool. If one is found, it returns that reference; otherwise it creates a new String in the pool and returns its reference.
  • With the new operator, however, we force the String class to create a new String object in heap space. We can then call the intern() method to put it into the pool, or to get back the String object from the pool that has the same value.
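The two bullets above can be observed directly by comparing references. This is a small sketch; the class name StringPoolDemo and the helper sameObject are assumptions for the example.

```java
class StringPoolDemo {
    // True only when both references point to the very same object.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal1 = "Cat";
        String literal2 = "Cat";
        String created = new String("Cat");

        System.out.println(sameObject(literal1, literal2)); // true: both come from the pool
        System.out.println(sameObject(literal1, created));  // false: new always makes a fresh object
        System.out.println(literal1.equals(created));       // true: the contents are the same
    }
}
```

This is also why String values should be compared with equals() rather than ==.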

How many strings are created in String str = new String("Cat"); ?
The literal "Cat" is created and interned when the JVM loads the class that contains this line of code. If a "Cat" is already in the intern pool from some other code, then the literal produces no new String object. After that, new String("Cat") creates a new String object in the heap, ignoring the pool completely.

String str = new String("Cat"); // creates 1 or 2 String objects
String anotherStr = "Cat"; // refers to the String object already in the pool

Manual Interning
We can manually intern a String in the Java String Pool by calling the intern() method on the object we want to intern. Manually interning the String will store its reference in the pool, and the JVM will return this reference when needed.

String constantString = "interned String";
String newString = new String("interned String");
assertThat(constantString).isNotSameAs(newString); // true
String internedString = newString.intern();
assertThat(constantString).isSameAs(internedString); // true

String Representation
Until Java 8, Strings were internally represented as an array of characters — char[], encoded in UTF-16, so that every character uses two bytes of memory.

With Java 9 a new representation is provided, called Compact Strings. A String is now backed by a byte[] plus a flag that records its encoding: Latin-1 (one byte per character) when every character fits, and UTF-16 (two bytes per char) otherwise. Since UTF-16 is used only when necessary, Strings that contain only Latin-1 characters need roughly half the heap memory, which in turn reduces Garbage Collector overhead on the JVM.
