Java Virtual Machine (JVM) Architecture Explained for Beginners

Ramsunthar Sivasankar
Published in Nerd For Tech
11 min read · May 8, 2021

image:signifytechnology.com

As a Java developer, it is important to understand the Java Virtual Machine (JVM) architecture and how Java works in order to use it efficiently. After reading this article, you will have a basic understanding of the JVM.

What is Java?

Java is a cross-platform, object-oriented programming language that was released by Sun Microsystems in 1995. Today Java is used in many areas, such as Android apps, web applications, trading applications, and big data technologies.

Java works by first compiling the source code into bytecode. Then, inside the Java Virtual Machine (JVM), the bytecode is translated into machine code.

Java architecture diagram
Java architecture (image:javacodemonk.com)

The Java architecture includes three main components, as shown in the above diagram:

  • Java Development Kit (JDK)
  • Java Runtime Environment (JRE)
  • Java Virtual Machine (JVM)

Java Development Kit (JDK)

It is a software development environment that is used to develop Java applications. It contains the JRE and development tools such as javac, jmap, jconsole, etc.

Java Runtime Environment (JRE)

The JRE is a part of the JDK, and it provides the runtime environment in which a Java program can be executed. It contains the libraries and software that Java programs need to run. It takes the compiled Java code, combines it with the required libraries, and then starts the JVM to execute it. The JRE ships the JVM implementation that matches the operating system it is installed on.

Java Virtual Machine (JVM)

Types of VM

Before getting into the JVM, let’s have a look at the VM (Virtual Machine). A virtual machine is a virtual environment that runs on top of a physical hardware system and behaves like a separate computer with its own CPU, memory, network interface, and storage. There are two main categories of VM, as shown in the above diagram:

  1. System based VM (SVM)
  • It is a system platform that allows several virtual machines to share the host computer’s physical resources while each runs its own copy of the OS.

  2. Application based VM (AVM) or Process based VM
  • It runs a single process as an application on a host machine. The JVM belongs to this category.

What is JVM?

It is an engine that provides a run-time environment to run Java applications, and it is part of the JRE. Languages like C and C++ are called compiled languages because their source code is compiled directly into machine code ahead of time. Languages like JavaScript and Python are usually called interpreted languages because the system executes the instructions without a separate compilation step.

Java uses a combination of both (compiler and interpreter). The source code (.java file) is first compiled into bytecode, which is stored in a class file (.class file). The JVM then converts the compiled bytecode into the machine language of the specific platform. Strictly speaking, the JVM is a specification for a software program that executes code and provides the runtime environment for that code, and there are several implementations of it.
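Because the JVM is a specification with more than one implementation, a program can ask which one it is running on. The following is a small sketch that reads standard system properties; the class name JvmInfo and the helper method property are made up for this example.

```java
class JvmInfo {
    // Reads a standard system property such as "java.vm.name".
    static String property(String key) {
        return System.getProperty(key);
    }

    public static void main(String[] args) {
        System.out.println("Specification : " + property("java.vm.specification.name"));
        System.out.println("Implementation: " + property("java.vm.name"));
        System.out.println("Vendor        : " + property("java.vm.vendor"));
        System.out.println("Java version  : " + property("java.version"));
    }
}
```

The values printed depend on the installed JDK, so the output will differ from machine to machine.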

For example, consider there is a Java file called “Test.java”. To compile this source code file, we need to use the following command.

javac Test.java

Here, when “javac” is called in the command prompt, it reads the Java code and compiles it into a bytecode class file (Test.class). To run this code, we pass the class name to the “java” command as follows.

java Test

When the “java” command is called, it requests the operating system to create a JVM instance. The JVM looks for the main method of the class (it must be declared public static void), goes through various steps, and finally, in the execution engine, the bytecode is translated into machine code.
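A minimal Test.java that works with the two commands above could look like the sketch below. The greeting text and the helper method greeting are only for illustration; the important part is the signature of main.

```java
class Test {
    // Hypothetical helper, only here so the program has something to print.
    static String greeting() {
        return "Hello from the JVM";
    }

    // Entry point: the JVM looks for a method with exactly this signature.
    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

After compiling, running javap -c Test prints the bytecode instructions that javac stored in Test.class, which is a handy way to see what the JVM actually receives.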

A JVM instance is created for each program that is run, and when the program ends, the JVM instance is destroyed. Along with that, the JVM creates a non-daemon thread (user thread) to execute the Java application.

The JVM is destroyed in two circumstances:

  1. If there are no non-daemon threads running. At that moment, the JVM exits and abruptly stops any daemon threads that are still active.
  2. If the Java app kills itself (by calling the System.exit() method).

And obviously, the JVM will also be destroyed if it crashes.
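The difference between daemon and non-daemon threads can be tried out with a short sketch. The class name DaemonDemo and the method startBackgroundWorker are assumptions made for this example.

```java
class DaemonDemo {
    // Starts a background thread that would loop forever if it were a user thread.
    static Thread startBackgroundWorker() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(100); // pretend to do periodic background work
                } catch (InterruptedException e) {
                    return; // stop when interrupted
                }
            }
        });
        worker.setDaemon(true); // must be set before start()
        worker.start();
        return worker;
    }

    public static void main(String[] args) {
        Thread worker = startBackgroundWorker();
        System.out.println("worker is daemon: " + worker.isDaemon());
        // main (a non-daemon thread) ends here, so the JVM exits
        // even though the daemon thread is still looping.
    }
}
```

If the setDaemon(true) line is removed, the worker becomes a user thread and the program keeps running after main returns.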

JVM architecture

JVM architecture diagram (image:dzone.com)

There are three main subsystems in the JVM, as shown in the above diagram:

  1. ClassLoader
  2. Runtime Memory/Data Areas
  3. Execution Engine

ClassLoader

This component is responsible for bringing the class files into memory (RAM), where the JVM resides, and it performs three functions: loading, linking, and initialization.

Loading

This process usually starts with loading the main class (the class with the main() method). The ClassLoader reads the .class file, and then the JVM stores the following information in the method area.

  • The fully qualified name of the loaded class
  • variable information
  • immediate parent information
  • whether it is a class or interface or enum

Note: The first time a class is loaded, the JVM creates an object of type java.lang.Class to represent it and stores that object in the heap. This happens only once for each loaded class.
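The information listed above can be read back through the Class object of a loaded class. This is only a sketch; the names ClassObjectDemo, Book, and describe are invented for the example.

```java
class ClassObjectDemo {
    static class Book { }

    // Summarises what the JVM recorded about a loaded class.
    static String describe(Class<?> type) {
        String kind = type.isInterface() ? "interface" : type.isEnum() ? "enum" : "class";
        Class<?> parent = type.getSuperclass(); // null for interfaces and for Object
        return kind + " " + type.getName()
                + " extends " + (parent == null ? "nothing" : parent.getName());
    }

    public static void main(String[] args) {
        Book first = new Book();
        Book second = new Book();
        // Both instances share the single Class object created at load time.
        System.out.println(first.getClass() == second.getClass());
        System.out.println(describe(Book.class));
        System.out.println(describe(Runnable.class));
    }
}
```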

The three main ClassLoaders in the JVM are:

  1. Bootstrap ClassLoader: This is the root class loader and it is the parent of the Extension ClassLoader. It loads the standard Java packages, which are inside the rt.jar file in Java 8 and earlier, and some other core libraries.
  2. Extension ClassLoader: This is the child of the Bootstrap ClassLoader and the parent of the Application ClassLoader. It is responsible for loading classes that are present inside the extensions directory (jre/lib/ext). Since Java 9, its role is taken by the Platform ClassLoader.
  3. Application ClassLoader: This is the child of the Extension ClassLoader, and it is responsible for loading the class files from the classpath (the classpath can be set with the -classpath command-line option).

Note that “parent” and “child” here describe the delegation chain between the loaders, not class inheritance.
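The chain of class loaders can be printed from a running program. The sketch below assumes a hypothetical class ClassLoaderDemo with a helper nameOf; the loader class names it prints depend on the Java version.

```java
class ClassLoaderDemo {
    // The bootstrap loader lives inside the JVM, so Java code sees it as null.
    static String nameOf(ClassLoader loader) {
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("String loaded by: " + nameOf(String.class.getClassLoader()));

        // Walk from the loader of this class up to the bootstrap loader.
        ClassLoader loader = ClassLoaderDemo.class.getClassLoader();
        while (loader != null) {
            System.out.println(nameOf(loader));
            loader = loader.getParent();
        }
        System.out.println(nameOf(null));
    }
}
```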

The four main principles of class loading in the JVM are:

  1. Visibility Principle: This principle states that a child ClassLoader can see the classes loaded by its parent, but a parent ClassLoader cannot see the classes loaded by its child.
  2. Uniqueness Principle: This principle states that a class loaded by the parent ClassLoader should not be loaded again by the child. This ensures that no class is duplicated.
  3. Delegation Hierarchy Principle: This rule states that the JVM follows a hierarchy of delegation to choose the class loader for each class loading request. Starting from the lowest child level, the Application ClassLoader delegates the received class loading request to the Extension ClassLoader, and then the Extension ClassLoader delegates the request to the Bootstrap ClassLoader. If the requested class is found in the Bootstrap path, the class is loaded. Otherwise, the request goes back to the Extension ClassLoader, which tries to find the class in the Extension path or a custom-specified path. If that also fails, the request comes back to the Application ClassLoader, which tries to find the class in the system classpath. If the Application ClassLoader also fails to load the requested class, we get an exception at run time: ClassNotFoundException.
  4. No Unloading Principle: This states that a ClassLoader can load a class but cannot unload it. A class is unloaded only when its ClassLoader itself is discarded, so to get rid of loaded classes the existing ClassLoader is dropped and a new one is created.
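The delegation and visibility rules can be observed with a small experiment. This is a sketch under assumptions: DelegationDemo and canLoad are invented names, and com.example.DoesNotExist is a placeholder for a class that is not on the classpath.

```java
class DelegationDemo {
    // Asks the given loader for a class without initializing it.
    static boolean canLoad(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // no loader in the delegation chain found it
        }
    }

    public static void main(String[] args) {
        ClassLoader app = ClassLoader.getSystemClassLoader();
        // Found by delegation: the request travels up to the bootstrap loader.
        System.out.println(canLoad("java.lang.String", app));         // true
        // Found by nobody: every loader in the chain fails.
        System.out.println(canLoad("com.example.DoesNotExist", app)); // false
        // Visibility: the parent is asked for a class from the application classpath.
        System.out.println(canLoad(DelegationDemo.class.getName(), app.getParent()));
    }
}
```

In a normal launch the last line is expected to print false, because the parent loader cannot see classes that only the Application ClassLoader can find.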

Linking

This process can be divided into three main parts:

1. Verification

This phase checks the correctness of the .class file. The bytecode verifier checks the following:

  • whether it comes from a valid compiler (because anyone can create their own compiler).
  • whether the code has a correct structure and format.

If any of these checks fail, the JVM throws a “java.lang.VerifyError”. If not, the preparation process takes place.

2. Preparation

In this phase, memory is allocated for all static variables, and they are assigned default values based on their data types.

object: null
int: 0
boolean: false

For example, let’s consider the following line of code,

static boolean status=true;

In this phase, the JVM sees that the variable status is of boolean type, so it assigns false to that variable (the default value of boolean is false, as mentioned above). The value true is assigned later, during initialization.
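The default value assigned during preparation can actually be observed. In the sketch below, a hypothetical class PreparationDemo reads status before the line that sets it to true has run.

```java
class PreparationDemo {
    // Static initializers run top to bottom, so this line runs first
    // and reads 'status' while it still holds its default value.
    static boolean statusSeenEarly = readStatus();
    static boolean status = true;

    static boolean readStatus() {
        return status;
    }

    public static void main(String[] args) {
        System.out.println("during initialization: " + statusSeenEarly); // false
        System.out.println("after initialization : " + status);          // true
    }
}
```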

3. Resolution

This is the process of replacing symbolic references with direct references, and it is done by searching the method area to locate the referenced entity. The machine does not understand the names that we give to classes, fields, and methods in our code. So the JVM replaces these symbolic links with direct links to the actual memory locations.

Initialization

In this phase, the original values from the code are assigned to the static variables, and static blocks are executed (if any). The execution takes place from top to bottom in a class and from parent to child in the class hierarchy. Most importantly, the JVM has a rule saying that a class must be initialized before its first active use.
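The ordering rules (top to bottom, parent before child) can be checked with a short sketch. All names here (InitOrderDemo, Parent, Child, LOG) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent static block"); }
    }

    static class Child extends Parent {
        static int first = log("Child field 'first'");
        static { LOG.add("Child static block"); }
        static int second = log("Child field 'second'");

        // Records a message and returns a dummy value for the field.
        static int log(String message) {
            LOG.add(message);
            return 0;
        }
    }

    public static void main(String[] args) {
        new Child(); // active use: initializes Parent first, then Child
        LOG.forEach(System.out::println);
    }
}
```

The log shows the Parent static block first, followed by the three Child entries in the order they appear in the source.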

Active uses of a class are:
1. using the new keyword (Example: Vehicle van=new Vehicle();).

2. invoking a static method.

3. assigning a value to a static field.

4. being the initial class (the class with the main() method).

5. using the reflection API (for example, the Class.forName() method).

6. initializing a subclass of the current class.
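A class is not initialized until one of these active uses happens. The sketch below shows this for a static method call; ActiveUseDemo, Config, and mode are hypothetical names.

```java
class ActiveUseDemo {
    static boolean configInitialized = false;

    static class Config {
        static { configInitialized = true; } // runs once, on first active use

        static String mode() {
            return "production";
        }
    }

    // Returns {flag before first use, flag after first use}.
    static boolean[] observe() {
        boolean before = configInitialized;
        Config.mode(); // invoking a static method is an active use
        boolean after = configInitialized;
        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] flags = observe();
        System.out.println("before first use: " + flags[0]); // false
        System.out.println("after first use : " + flags[1]); // true
    }
}
```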

There are four ways of creating an object of a class:

  1. using the new keyword: this goes through the initialization process and calls a constructor.
  2. using the clone() method: this copies the field values from the source object.
  3. using the reflection API (newInstance()): this goes through the initialization process and calls a constructor.
  4. using java.io.ObjectInputStream: this assigns values read from the InputStream to all non-transient variables.
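The four ways can be compared side by side. This is a sketch, assuming a hypothetical Vehicle class that counts how often its constructor runs; it is not the only way to write any of the four.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ObjectCreationDemo {
    static class Vehicle implements Cloneable, Serializable {
        private static final long serialVersionUID = 1L;
        static int constructorCalls = 0;
        String name = "van";

        Vehicle() { constructorCalls++; }

        @Override
        public Vehicle clone() {
            try {
                return (Vehicle) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Vehicle is Cloneable
            }
        }
    }

    // Creates four Vehicle objects and returns how many times the constructor ran.
    static int createFourWays() {
        try {
            Vehicle.constructorCalls = 0;
            Vehicle byNew = new Vehicle();                      // 1. new
            Vehicle byClone = byNew.clone();                    // 2. clone()
            Vehicle byReflection =
                    Vehicle.class.getDeclaredConstructor().newInstance(); // 3. reflection

            // 4. deserialization: write the object to bytes, then read it back
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(byNew);
            }
            Vehicle byStream;
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                byStream = (Vehicle) in.readObject();
            }
            System.out.println(byClone.name + " " + byReflection.name + " " + byStream.name);
            return Vehicle.constructorCalls;
        } catch (ReflectiveOperationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only new and reflection run the constructor.
        System.out.println("constructor calls: " + createFourWays()); // 2
    }
}
```

All four objects carry the name "van", but the constructor runs only twice, because clone() and deserialization build the object without calling it.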

Runtime Data Area

JVM memory is divided into the following five parts:

Memory Area

Method Area

This is where the class data is stored during the execution of the code. It holds information such as static variables, static methods, static blocks, instance methods, the class name, and the immediate parent class name (if any). This is a shared resource.

Heap Area

This is where all objects are stored, and it is a shared resource just like the method area.

Let’s take the following code sample as an example,

Book book = new Book();

Here, an instance of Book is created and stored in the Heap Area.

Note — there is only one method area and one heap area per JVM.

Stack Area

All the local variables, method calls, and partial results of a program (other than native methods) are stored in the stack area. A separate runtime stack is created for every thread. A block of the stack area is known as a “Stack Frame”, and it holds the local variables of one method call. Whenever a method is invoked, a new frame is pushed onto the stack, and when the invocation completes, the frame is removed (popped). Since this is a stack, it uses a Last-In-First-Out structure.
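The difference between the shared heap and the per-thread stacks can be seen in a small experiment. The sketch below uses invented names (StackVsHeapDemo, run); the counter object lives on the heap, while each thread keeps its own copy of the local variable.

```java
import java.util.concurrent.atomic.AtomicInteger;

class StackVsHeapDemo {
    // Runs two threads and returns the final value of the shared heap counter.
    static int run() {
        AtomicInteger shared = new AtomicInteger(); // one object on the heap, seen by both threads
        Runnable task = () -> {
            int local = 0; // lives in the stack frame of the running thread
            for (int i = 0; i < 1000; i++) {
                local++;
                shared.incrementAndGet();
            }
            System.out.println(Thread.currentThread().getName() + " local = " + local);
        };

        Thread t1 = new Thread(task, "T1");
        Thread t2 = new Thread(task, "T2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println("shared = " + run()); // 2000
    }
}
```

Each thread reports local = 1000 because the local variable is private to its own stack, while the shared counter ends at 2000 because both threads update the same heap object.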

PC Register (Program Counter Register)

This holds the thread’s execution information. Each thread has its own PC register, which holds the address of the instruction that is currently being executed. It is updated with the address of the next instruction once the current one finishes.

Native Method Stack

This holds the information about native methods, which are written in a language other than Java, such as C/C++. Just like the stack and the PC register, a separate native method stack is created for every new thread.

Take a look at the following diagram,

Let’s take a look at the following sample code as a scenario for Thread 1 (T1),

void M1() {
    M2(); // a second frame is pushed on top of M1's frame
}

void M2() {
    M3(); // a third frame is pushed on top of M2's frame
}

When the M1 method is called, the first frame is created in the T1 thread’s stack. From there, execution goes to method M2, and at that time the second frame is created. From M2 it goes to method M3, as in the above demo code, so a new frame is created on top of M2’s frame.

Whenever a method exits, its stack frame is destroyed, in the reverse order of creation.
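The frames in this scenario can be listed from inside M3. The following sketch is an assumption-based illustration: the class StackFrameDemo and the capture helper are invented, and the frames are read from a Throwable's stack trace.

```java
import java.util.ArrayList;
import java.util.List;

class StackFrameDemo {
    static final List<String> FRAMES = new ArrayList<>();

    static void M1() { M2(); }

    static void M2() { M3(); }

    static void M3() {
        // A new Throwable records the frames currently on this thread's stack, newest first.
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            FRAMES.add(frame.getMethodName());
        }
    }

    // Runs the M1 -> M2 -> M3 chain and returns the recorded method names.
    static List<String> capture() {
        FRAMES.clear();
        M1();
        return FRAMES;
    }

    public static void main(String[] args) {
        System.out.println(capture()); // starts with M3, M2, M1
    }
}
```

The list is in Last-In-First-Out order: M3 is on top because it was called last and will be the first to be popped.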

In the T4 thread, however, the method M2 is calling a native method. At that time, the PC register of T4 is undefined, while the PC registers of the other three threads still hold their current instruction addresses, as shown in the above diagram.

Execution Engine

This is where the execution of the bytecode (.class) occurs, and it executes the bytecode instruction by instruction. To run the program, the bytecode has to be converted into machine code. Let’s see which parts are responsible for this task.

The Execution Engine has three main components for executing Java classes,

Components of Execution Engine

Interpreter

This is responsible for converting bytecode into machine code. It starts interpreting the bytecode quickly, but overall execution is slow because it works through the instructions one at a time. The main disadvantage of the interpreter is that when the same method is called multiple times, it has to be interpreted again every time, which reduces the performance of the system. This is the reason why the JIT compiler runs alongside the interpreter.

JIT Compiler (Just In Time Compiler)

This overcomes the disadvantage of the interpreter. The execution engine first uses the interpreter to execute the bytecode, and it uses the JIT compiler when it finds repeated code (e.g., the same method being called multiple times). At that point, the JIT compiler compiles the bytecode of that repeated code into native code (machine code). This native code is stored in a cache, so whenever the repeated method is called again, the cached native code is used directly. Since executing native code is quicker than interpreting the instructions, the performance improves.
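A method that is called many times is a typical candidate for the JIT compiler. The sketch below uses invented names (JitDemo, square, sumOfSquares) and asks the standard CompilationMXBean how much time the JIT has spent; whether and when a method gets compiled is decided by the JVM.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitDemo {
    static long square(long n) {
        return n * n;
    }

    // Calls square() n times, so for large n it becomes "hot" code.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000));

        // The bean is null when the JVM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " spent "
                    + jit.getTotalCompilationTime() + " ms compiling");
        }
    }
}
```

On HotSpot, running the program with the -XX:+PrintCompilation option lists methods as they are compiled, which makes it possible to watch methods like square show up.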

Garbage Collector

This checks the heap area for unreferenced objects and destroys them to reclaim the memory, which makes space for new objects. It runs in the background and keeps Java memory usage efficient. There are two phases involved in this process:

  1. Mark: In this phase, the Garbage Collector identifies the unused objects in the heap area.
  2. Sweep: In this phase, the Garbage Collector removes the objects identified during the Mark phase.

This process is run automatically by the JVM when it decides memory needs to be reclaimed. It can also be requested by calling the System.gc() method, although the JVM is not obliged to run a collection immediately.
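The idea of "referenced" versus "unreferenced" objects can be tried out with a weak reference, which does not keep an object alive on its own. This is only a sketch with invented names (GcDemo, pinned, survivesGc), and the result for the unreferenced case is not guaranteed because System.gc() is just a request.

```java
import java.lang.ref.WeakReference;

class GcDemo {
    static Object pinned; // a strong reference that keeps an object reachable

    // Creates an object, optionally keeps a strong reference to it,
    // requests a collection, and reports whether the object is still there.
    static boolean survivesGc(boolean keepStrongReference) {
        Object book = new Object();
        WeakReference<Object> weak = new WeakReference<>(book);
        pinned = keepStrongReference ? book : null;
        book = null;  // drop the local reference
        System.gc();  // only a request; the JVM may ignore it
        return weak.get() != null;
    }

    public static void main(String[] args) {
        System.out.println("referenced object survives  : " + survivesGc(true));
        System.out.println("unreferenced object survives: " + survivesGc(false));
    }
}
```

The referenced object always survives. The unreferenced one is usually collected after the request, but that depends on the JVM and the collector in use.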

Java Native Interface (JNI)

This is used to interact with the native (non-Java) method libraries (C/C++) required for the execution. It allows the JVM to call those libraries to work around performance constraints and memory management limits in Java.

Native Method Libraries

These are libraries written in other (non-Java) programming languages, such as C and C++, which are required by the Execution Engine. They can be accessed through the JNI, and these library collections are mostly in the form of .dll or .so files.
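Some methods of the standard library are themselves native, and reflection can show which ones. The sketch below uses an invented class NativeMethodDemo; the assumption that System.currentTimeMillis() is declared native holds for OpenJDK-based class libraries but may differ elsewhere.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class NativeMethodDemo {
    // Returns true if the named public no-argument method carries the 'native' modifier.
    static boolean isNative(Class<?> type, String methodName) {
        try {
            Method method = type.getMethod(methodName);
            return Modifier.isNative(method.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Implemented in native code in OpenJDK (assumption, see above).
        System.out.println(isNative(System.class, "currentTimeMillis"));
        // Implemented in plain Java.
        System.out.println(isNative(String.class, "length"));
    }
}
```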

Ramsunthar Sivasankar
Nerd For Tech

MSc student of Greenwich University || Software Engineer