Internal Structure of Java Virtual Machine

Asinshani Taniya
9 min read · Oct 5, 2021

The JVM is the process that creates the environment in which bytecode runs. Beyond that, the JVM performs some special functions internally. Here, we go through those.

Virtual Machine

A machine is a device that gets your work done. When that machine does not exist in reality as a physical entity, we call it a virtual machine.

There are two types of virtual machines:

System-based Virtual Machine (SVM)

It runs on one or more pieces of hardware and creates multiple, completely independent environments.

Examples: hypervisors such as Xen

Application-based Virtual Machine (AVM)

Here a process creates an environment in which a program written in another language can run, and it converts that program into output the machine understands.

Examples: JVM (VM for Java), CLR (Common Language Runtime - VM for .NET) and PVM (Parrot Virtual Machine - VM for dynamic languages)

Java Virtual Machine (JVM)

The JVM is an application-based virtual machine (or process-based virtual machine) used in Java programming.

The JVM is defined by a complete specification, which means the specification says how things should be done.

When you download and install the JRE (Java Runtime Environment), it deploys the code needed to create the JVM.

If you install the JRE on an operating system, it deploys the code for creating the JVM for that particular environment. So the JRE is tightly platform dependent, and the JVM is platform dependent as well.

The JVM is not something static on your computer. If you are not executing any Java program at a given moment, there is no JVM on your computer at that moment.

When does a JVM exist?

Once you run a program, a JVM, or more precisely a JVM instance, is created.

If you execute n programs at the same time, your computer has n JVM instances at that time.
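Since every running program gets its own JVM instance, each instance is an ordinary operating-system process with its own process id. The following is a minimal sketch that prints that id; the class name JvmInstanceInfo is made up for illustration, and ProcessHandle requires Java 9 or later.

```java
class JvmInstanceInfo {
    // Returns the operating-system process id of the JVM instance running this code.
    static long currentJvmPid() {
        return ProcessHandle.current().pid(); // ProcessHandle is available since Java 9
    }

    public static void main(String[] args) {
        System.out.println("JVM process id: " + currentJvmPid());
        System.out.println("JVM implementation: " + System.getProperty("java.vm.name"));
    }
}
```

If you start this program twice at the same time, the two runs should report different process ids, one per JVM instance.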

Within the JVM instance, at least one non-daemon thread is created: the main thread, which executes the main method of the program.

The JVM also starts daemon threads. A daemon thread provides a general service in the background for as long as the program runs (garbage collection is one example), and it is not part of the program itself.

When does a JVM exit?

There are two main reasons for a JVM instance to exit or die.

  1. All non-daemon threads have terminated.
  2. The application calls System.exit().
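The exit rule can be seen with a small experiment. This is only a sketch: the class name DaemonThreadDemo and the endless "housekeeping" loop are invented for illustration. The main thread is a non-daemon thread, while the worker is explicitly marked as a daemon, so it does not keep the JVM alive.

```java
class DaemonThreadDemo {
    // Builds a background worker; marking it as a daemon means it will not keep the JVM alive.
    static Thread newBackgroundWorker() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(1_000); // pretend to do periodic housekeeping
                } catch (InterruptedException e) {
                    return; // stop when interrupted
                }
            }
        });
        worker.setDaemon(true); // must be called before start()
        return worker;
    }

    public static void main(String[] args) {
        Thread worker = newBackgroundWorker();
        worker.start();
        System.out.println("main is daemon? " + Thread.currentThread().isDaemon());
        System.out.println("worker is daemon? " + worker.isDaemon());
        // When main returns, no non-daemon thread is left, so the JVM exits
        // even though the worker loop never finishes on its own.
    }
}
```

If you remove the setDaemon(true) call, the worker becomes a non-daemon thread and the JVM keeps running after main returns.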

Data Types used in JVM

Let’s look at the data types used in the JVM. They are quite similar to the data types we use in the Java language. There are two main kinds of data types.

Primitive Data Type

  • As in the Java language, a primitive data type holds the value itself.

There is a small difference when it comes to boolean. Inside the JVM, a boolean is represented as an int: false is 0 and true is 1. A boolean array is typically represented as an array of bytes.

  • The sizes of the primitive data types are also the same as in the Java language.

The sizes of the primitive data types are byte (8 bits), short (16 bits), int (32 bits), long (64 bits), float (32 bits), double (64 bits), and char (16 bits).

  • There is an additional primitive data type called returnAddress, which has historically been used to implement finally blocks. It is specific to the JVM and has no counterpart in the Java language.
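The sizes listed above can be checked from Java itself, because each wrapper class in java.lang exposes a SIZE constant giving the bit width. A minimal sketch (the class name PrimitiveSizes is made up for illustration):

```java
class PrimitiveSizes {
    // Bit widths in the order: byte, short, int, long, float, double, char.
    static int[] bitWidths() {
        return new int[] {
            Byte.SIZE, Short.SIZE, Integer.SIZE, Long.SIZE,
            Float.SIZE, Double.SIZE, Character.SIZE
        };
    }

    public static void main(String[] args) {
        String[] names = {"byte", "short", "int", "long", "float", "double", "char"};
        int[] widths = bitWidths();
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + widths[i] + " bits");
        }
    }
}
```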

Reference Data Type

  • As in the Java language, a reference data type does not hold the value itself but a reference to it.

There are three kinds of references:

  1. Class Reference: Holds a reference to an instance of the class.
  2. Interface Reference: Holds a reference to an instance of a class that implements the particular interface. For example, if interface A is implemented by class B, an interface reference of type A holds a reference to an instance of class B.
  3. Array Reference: Holds a reference to an array.

There is also the null reference, which refers to nothing.
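The reference kinds above look like this in Java source. This is a small sketch: A and B mirror the interface and class from the example, and the describe helper is invented here just to show what each reference points at.

```java
class ReferenceKindsDemo {
    interface A {}                 // hypothetical interface from the example above
    static class B implements A {} // hypothetical class implementing A

    // Reports what a reference points at, at run time.
    static String describe(Object ref) {
        if (ref == null) return "null reference";
        if (ref.getClass().isArray()) return "array reference";
        return "reference to an instance of " + ref.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        B classRef = new B();        // class reference
        A interfaceRef = new B();    // interface reference, still points at a B instance
        int[] arrayRef = new int[3]; // array reference
        Object nullRef = null;       // null reference
        System.out.println(describe(classRef));
        System.out.println(describe(interfaceRef));
        System.out.println(describe(arrayRef));
        System.out.println(describe(nullRef));
    }
}
```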

3 Phases of JVM

The JVM performs three functions: loading, storing and executing. In terms of these functions, we can define three phases.

  1. Class Loader: Loading
  2. Memory Area: Storing
  3. Execution Engine: Executing

Memory Area

The size of the Memory Area depends on the JVM implementation. The Memory Area is split into five areas according to what is stored.

  1. Method Area: When the class loader loads a class, it stores all the class information here, including type information
  2. Heap Area: Stores all objects
  3. Stack: Stores method call information (plus local variables)
  4. PC Registers: Hold the address of the instruction currently being executed, if the method is not native
  5. Native Method Area: Stores native method call information (the JVM specification calls this the native method stack)

The Method Area and the Heap Area are created once per JVM instance and are shared by all threads. A Stack, a PC register and a Native Method Area are created per thread.
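The difference between shared and per-thread areas can be sketched in code. In the hypothetical example below, the AtomicInteger object lives on the heap and is visible to both threads, while the variable named local lives in each thread's own stack frame. All names are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class MemoryAreasDemo {
    // One object on the heap, visible to every thread.
    static final AtomicInteger shared = new AtomicInteger();

    // 'local' lives in the stack frame of whichever thread calls this method.
    static int countTo(int n) {
        int local = 0;
        for (int i = 0; i < n; i++) {
            local++;
            shared.incrementAndGet();
        }
        return local;
    }

    // Runs countTo on two threads and returns each thread's private result.
    static int[] runTwoThreads(int n) {
        int[] results = new int[2];
        Thread t1 = new Thread(() -> results[0] = countTo(n));
        Thread t2 = new Thread(() -> results[1] = countTo(n));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```

Each thread ends with its own local count of n, while the shared heap object has been incremented by both threads, 2n times in total.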

How do the Stack, PC Registers and Native Method Area work per thread?

Let’s say we have 4 threads, each with its own flow of method calls.

Suppose the fourth thread calls a native method, and at a given moment it is executing that native method. At the same moment, thread one is executing m3, thread two is executing m2 and thread three is executing m1.

The PC register of the fourth thread will not hold any method data, because at that moment the thread is executing a native method. Its value can be null, or undefined.

At the same moment, all the other threads are executing Java methods, so you can see information about those methods in their PC registers.

Class Loader

There are two types of class loaders in the JVM: one is the Bootstrap class loader, the other is the user-defined class loader.

I will not differentiate these two in depth; instead, let's get to know what a class loader is in general.

The main thing the class loader does is take the class file and load it into the memory area. Beyond that, the class loader is specified to perform three functions: Loading, Linking and Initialization. Linking itself consists of three steps: Verification, Preparation and Resolution.

Let’s assume you write some Java code and compile it. Now you have bytecode. When you execute it, you are asking the JVM to execute it.

Loading

Here the class loader takes your class file and loads it into the memory area. While doing that, the class loader reads the following details of each class:

  1. Fully qualified class name
  2. Variable details
  3. Immediate parent class details
  4. Whether it is a class, an interface or an enum

Then it creates a new object of type Class, assigns your class to this object and puts it into the heap area. This object is created only once per class.

  • Class type here means the specific type called Class (java.lang.Class); it is not the type of your own class.

Assume you have an Employee class. When you execute the program, the class loader loads the class and reads the above four details. Then it creates a new Class object and assigns Employee to it. Finally, it puts this object into the heap.

Let’s say we have another class called Manager, which uses an instance of the Employee class. The class loader will not create a new Class object for Employee again, because one already exists.
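The "one Class object per class" idea can be checked with reflection. The sketch below uses hypothetical Employee and Manager classes matching the example above; it compares the Class objects reached in three different ways and reads back the four details mentioned under Loading.

```java
class ClassObjectDemo {
    static class Employee {
        String name; // a field, so there is something to list under "variable details"
    }

    static class Manager {
        final Employee report = new Employee(); // Manager uses an Employee instance
    }

    // True if every route to Employee's Class object yields the very same object.
    static boolean sameClassObject() {
        Class<?> fromLiteral = Employee.class;
        Class<?> fromInstance = new Employee().getClass();
        Class<?> viaManager = new Manager().report.getClass();
        return fromLiteral == fromInstance && fromInstance == viaManager;
    }

    public static void main(String[] args) {
        Class<?> c = Employee.class;
        System.out.println("Name: " + c.getName());
        System.out.println("Fields: " + c.getDeclaredFields().length);
        System.out.println("Parent: " + c.getSuperclass().getName());
        System.out.println("Interface? " + c.isInterface() + ", enum? " + c.isEnum());
        System.out.println("Same Class object everywhere? " + sameClassObject());
    }
}
```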

Linking-Verification

When the class loader loads the class, it verifies it through a subprogram called the Bytecode Verifier. Here the JVM makes sure the loaded file is safe to execute. It confirms,

  1. Whether it comes from a valid compiler
  2. Whether the structure is correct
  3. Whether the format is correct

If any of these checks fails, the JVM throws a run-time error called VerifyError (java.lang.VerifyError). Getting this error usually means the class file was altered somewhere.

Linking-Preparation

In preparation, the JVM is specified to assign a default value (not the initial value you wrote in code) to each static variable of your class. Instance variables receive the same defaults later, when an object is created.

If the variable is,

int → default is 0

boolean → default is false

Object → default is null

This is a kind of programmer-friendly work the JVM does for us.
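The defaults are easy to observe: declare static fields without assigning anything and read them. A minimal sketch (class and field names are made up for illustration):

```java
class DefaultValuesDemo {
    // None of these fields is assigned in code, so each keeps its default value.
    static int count;       // defaults to 0
    static boolean enabled; // defaults to false
    static double ratio;    // defaults to 0.0
    static String name;     // defaults to null

    public static void main(String[] args) {
        System.out.println(count + " " + enabled + " " + ratio + " " + name);
    }
}
```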

Linking-Resolution

In this phase, the JVM is specified to replace domain-specific words/names/objects with their actual memory locations.

Here the domain is the developer: developers use high-level names to represent classes, fields and methods. In resolution, the JVM replaces these high-level (symbolic) names with the correct memory locations (direct references).

For example, when we define an object called employee, that is a domain-specific name, and the JVM converts it into the correct memory location.

Initialization

In initialization, the JVM is specified to do two things.

  1. Assign the real/exact values to static variables
  2. Execute the static initializer blocks
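Both initialization steps can be traced in a small class. In this sketch (all names invented for illustration), a static variable gets its real value through a helper that logs the assignment, and a static block then runs; Java executes them in the order they appear in the source.

```java
class InitOrderDemo {
    // Declared first so that it exists before the lines below use it.
    static final StringBuilder trace = new StringBuilder();

    // Step 1: the static variable receives its real value.
    static int rate = record("rate=10;", 10);

    // Step 2: the static initializer block runs.
    static {
        trace.append("static-block;");
        rate = rate * 2;
    }

    // Logs a message and passes the value through.
    static int record(String message, int value) {
        trace.append(message);
        return value;
    }
}
```

After the class has been initialized, rate is 20 and the trace reads "rate=10;static-block;".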

Do all classes go through this initialization process?

There are four ways to create a new instance or object, and only two of them initialize the object by running the class's own initialization code (its constructor). Those are creating a new object using the new keyword, and creating a new object using the newInstance( ) method from the Reflection API.

The four ways to create a new instance or object:

  1. Using the new keyword
  2. Using the newInstance( ) method from the Reflection API
  3. Using the clone( ) method
  4. Using java.io.ObjectInputStream

When you create a new object using the clone( ) method, it is initialized from the source/parent object. When you create a new object using java.io.ObjectInputStream, it is initialized from all non-transient variables stored in the stream.
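The four ways can be compared side by side by counting constructor calls. This is a sketch with a hypothetical Point class: x is a normal field, cache is transient, and constructorCalls counts how often the constructor actually runs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CreationDemo {
    static class Point implements Cloneable, Serializable {
        private static final long serialVersionUID = 1L;
        static int constructorCalls = 0;
        int x = 1;
        transient int cache = 99; // transient: not written to the stream

        public Point() { constructorCalls++; }

        @Override
        public Point clone() {
            try {
                return (Point) super.clone(); // copies fields, runs no constructor
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    // Serializes the object to memory and reads it back with ObjectInputStream.
    static Point roundTrip(Point source) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(source);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Point) in.readObject();
        }
    }

    // Returns {constructor calls, clone.x, clone.cache, deserialized.x, deserialized.cache}.
    static int[] run() {
        try {
            int before = Point.constructorCalls;
            Point a = new Point();                                        // 1. new keyword
            Point b = Point.class.getDeclaredConstructor().newInstance(); // 2. Reflection API
            a.x = 7;
            Point c = a.clone();                                          // 3. clone()
            Point d = roundTrip(a);                                       // 4. ObjectInputStream
            return new int[] {Point.constructorCalls - before, c.x, c.cache, d.x, d.cache};
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Four objects are created but the constructor runs only twice. The clone copies every field from its source, while the deserialized object gets x from the stream and its transient cache field falls back to the default 0.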

When should initialization be finished?

Inside the class loader, the JVM is quite flexible about implementation: it can perform these phases sequentially or in parallel. But there is one limitation on initialization: it must be completed before the first active use of a class.
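The "before first active use" rule can be observed from the outside. In this sketch (Config and all other names are hypothetical), reading a compile-time constant does not count as an active use, because the compiler inlines the value, while reading an ordinary static field does and therefore triggers initialization first.

```java
class ActiveUseDemo {
    static final StringBuilder events = new StringBuilder();

    static class Config {
        // Compile-time constant: reading it does not initialize Config.
        static final int LIMIT = 42;

        // Ordinary static field: reading it is an active use of Config.
        static int loaded = init();

        static int init() {
            events.append("Config initialized;");
            return 1;
        }
    }

    // Records the order in which things happen and returns the log.
    static String run() {
        events.setLength(0);
        events.append("before;");
        int limit = Config.LIMIT;   // no initialization yet
        events.append("constant read;");
        int loaded = Config.loaded; // first active use, Config is initialized here
        events.append("after;");
        return events.toString();
    }
}
```

On the first call in a fresh JVM, the log shows "Config initialized;" only after "constant read;", which is the latest point the JVM is allowed to initialize the class.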

Execution Engine

The bytecode we are about to execute is not native machine code, so it must be converted into machine-readable instructions before it can execute. The execution engine does that. We can divide the execution engine into three components.

Interpreter

The interpreter reads the bytecode and converts it into machine code. Then it executes the machine code sequentially, which means one instruction at a time.

It converts every part of the bytecode to machine code each time it encounters it, even if that part was already converted before. For example, the same method is interpreted again every time it is invoked, in however many places it is called. That takes time, so execution is slow.

JIT (Just In Time) Compiler

This was created to overcome the disadvantage of the interpreter.

There is a component called the Profiler that identifies hotspot methods. It keeps a separate count of invocations for every method. If that count reaches the expected maximum (the threshold value) for any method, that repeatedly invoked method is called a hotspot.

Note: The threshold value differs from JVM to JVM.

When the profiler identifies a hotspot, the JIT compiler converts it into machine code. When the bytecode calls that method again, the JIT compiler hands its machine code to the interpreter, so there is no need to interpret it line by line again.

Note: All code is interpreted at least once. The JIT compiler only helps where methods are invoked repeatedly.
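The counting idea can be modelled in a few lines. This is a toy model only, not how any real JVM implements its profiler or JIT compiler: the class ToyExecutionEngine, its threshold parameter and the strings it returns are all invented for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ToyExecutionEngine {
    private final int threshold; // assumed hotspot threshold, chosen by the caller
    private final Map<String, Integer> invocationCounts = new HashMap<>();
    private final Set<String> compiledMethods = new HashSet<>();

    ToyExecutionEngine(int threshold) {
        this.threshold = threshold;
    }

    // Simulates one method call and reports how it was executed.
    String invoke(String methodName) {
        if (compiledMethods.contains(methodName)) {
            return "compiled"; // reuse the machine code produced earlier
        }
        // Profiler role: count this invocation.
        int count = invocationCounts.merge(methodName, 1, Integer::sum);
        if (count >= threshold) {
            compiledMethods.add(methodName); // JIT role: the method is now a hotspot
        }
        return "interpreted";
    }
}
```

With a threshold of 3, the first three calls to a method are interpreted, and every later call to the same method uses the compiled version. A method called only once or twice never gets compiled.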

Garbage Collection

This is a program that manages memory automatically. It collects and removes unreferenced objects from the heap, which frees heap space.

……………..

Which class will the JVM execute if there are two exactly identical classes?

It depends on how we supply them:

  1. Class Paths: The JVM executes the class file that comes from the first path.
  2. JVM Arguments: The JVM executes the last one.

Okay, That’s it from my side about JVM. Happy Learning! Bye! : )
