Internals of Compiler and JVM

Aman Agrawal
Published in The Startup
6 min read · Oct 16, 2020

Hello guys, I am back with a new blog, and in this one we are going to talk about some important aspects of Java:

  • Difference between JDK, JRE, and JVM
  • The Java Compiler
  • Internals of JVM

Let us first start with the differences between JDK, JRE, and JVM.

  • JDK — JDK stands for Java Development Kit. It provides a software development environment for developing Java applications. It includes the Java Runtime Environment (JRE), an interpreter/launcher (java), a compiler (javac), an archiver (jar), a documentation generator (javadoc), and other tools required for the development of applications.
  • JRE — JRE stands for Java Runtime Environment. The JRE consists of the core libraries plus the Java Virtual Machine. It provides an environment to execute a Java application.
  • JVM — JVM stands for Java Virtual Machine. Whenever you execute a program via the java command, a JVM instance is created. It is a virtual execution environment into which the program is loaded along with the core libraries.
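
To check which runtime is actually executing your program, you can ask the JVM about itself. This is only a minimal sketch that reads standard system properties. The class name RuntimeInfo is an illustrative name, and the printed values depend on your installation.

```java
class RuntimeInfo {
    // Returns a standard system property, or "unknown" if the runtime does not define it.
    static String property(String key) {
        return System.getProperty(key, "unknown");
    }

    public static void main(String[] args) {
        System.out.println("Java version : " + property("java.version"));
        System.out.println("JVM name     : " + property("java.vm.name"));
        System.out.println("Install dir  : " + property("java.home"));
    }
}
```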

Compilation

The Java compiler (javac) compiles the source files (*.java) into class files. Each class file contains machine-independent bytecode, so once compiled, it can be executed on any machine that has a compatible JVM. Class files are therefore platform-independent, whereas the JVM is platform-dependent. The reason is that the JVM makes use of the internals of the operating system, which is why there are different installers for different operating systems. At run time, the JVM transforms the bytecode into machine code (native code).
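
One way to see that class files share the same format everywhere is to look at their first four bytes, which are always the magic number 0xCAFEBABE. The sketch below reads the class file of java.lang.Object as a resource. The class name ClassFileHeader is illustrative, and the sketch assumes the runtime exposes its own class files as resources, which common JDKs do.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    // Reads the first four bytes of java/lang/Object.class as a big-endian int.
    static int readMagic() {
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                throw new IllegalStateException("Object.class is not readable as a resource on this runtime");
            }
            return new DataInputStream(in).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Every valid class file starts with the same magic number, on every platform.
        System.out.printf("magic = 0x%08X%n", readMagic());
    }
}
```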

The compilation of source files involves the following main steps:

  • Parse — Reads the source files and maps the resulting token sequence into an Abstract Syntax Tree (AST). The AST is a tree representation of the abstract syntactic structure of the source code. Each node in the tree denotes a construct occurring in the source code. The syntax is “abstract” in the sense that it does not represent every detail appearing in the real syntax, but only the structural or content-related details.
  • Enter — Enters symbols for the definitions into the symbol table. The symbol table stores information about various entities such as variable names, method names, objects, classes, interfaces, etc. A symbol table may serve the following purposes:
  1. To store the names of all the entities in a structured form in one place.
  2. To verify whether a variable has been declared.
  3. To implement type checking, that is, to verify that assignments and expressions in the source code are semantically correct.
  4. To determine the scope of a name.
  • Process Annotations — If requested, processes the annotations found in the specified compilation units.
  • Flow — Performs dataflow analysis on the trees. This includes checks for assignments and reachability.
  • Generate — Generates the .class files.
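
The Flow step is the easiest one to observe from the outside. The following sketch (FlowDemo is an illustrative name) compiles only because every path assigns the local variable before it is read. The comments describe what javac reports when that is not the case.

```java
class FlowDemo {
    static int sign(int x) {
        int result;               // declared, but not yet assigned
        if (x >= 0) {
            result = 1;
        } else {
            result = -1;
        }
        // Both branches assign 'result', so flow analysis accepts this read.
        // Without the else branch, javac would reject the method with
        // "variable result might not have been initialized".
        return result;
        // Any statement placed after the return would be rejected
        // as an "unreachable statement".
    }

    public static void main(String[] args) {
        System.out.println(sign(5));   // prints 1
        System.out.println(sign(-3));  // prints -1
    }
}
```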

JVM Internals

The JVM loads, links, and initializes a class from its .class file when the class is referred to for the first time at runtime.

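The JVM is divided into 3 main parts (the ClassLoader Subsystem, the Runtime Data Area, and the Execution Engine), which work together with the Java Native Interface and the Native Method Libraries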

  1. ClassLoader Subsystem — It is mainly responsible for 3 activities
  • Loading — The loading of classes is done by 3 class loaders. The first one is the Bootstrap ClassLoader, which loads the core Java classes (from rt.jar up to Java 8). The second one is the Extension ClassLoader, which loads classes from the ext folder (jre\lib\ext). The third one is the Application ClassLoader, which loads classes from the classpath. Since Java 9, rt.jar and the ext folder no longer exist, and the Extension ClassLoader has been replaced by the Platform ClassLoader. The JVM follows the Delegation-Hierarchy principle to load classes. A class loader first delegates a load request to its parent, so the request travels up to the Bootstrap ClassLoader, which gets the first chance to load the class. If it cannot find the class, the Extension ClassLoader tries, and after that the Application ClassLoader. If none of them finds the class, a ClassNotFoundException is thrown.
  • All the above class loaders read .class files, generate the corresponding binary data, and save it in the method area. For each .class file, the JVM stores the fully qualified name of the loaded class and of its immediate parent class, whether the .class file represents a class, an interface, or an enum, and its modifiers, variables, and method information.
  • Linking — It involves 3 steps. The first one is Verification, which checks the correctness of the .class file, that is, whether the file is properly formatted and its bytecode is structurally valid and safe to execute. If the verification fails, the JVM throws a java.lang.VerifyError at run time.
  • When a class is loaded, the JVM also creates an object of type Class in the heap memory to represent it. The programmer can use this Class object to get class-level information such as the name of the class, the parent name, and method and variable information. To get the Class object, we can call the getClass() method of the Object class on any instance, or use a class literal such as String.class.
  • The next step is Preparation, where memory is allocated for all static variables and they are assigned default values. The last step is Resolution, where all symbolic references from the type are replaced with direct references. This is done by searching the method area to locate the referenced entity.
  • Initialization — In this phase, all static variables are assigned the values defined in the code, and static blocks are executed, if any. This happens from top to bottom within a class and from parent to child in the class hierarchy.
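
The loading and initialization rules above can be observed with a small experiment. This is only a sketch. ClassLifecycleDemo, Parent, and Child are illustrative names, and the exact class loader names printed at the end vary between Java versions.

```java
import java.util.ArrayList;
import java.util.List;

class ClassLifecycleDemo {
    // Records the order in which the static initializers run.
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent"); }
    }

    static class Child extends Parent {
        static int value = 42;                        // assigned during initialization, top to bottom
        static { LOG.add("Child value=" + value); }   // runs after the assignment above
    }

    static List<String> initializationOrder() {
        new Child();   // first use of Child: the JVM initializes Parent first, then Child
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(initializationOrder());    // prints [Parent, Child value=42]

        // The Class object gives access to class-level information.
        Class<?> type = new Child().getClass();
        System.out.println(type.getName() + " extends " + type.getSuperclass().getName());

        // Core classes come from the bootstrap loader, which Java represents as null.
        System.out.println("Loader of String: " + String.class.getClassLoader());

        // Walk up the delegation chain, starting at the loader of our own class.
        for (ClassLoader cl = type.getClassLoader(); cl != null; cl = cl.getParent()) {
            System.out.println("Loader in chain: " + cl);
        }
    }
}
```

Each class is initialized only once, so calling initializationOrder() a second time does not add new entries to the log.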

2. The Runtime Data Area

  • Method Area — All class-level data is stored here, including static variables. There is only one method area per JVM, and it is a shared resource.
  • Heap Area — All objects are stored in this area. There is one heap per JVM, and it is a shared resource. My next blog will be on heap memory.
  • Stack Area — For every thread, a separate stack is created. For every method call, a stack frame is pushed onto the stack, and it is popped when the method returns. The stack area is thread-safe because each thread can access only its own stack.
  • PC Registers — Each thread has its own PC (program counter) register, which holds the address of the currently executing instruction. Once that instruction completes, the register is updated to point to the next instruction.
  • Native Method Stack — The native method stack holds native method information. For every thread, a separate native method stack is created.
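
The difference between the per-thread stack and the shared heap shows up as soon as several threads run the same code. In this sketch (StackVsHeapDemo and its members are illustrative names), each thread counts in a local variable on its own stack, while all threads also update one shared object on the heap.

```java
import java.util.concurrent.atomic.AtomicInteger;

class StackVsHeapDemo {
    // One shared object on the heap, visible to every thread.
    static final AtomicInteger SHARED = new AtomicInteger();

    // Starts the given number of threads and returns each thread's private count.
    static int[] run(int threadCount, int iterations) {
        SHARED.set(0);
        int[] perThread = new int[threadCount];
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                int local = 0;                   // lives in this thread's own stack frame
                for (int i = 0; i < iterations; i++) {
                    local++;                     // private to the thread, no synchronization needed
                    SHARED.incrementAndGet();    // shared heap state needs an atomic update
                }
                perThread[id] = local;           // each thread writes only its own slot
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();                   // wait until the thread has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return perThread;
    }

    public static void main(String[] args) {
        int[] counts = run(4, 1000);
        System.out.println("First thread counted: " + counts[0]);  // prints 1000
        System.out.println("Shared counter: " + SHARED.get());     // prints 4000
    }
}
```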

3. The Execution Engine — The generated byte code is then executed by the execution engine.

  • Interpreter — It reads the bytecode instruction by instruction and executes it. The disadvantage is that when a method is called multiple times, it has to be interpreted again every time.
  • JIT Compiler — It is used to improve the efficiency of the interpreter. It compiles frequently executed bytecode (hot methods) into native code, so whenever the interpreter sees repeated method calls, the JIT provides the native code directly for that part. Re-interpretation is not required, and efficiency improves.
  • Garbage Collector — Collects and removes unreferenced objects. Garbage collection can be requested by calling System.gc(), but its execution is not guaranteed.
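
The "requested, not guaranteed" behaviour of System.gc() can be explored with a weak reference. This is a sketch with illustrative names (GcDemo and its methods). Only the first method has a predictable result; the second one depends on the JVM and its settings.

```java
import java.lang.ref.WeakReference;

class GcDemo {
    // An object that is still strongly referenced must survive a GC request.
    static boolean survivesWhileReferenced() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc();                    // only a request to the JVM
        return weak.get() == strong;    // 'strong' is still in use here, so the object is reachable
    }

    // An unreferenced object may be collected, but the JVM does not promise when.
    static boolean collectedAfterRequest() {
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();
        return weak.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("Referenced object survived: " + survivesWhileReferenced());
        System.out.println("Unreferenced object collected: " + collectedAfterRequest());
    }
}
```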

4. Java Native Interface — It is the interface that interacts with the native method libraries and makes the native libraries (C, C++) available to the execution engine. It enables the JVM to call C/C++ libraries, and to be called by C/C++ libraries, which may be specific to hardware.

5. Native Method Libraries — It is a collection of the Native Libraries (C, C++) which are required by the execution engine.
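
On the Java side, a native method is just a declaration marked with the native keyword. The sketch below declares one and uses reflection to check the modifier. NativeDemo and addNatively are hypothetical names, and no C/C++ library exists for them, so the method is only declared and never called.

```java
import java.lang.reflect.Modifier;

class NativeDemo {
    // Hypothetical native method. Declaring it is legal Java, but calling it would
    // throw UnsatisfiedLinkError unless a matching C/C++ library had been loaded
    // first (for example with System.loadLibrary).
    static native int addNatively(int a, int b);

    // Reports whether a method declared by the given class carries the 'native' modifier.
    static boolean isNative(Class<?> type, String name, Class<?>... params) {
        try {
            return Modifier.isNative(type.getDeclaredMethod(name, params).getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("addNatively is native: "
                + isNative(NativeDemo.class, "addNatively", int.class, int.class));
        // Assumption: on OpenJDK-based runtimes this JDK method is implemented in native code.
        System.out.println("System.currentTimeMillis is native: "
                + isNative(System.class, "currentTimeMillis"));
    }
}
```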

That's it for this blog. I hope you guys liked it.

Do send me your feedback, as it will help me improve and write more.

Thank You.
