Deep Dive Into Hello World In Java
For most programmers, “Hello World” is the first program they write when they begin learning a new programming language.
It’s widely used in examples and tutorials. A quick search on GitHub returns over 1.6M results, so there’s no doubt about its widespread use. Java was the second most used language in that search, after HTML.
Writing The Program
Software written in Java is usually compiled into Java bytecode, which is then executed by the Java Virtual Machine (JVM). Let’s write the Hello World example. First, create a file named “HelloWorld.java” with the content below.
javac is the compiler that converts source code conforming to the Java Language Specification into JVM-compatible bytecode. Running the command below generates a file named “HelloWorld.class”.
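The listing itself is not reproduced in this copy of the article. A minimal version, assuming only what the rest of the text states (a class named HelloWorld whose main method prints the string “Hello, World!”), would presumably look like this:

```java
public class HelloWorld {
    public static void main(String[] args) {
        // Prints the greeting followed by a line separator to standard output.
        System.out.println("Hello, World!");
    }
}
```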
$ javac HelloWorld.java
The .class File Format
A Java class file contains the actual bytecode for a class, the constant pool, access flags, version metadata, superclass and interface indexes (the actual superclass and interface names are stored in the constant pool) and various attributes. The full class file format is described at https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html
A class file is identified by its first 4 bytes. Printing them shows the following output:
0xCAFEBABE is the magic number that the JVM uses to identify class files. Let’s disassemble the class file we generated to see more.
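Before moving on, the magic number can also be checked from Java itself. The sketch below is only an illustration: the class name MagicNumber and the helper readMagic are made up for this example, and it assumes the compiled class file is reachable as a resource next to the class, which may not hold in every environment.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MagicNumber {
    // Reads the first 4 bytes of the stream as a big-endian 32-bit integer.
    public static int readMagic(InputStream in) throws IOException {
        return new DataInputStream(in).readInt();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the class file is available as a resource next to this class.
        try (InputStream in = MagicNumber.class.getResourceAsStream("MagicNumber.class")) {
            if (in == null) {
                System.out.println("class file not available as a resource");
                return;
            }
            // %08X prints the int as 8 unsigned hexadecimal digits.
            System.out.printf("0x%08X%n", readMagic(in));
        }
    }
}
```

Any class file could be passed to readMagic instead, including the HelloWorld.class generated earlier.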
Disassembling
When a Java Virtual Machine starts up, it first looks for a main method in the specified class.
In the Java language, the main method is usually declared as public static void main(String[] args). Inside the JVM, that method is looked up in the class file by the method name “main” and the method descriptor ([Ljava/lang/String;)V, which means a method that takes an array of String instances as its parameter and returns void.
Details about descriptors can be found at: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.3.2
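To get a feel for the descriptor syntax, the standard java.lang.invoke.MethodType class can print the descriptor for a given signature. This is a small sketch; the class name MainDescriptor and the helper descriptorOf are hypothetical names chosen for the example.

```java
import java.lang.invoke.MethodType;

public class MainDescriptor {
    // Builds the JVM descriptor for a method with the given return and parameter types.
    public static String descriptorOf(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // void main(String[] args)
        System.out.println(descriptorOf(void.class, String[].class)); // ([Ljava/lang/String;)V
        // void println(String)
        System.out.println(descriptorOf(void.class, String.class));   // (Ljava/lang/String;)V
    }
}
```

The leading [ marks an array, L...; wraps a class name, and the V after the closing parenthesis is the void return type.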
We can disassemble a class file with the javap tool:
The disassembled output shows us that there are two methods in our class file. The first one is the constructor, called <init> in the JVM’s notation; the other one is the entry point of our program, our main method. Even though we haven’t written a constructor, the compiler added a basic one that invokes the superclass’s constructor. In our case, the HelloWorld class doesn’t have an explicit superclass, but every class in Java is ultimately derived from the Object class, so the <init> method of Object is invoked.
Since main is a static method, the constructor of our class is not invoked during the execution of our HelloWorld program.
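This can be observed with a small experiment. The class below is a hypothetical example, not part of the original program: it counts constructor calls in a static field and prints the count before and after an explicit instantiation.

```java
public class StaticEntryPoint {
    // Counts how many times the constructor has run.
    static int instancesCreated = 0;

    public StaticEntryPoint() {
        instancesCreated++;
    }

    public static void main(String[] args) {
        // main is static, so the JVM calls it without creating an instance first.
        System.out.println("instances before new: " + instancesCreated);
        new StaticEntryPoint();
        System.out.println("instances after new: " + instancesCreated);
    }
}
```

When the class is started directly, the first count is 0: the constructor only runs once the program itself calls new.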
Interpreting The Bytecodes
The bytecode instructions evidently do something that prints “Hello World”. Let’s follow them in the light of the JVM instruction set specification.
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
The getstatic instruction pushes the value of a static field onto the operand stack, first initializing the field’s class through its <clinit> method if it has not been initialized already.
The two bytes following the opcode are combined into an index into the constant pool, where the entry describing the field (its class, name and type) is found. In our case, the static field is System.out.
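How the two operand bytes become an index can be sketched in a few lines. The class and method names are made up for illustration; the byte values assume the instruction shown above, getstatic #2, whose opcode is 0xB2.

```java
public class ConstantPoolIndex {
    // Combines the two operand bytes into an unsigned 16-bit constant pool index.
    public static int index(byte indexByte1, byte indexByte2) {
        // Masking with 0xFF undoes the sign extension of Java's signed bytes.
        return ((indexByte1 & 0xFF) << 8) | (indexByte2 & 0xFF);
    }

    public static void main(String[] args) {
        // getstatic #2 is encoded as the opcode 0xB2 followed by the bytes 0x00 0x02.
        byte[] code = {(byte) 0xB2, 0x00, 0x02};
        System.out.println("#" + index(code[1], code[2])); // #2
    }
}
```

This also explains why the next instruction in the listing sits at offset 3: getstatic occupies one opcode byte plus two index bytes.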
3: ldc #3 // String Hello, World!
ldc is the instruction for loading an item from the constant pool and pushing it onto the operand stack. In our case, the constant pool entry is a String literal, so a reference to the String “Hello, World!” is pushed onto the stack.
Detailed information about the constant pool can be found here:
5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
invokevirtual invokes the instance method referenced by a constant pool entry, taking its arguments from the operand stack. The println method we call takes one argument. Since it is an instance method, it also needs a reference to the object it is invoked on, which is already on the operand stack because we pushed it earlier with getstatic.
After these 2 values are popped from the operand stack, the println method runs.
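The three instructions can be mirrored one by one in Java source. This is only an illustrative mapping under the assumption that each statement corresponds to one instruction: compiling this version would add local variable loads and stores that the original one-line call does not need. The class name BytecodeSteps is hypothetical.

```java
import java.io.PrintStream;

public class BytecodeSteps {
    public static void printGreeting() {
        PrintStream out = System.out;      // getstatic: read the static field System.out
        String message = "Hello, World!";  // ldc: load the String constant
        out.println(message);              // invokevirtual: call println on the PrintStream
    }

    public static void main(String[] args) {
        printGreeting();
    }
}
```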
Tracing through the java.io library shows that println eventually calls a method named FileOutputStream.writeBytes, which is declared as private native void, so it has no implementation in Java.
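Reflection offers a quick way to check the native modifier without opening the JDK sources. The sketch below assumes the running JDK still declares a method named writeBytes on FileOutputStream, which may differ between versions; if it does not, the program simply prints nothing. The class name NativeCheck and the helper isNative are made up for this example.

```java
import java.io.FileOutputStream;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeCheck {
    // Returns true if the given method is declared with the native modifier.
    public static boolean isNative(Method method) {
        return Modifier.isNative(method.getModifiers());
    }

    public static void main(String[] args) {
        // Assumption: the running JDK declares a method named writeBytes here.
        for (Method m : FileOutputStream.class.getDeclaredMethods()) {
            if (m.getName().equals("writeBytes")) {
                System.out.println(m + " native=" + isNative(m));
            }
        }
    }
}
```

Listing declared methods only inspects their signatures; it does not call the private method.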
Java Native Interface & System Calls
The JVM is isolated from the underlying system by design, so any action requiring access to the underlying system is performed by native methods via JNI. In our case, we need to print text to the console, which is done by writing to file descriptor 1, the standard output descriptor defined by POSIX and exposed in Java as java.io.FileDescriptor#out.
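The same descriptor can be written to directly, bypassing System.out. This is a minimal sketch with hypothetical names (WriteToStdout, encode); it assumes UTF-8 is an acceptable encoding for the console.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WriteToStdout {
    // Encodes the text as UTF-8 bytes, the form in which it reaches the file descriptor.
    public static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // FileDescriptor.out wraps file descriptor 1 (standard output).
        // The stream is deliberately not closed, since closing it would close fd 1.
        FileOutputStream stdout = new FileOutputStream(FileDescriptor.out);
        byte[] bytes = encode("Hello, World!\n");
        stdout.write(bytes, 0, bytes.length);
        stdout.flush();
    }
}
```

System.out is itself a PrintStream layered on top of a FileOutputStream for this descriptor, which is why println ends up in FileOutputStream.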
Even a simple Java program requires many native methods to be registered and linked. For example, running our Hello World example with the command below shows some of the native methods relevant to printing the text.
We traced the call down to the writeBytes method and found that it is declared as a native method. Tracing further, the implementation of FileOutputStream.writeBytes can be found in the JDK sources.
User-level programs usually communicate with the kernel through system calls. In our case, the system call for writing data to a file descriptor is “write”. The writeBytes method is a wrapper around the C library’s write function, which is itself a wrapper around the write system call.
To confirm the path we traced, we can use strace to dump all the system calls that the JVM and its forks make during execution.
A write system call writing to STDOUT (fd 1 as per POSIX) can be seen. Detailed information about the write system call and its wrapper can be found in the man page.
Conclusion
Even though the JVM instruction set has a wide range of instructions, any action that requires communicating with the kernel or the underlying host needs native code, executed through the Java Native Interface.
Digging into a simple Hello World program can give us hints about how the JVM works and how it communicates outside its isolated space.
Discussion is encouraged.
References
JVM Specifications
Java Language Specifications
JDK Source
Wikipedia Write SysCall
Write Syscall Man Page