Cracking JVM code, part III

5 min read · Jan 2, 2022

JVM runtime data structures

Cracking JVM code, part I (Onboarding)
Cracking JVM code, part II (First experiment)

https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-2.html — This is a helpful resource for the following section.

The JVM uses several runtime data areas to execute bytecode. The two largest are the heap, which is shared by all threads (and which many garbage collectors split into young and old generations), and the JVM stack, of which each thread gets its own. Because our sum() function is simple and not concurrent, let’s concentrate on the frame instead:

Frame:
https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-2.html#jvms-2.6

A frame is used to store data and partial results, as well as to perform dynamic linking, return values for methods, and dispatch exceptions. A new frame is created each time a method is invoked. A frame is destroyed when its method invocation completes, whether that completion is normal or abrupt (it throws an uncaught exception).

When a method is called, a frame is created with its own Local Variable Array (LVA) and Operand Stack (OS). Knowing this, let’s simulate running our sum() method and observe the stack-oriented architecture.

javap:

Code:
 stack=2, locals=2, args_size=2
 0: iload_0
 1: iload_1
 2: iadd
 3: ireturn

Let’s call sum(22,6)

stack=2, locals=2, args_size=2
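From this line, we know that the LVA has 2 slots and the OS never holds more than 2 values. When the frame is created, the two arguments, 22 and 6, are placed in LVA[0] and LVA[1].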

0: iload_0

Push value of LVA[0] onto the OS

1: iload_1

Push value of LVA[1] onto the OS

2: iadd

Pop the top two values from the OS, add them, and push the result back onto the OS

3: ireturn

Value is popped from the OS of the current frame and pushed onto the OS of the frame of invoker.
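
The four steps above can be replayed in a few lines of Java. This is only a sketch of the idea: the FrameSimulation class and its method names are invented for illustration and say nothing about how a real JVM implements frames.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a single frame; not how a real JVM is implemented.
class FrameSimulation {
    private final int[] lva;                               // Local Variable Array
    private final Deque<Integer> os = new ArrayDeque<>();  // Operand Stack
    private final int maxStack;

    FrameSimulation(int maxStack, int maxLocals, int... args) {
        this.maxStack = maxStack;
        this.lva = new int[maxLocals];
        // Arguments are copied into the first LVA slots when the frame is created.
        System.arraycopy(args, 0, lva, 0, args.length);
    }

    private void push(int value) {
        if (os.size() == maxStack) {
            throw new IllegalStateException("operand stack overflow");
        }
        os.push(value);
    }

    void iload(int index) { push(lva[index]); }

    void iadd() {
        int right = os.pop();
        int left = os.pop();
        push(left + right);
    }

    // In a real JVM the value would be pushed onto the invoker's operand stack.
    int ireturn() { return os.pop(); }

    static int sum(int a, int b) {
        FrameSimulation frame = new FrameSimulation(2, 2, a, b); // stack=2, locals=2
        frame.iload(0);         // 0: iload_0
        frame.iload(1);         // 1: iload_1
        frame.iadd();           // 2: iadd
        return frame.ireturn(); // 3: ireturn
    }

    public static void main(String[] args) {
        System.out.println(sum(22, 6)); // prints 28
    }
}
```

Note that the stack never holds more than two values, which matches stack=2 in the javap output.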

This is perhaps the most important point in understanding how the JVM runs bytecode: some instructions PUSH values onto the stack (from the LVA or from constants), while others POP values from the stack and operate on them.
At this point, we understand enough to move faster. I’ll provide one more example of the JVM’s inner workings.

For the last example, I will use IntelliJ’s built-in Bytecode Viewer (just for convenience):

Let’s look at some interesting points from main():

And the bytecode:

Starting from line 20, label L3 (the IntelliJ Bytecode Viewer shows method metadata at the end):

20:  L3
21:    LOCALVARIABLE args [Ljava/lang/String; L0 L3 0
22:    LOCALVARIABLE bc Lcom/company/ByteCodeExamples; L1 L3 1

args: a one-dimensional array of String objects. In its descriptor, [Ljava/lang/String;, the [ marks one array dimension and L…; names the class (see Table 4.3-A, Interpretation of field descriptors). It gets index 0 in the LVA and is in scope from L0 to L3 (the whole main function).
bc: a local variable of type ByteCodeExamples. It gets index 1 in the LVA and is visible from its definition until the end of main (L1 to L3).
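
The descriptor notation can also be seen from plain Java. The small check below is a sketch; DescriptorDemo is a made-up class name. One caveat: Class.getName() prints array types in the same shape as field descriptors but uses '.' rather than '/' as the package separator.

```java
import java.lang.invoke.MethodType;

// Hypothetical demo class; prints descriptor-style type names.
class DescriptorDemo {
    public static void main(String[] args) {
        // Array classes report their name in descriptor-like form.
        System.out.println(String[].class.getName());   // [Ljava.lang.String;
        System.out.println(int[].class.getName());      // [I
        System.out.println(String[][].class.getName()); // [[Ljava.lang.String;

        // Method descriptor of a method taking two ints and returning int.
        MethodType sumType = MethodType.methodType(int.class, int.class, int.class);
        System.out.println(sumType.toMethodDescriptorString()); // (II)I

        // Method descriptor of a no-argument method returning void.
        MethodType initType = MethodType.methodType(void.class);
        System.out.println(initType.toMethodDescriptorString()); // ()V
    }
}
```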

This is how the stack appears during execution, with the accompanying description following the image:

6:  NEW com/company/ByteCodeExamples

This instruction allocates the ByteCodeExamples object. If it succeeds, it pushes a reference to the newly created object onto the OS. At this point, the object’s fields hold only their default values, meaning the object isn’t initialized yet.

7:  DUP

Duplicate the top operand stack value (in our case, the result of NEW: the reference to the ByteCodeExamples object). I’ll explain why this is necessary a little later.

8:  INVOKESPECIAL com/company/ByteCodeExamples.<init> ()V

Finally, we call the constructor <init>. INVOKESPECIAL pops the reference from the stack and initializes the object that NEW allocated with default values. Note that the constructor’s descriptor ends in V (void), so nothing is pushed back; that is why we needed DUP. Thanks to the duplicate, we still have one reference on the stack.

9:  ASTORE 1

Pops the reference from the stack and stores it in LVA[1].
At this point, we’ve created and properly constructed the bc variable.
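
To see why DUP matters, here is a rough sketch that replays NEW, DUP, INVOKESPECIAL, and ASTORE 1 on a toy operand stack. NewDupDemo and its run method are hypothetical names, and a plain Object stands in for the freshly allocated instance.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical replay of the object-creation sequence on a toy operand stack.
class NewDupDemo {
    // Returns the reference that ends up in LVA[1].
    // With emitDup == false, ASTORE finds an empty stack and pop() throws.
    static Object run(boolean emitDup) {
        Deque<Object> os = new ArrayDeque<>();
        Object[] lva = new Object[2];

        os.push(new Object());     // NEW: push a reference to the allocated object
        if (emitDup) {
            os.push(os.peek());    // DUP: copy the top reference
        }
        os.pop();                  // INVOKESPECIAL <init>: consumes one reference, pushes nothing
        lva[1] = os.pop();         // ASTORE 1: consumes the remaining reference
        return lva[1];
    }

    public static void main(String[] args) {
        System.out.println(run(true) != null); // prints true
    }
}
```

Calling run(false) throws java.util.NoSuchElementException, because the constructor call has already consumed the only reference.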

So far, so good. L1:

12:  ALOAD 1

Pushes the reference stored in LVA[1] onto the stack.

13:  BIPUSH 22

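Take the byte value 22, sign-extend it to an int, and push it onto the stack. The interesting point here is that the operand stack has no byte-sized entries: byte, short, char, and boolean values are all handled as int on the stack, regardless of whether the JVM is 32-bit or 64-bit, which is why 22 is extended to int before it is pushed.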

14:  BIPUSH 6

The same goes for 6.
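
The widening that BIPUSH performs is the same sign extension Java applies when a byte is used as an int. A small sketch (BipushDemo and signExtend are names made up for this example):

```java
// Hypothetical demo of the byte-to-int sign extension that bipush performs.
class BipushDemo {
    // Mirrors what bipush does with its one-byte operand.
    static int signExtend(byte operand) {
        return operand; // implicit widening from byte to int preserves the sign
    }

    public static void main(String[] args) {
        System.out.println(signExtend((byte) 0x16)); // prints 22
        System.out.println(signExtend((byte) 0xFA)); // prints -6

        // Arithmetic on bytes is done on ints, so the result is an int as well.
        byte a = 22;
        byte b = 6;
        Object result = a + b;
        System.out.println(result instanceof Integer); // prints true
    }
}
```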

15:  INVOKEVIRTUAL com/company/ByteCodeExamples.sum (II)I

Call sum(). Everything it needs is already on the stack: the object reference pushed by ALOAD 1 and the two int arguments.

16:  POP

The returned result from sum() isn’t in use, so we pop it from the stack.

And the last step L2:

19:  RETURN

Exit from main and discard its frame. RETURN is used because main() returns void; sum() returns an int, so it ends with ireturn instead.
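
For reference, here is Java source that would plausibly compile to the main() bytecode walked through above. It is a reconstruction, not the original listing: the parameter names a and b are invented, the package declaration (com.company, judging by the bytecode) is left out to keep the snippet self-contained, and sum is written as an instance method because INVOKEVIRTUAL is used to call it.

```java
// Reconstructed sketch; the original source may differ in details.
class ByteCodeExamples {

    // Instance method: slot 0 of its LVA holds `this`, so a and b sit in slots 1 and 2.
    public int sum(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        ByteCodeExamples bc = new ByteCodeExamples(); // NEW, DUP, INVOKESPECIAL <init>, ASTORE 1
        bc.sum(22, 6);                                // ALOAD 1, BIPUSH 22, BIPUSH 6, INVOKEVIRTUAL, POP
    }                                                 // RETURN
}
```

One consequence of the instance-method assumption: the body of such a sum would load its arguments from slots 1 and 2, whereas the earlier javap listing, which starts at iload_0 with locals=2, corresponds to a static version of the method.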

phew…break

Conclusions.

In this article, I demonstrated the tools and principles you need to understand almost every feature in your JVM language. I encourage you to continue researching and learn how exception mechanisms, anonymous functions, streams, and object-oriented features are implemented. You now have enough knowledge to reason about the performance of language features when writing your code, and you can use this understanding to make improvements. With a better understanding of your programming environment, you can become a more effective software developer.

Thank you.

links:
The Structure of the Java Virtual Machine
Frames
The Java Virtual Machine Instruction Set (dup, bipush, ireturn, iadd, …)


Written by Alexander Panman