Cracking JVM code, part II

Alexander Panman
Published in Wix Engineering
Nov 28, 2021

First experiment


Cracking JVM code, part I (Onboarding)
Cracking JVM code, part III (JVM runtime data structures)

So, after building some low-level intuition in part I, let’s use existing tools to view the whole class file.
The “class File Format” chapter (chapter 4) of the Java Virtual Machine Specification is extremely useful if you want more details on this part.

Perhaps the most popular tool to start with is javap, which ships with the JDK:

$ javap -c -verbose ByteCodeExamples
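The ByteCodeExamples source itself isn’t reproduced here. Judging by the javap output discussed below (a public class in package com.company, with one static sum(int, int) whose code maps to source line 5), it presumably looks something like the following sketch. The parameter names a and b come from the explanation further down; the exact layout is an assumption.

```java
package com.company; // this_class: com/company/ByteCodeExamples

public class ByteCodeExamples { // no extends clause, so the superclass is java.lang.Object
    public static int sum(int a, int b) { // descriptor (II)I, flags ACC_PUBLIC | ACC_STATIC
        return a + b; // source line 5 in this layout, hence "line 5: 0" in the LineNumberTable
    }
}
```

Compiled with a JDK 13 javac at its default target, it should report major version 57, as in the listing below.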

I’ll go over some interesting points here:

2:  minor version: 0
3: major version: 57

The JVM uses this part for version validation. Major version 57 means the class was compiled for Java 13. If you try to run a class compiled with Java 8 on Java 7, you will get an UnsupportedClassVersionError, and this is how the JVM knows (you can cheat a little with javac options such as --release, but let’s keep it simple for now).
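As a sketch of where that check starts, the snippet below reads the first 8 bytes of a class file: u4 magic (0xCAFEBABE), u2 minor_version and u2 major_version, all big-endian (JVMS §4.1). The class name ClassVersion and the javaRelease helper are illustrative; the “major minus 44” mapping holds for Java 5 and later (52 is Java 8, 57 is Java 13). A Java 7 JVM accepts major versions up to 51, which is why a Java 8 class (52) is rejected.

```java
import java.nio.ByteBuffer;

// Sketch: read the class file header (JVMS §4.1): u4 magic, u2 minor_version, u2 major_version.
public class ClassVersion {
    static final int MAGIC = 0xCAFEBABE;

    /** Returns {minor, major}; throws if the bytes don't start with the class file magic number. */
    static int[] readVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
        int magic = buf.getInt();                     // u4 magic
        if (magic != MAGIC) {
            throw new IllegalArgumentException("not a class file, magic = 0x" + Integer.toHexString(magic));
        }
        int minor = buf.getShort() & 0xFFFF;          // u2 minor_version, read as unsigned
        int major = buf.getShort() & 0xFFFF;          // u2 major_version, read as unsigned
        return new int[] {minor, major};
    }

    /** Hypothetical helper: for Java 5 and later, major_version = feature release + 44. */
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        // The values javap showed above: magic CAFEBABE, minor 0, major 57 (0x39).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x39};
        int[] version = readVersion(header);
        System.out.println("minor=" + version[0] + ", major=" + version[1]
                + " (Java " + javaRelease(version[1]) + ")");
    }
}
```

To inspect a real file, pass readVersion the bytes returned by java.nio.file.Files.readAllBytes for that .class file.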

4:  flags: (0x0021) ACC_PUBLIC, ACC_SUPER
5: this_class: #7 // com/company/ByteCodeExamples
6: super_class: #2 // java/lang/Object

Next, the flags: 0x0021 is ACC_PUBLIC (0x0001) combined with ACC_SUPER (0x0020). ACC_PUBLIC declares that the class can be accessed from outside its package. ACC_SUPER exists only for backward compatibility with code compiled by older compilers.
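Because the flags value is a bit mask, decoding it is a matter of testing each bit. A minimal decoder, assuming the class-level access flags listed in JVMS §4.1; the class name ClassAccessFlags is made up for this example. Note that the same bits mean different things on methods (0x0020 is ACC_SYNCHRONIZED there), so this table only applies to class flags.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: decode a class-level access_flags value into javap-style names.
public class ClassAccessFlags {
    // Insertion order is the order in which names are reported.
    static final Map<Integer, String> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put(0x0001, "ACC_PUBLIC");
        FLAGS.put(0x0010, "ACC_FINAL");
        FLAGS.put(0x0020, "ACC_SUPER");
        FLAGS.put(0x0200, "ACC_INTERFACE");
        FLAGS.put(0x0400, "ACC_ABSTRACT");
        FLAGS.put(0x1000, "ACC_SYNTHETIC");
        FLAGS.put(0x2000, "ACC_ANNOTATION");
        FLAGS.put(0x4000, "ACC_ENUM");
        FLAGS.put(0x8000, "ACC_MODULE");
    }

    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Integer, String> e : FLAGS.entrySet()) {
            if ((accessFlags & e.getKey()) != 0) { // this flag's bit is set
                names.add(e.getValue());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
    }
}
```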

Lines 5 and 6 show something familiar: ByteCodeExamples and java/lang/Object. As we all know, a Java class without an explicit extends clause extends the base superclass, Object, and we can see that in the current example.

The numbers #7 and #2 are indexes into the Constant pool (see line 8 and more about the Constant pool below); javap resolves them and prints the values as comments for us. Let’s take super_class: #2 as an example:

#2 Class → references #4 Utf8 with value java/lang/Object

That’s it, very simple.
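To make the resolution concrete, here is a toy model of entries #1 to #6 from the Constant pool listing below, with a resolver that follows references until it reaches Utf8 strings. It is only an illustration: ConstantPoolDemo, Entry and resolve are hypothetical names, tags are strings for readability (in a real class file they are one-byte numbers, e.g. 7 for Class and 1 for Utf8), and no actual bytes are parsed.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a few constant pool entries (not a real class file parser).
public class ConstantPoolDemo {
    static final class Entry {
        final String tag;  // "Utf8", "Class", "NameAndType" or "Methodref"
        final String utf8; // set only for Utf8 entries
        final int[] refs;  // indexes of referenced entries (empty for Utf8)

        Entry(String tag, String utf8, int... refs) {
            this.tag = tag;
            this.utf8 = utf8;
            this.refs = refs;
        }
    }

    final Map<Integer, Entry> pool = new HashMap<>();

    ConstantPoolDemo() {
        pool.put(1, new Entry("Methodref", null, 2, 3));    // #1 = Methodref #2.#3
        pool.put(2, new Entry("Class", null, 4));           // #2 = Class #4
        pool.put(3, new Entry("NameAndType", null, 5, 6));  // #3 = NameAndType #5:#6
        pool.put(4, new Entry("Utf8", "java/lang/Object")); // #4 = Utf8 java/lang/Object
        pool.put(5, new Entry("Utf8", "<init>"));           // #5 = Utf8 <init>
        pool.put(6, new Entry("Utf8", "()V"));              // #6 = Utf8 ()V
    }

    /** Follows references down to Utf8 strings and formats them like javap's comments. */
    String resolve(int index) {
        Entry e = pool.get(index);
        if (e == null) throw new IllegalArgumentException("no entry #" + index);
        switch (e.tag) {
            case "Utf8":
                return e.utf8;
            case "Class":
                return resolve(e.refs[0]);
            case "NameAndType": {
                String name = resolve(e.refs[0]);
                if (name.startsWith("<")) name = "\"" + name + "\""; // javap quotes special names
                return name + ":" + resolve(e.refs[1]);
            }
            case "Methodref":
                return resolve(e.refs[0]) + "." + resolve(e.refs[1]);
            default:
                throw new IllegalStateException("unsupported tag " + e.tag);
        }
    }

    public static void main(String[] args) {
        ConstantPoolDemo cp = new ConstantPoolDemo();
        System.out.println(cp.resolve(2)); // java/lang/Object
        System.out.println(cp.resolve(1)); // java/lang/Object."<init>":()V
    }
}
```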

Continuing:

7:  interfaces: 0, fields: 0, methods: 2, attributes: 1

Why two methods? The first is the constructor (javac generates a default one because we didn’t write any), and the second is our sum(). You can easily see this by running javap with the -p option:

$ javap -p ByteCodeExamples

public class com.company.ByteCodeExamples {
  public com.company.ByteCodeExamples();
  public static int sum(int, int);
}

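Attributes: extra information attached to the class, such as its source file, its inner classes, or whether it is deprecated (you can read more in the Attributes section of the specification). Here the single attribute is SourceFile; see #13 and #14 in the Constant pool.
Interfaces: the number of interfaces the class implements
Fields: the number of fields the class declares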
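javap counts the constructor as a method because in the class file it really is one, named <init>. The Reflection API, by contrast, reports constructors separately. A small sketch, using a hypothetical stand-in class Example with the same shape as ByteCodeExamples (no fields, no interfaces, one static method, an implicit constructor); MemberCounts and classFileMethodCount are illustrative names.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

public class MemberCounts {
    // Hypothetical stand-in with the same shape as ByteCodeExamples.
    static class Example {
        public static int sum(int a, int b) {
            return a + b;
        }
    }

    /** Methods as the class file counts them: regular methods plus constructors (<init>). */
    static int classFileMethodCount(Class<?> c) {
        Method[] methods = c.getDeclaredMethods();            // excludes constructors
        Constructor<?>[] ctors = c.getDeclaredConstructors(); // here: the javac-generated default one
        return methods.length + ctors.length;
    }

    public static void main(String[] args) {
        Class<?> c = Example.class;
        System.out.println("interfaces: " + c.getInterfaces().length);                 // 0
        System.out.println("fields: " + c.getDeclaredFields().length);                  // 0
        System.out.println("getDeclaredMethods(): " + c.getDeclaredMethods().length);   // 1
        System.out.println("class file methods: " + classFileMethodCount(c));           // 2
    }
}
```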

The next section is the Constant pool:

8: Constant pool:
9: #1 = Methodref #2.#3 // java/lang/Object."<init>":()V
10: #2 = Class #4 // java/lang/Object
11: #3 = NameAndType #5:#6 // "<init>":()V
12: #4 = Utf8 java/lang/Object
13: #5 = Utf8 <init>
14: #6 = Utf8 ()V
15: #7 = Class #8 // com/company/ByteCodeExamples
16: #8 = Utf8 com/company/ByteCodeExamples
17: #9 = Utf8 Code
18: #10 = Utf8 LineNumberTable
19: #11 = Utf8 sum
20: #12 = Utf8 (II)I
21: #13 = Utf8 SourceFile
22: #14 = Utf8 ByteCodeExamples.java

According to chapter 2 of the specification (§2.5.5, Run-Time Constant Pool):

A run-time constant pool is a per-class or per-interface run-time representation of the constant_pool table in a class file (§4.4). It contains several kinds of constants, ranging from numeric literals known at compile-time to method and field references that must be resolved at run-time. The run-time constant pool serves a function similar to that of a symbol table for a conventional programming language, although it contains a wider range of data than a typical symbol table.

Given that straightforward description, we also know how to resolve references from the Constant pool (remember the example with references #7 and #2 from lines 5 and 6?).

Let’s stop at these lines:

19:  #11 = Utf8 sum
20: #12 = Utf8 (II)I

As you probably recognize, these are the name and the descriptor of our function sum: it receives two int parameters (II) and returns an int (I).

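Other types, from Table 4.3-A (Interpretation of field descriptors):

B  byte
C  char
D  double
F  float
I  int
J  long
L ClassName ;  reference to an instance of class ClassName
S  short
Z  boolean
[  reference, one array dimension

In a method descriptor, the return type can also be V, meaning void.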
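Putting the table to work, the sketch below turns a method descriptor into a Java-like signature: it reads one field descriptor at a time for the parameters, then the return type, where V stands for void. DescriptorParser and describe are hypothetical names, and error handling is minimal.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: render a method descriptor such as "(II)I" using the letters of Table 4.3-A.
public class DescriptorParser {
    private final String desc;
    private int pos;

    private DescriptorParser(String desc) {
        this.desc = desc;
    }

    /** Parses one field type starting at pos and advances past it. */
    private String fieldType() {
        char c = desc.charAt(pos++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case '[': return fieldType() + "[]";         // one array dimension, then the element type
            case 'L': {                                  // L ClassName ;
                int end = desc.indexOf(';', pos);
                String name = desc.substring(pos, end).replace('/', '.');
                pos = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("bad descriptor char '" + c + "' in " + desc);
        }
    }

    /** e.g. describe("sum", "(II)I") gives "int sum(int, int)". */
    static String describe(String name, String descriptor) {
        DescriptorParser p = new DescriptorParser(descriptor);
        if (descriptor.charAt(p.pos++) != '(') {
            throw new IllegalArgumentException("expected '(' in " + descriptor);
        }
        List<String> params = new ArrayList<>();
        while (descriptor.charAt(p.pos) != ')') {
            params.add(p.fieldType());
        }
        p.pos++; // skip ')'
        String ret = descriptor.charAt(p.pos) == 'V' ? "void" : p.fieldType();
        return ret + " " + name + "(" + String.join(", ", params) + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("sum", "(II)I"));                  // int sum(int, int)
        System.out.println(describe("<init>", "()V"));                 // void <init>()
        System.out.println(describe("main", "([Ljava/lang/String;)V")); // void main(java.lang.String[])
    }
}
```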

So,

9: #1 = Methodref #2.#3 // java/lang/Object."<init>":()V

is a reference to the constructor of the Object class, which the JVM calls “<init>”: it takes no arguments () and returns V, for void.

And finally, the methods’ code. I’ll describe our sum(), but I encourage you to also examine the constructor yourself:
public com.company.ByteCodeExamples();

35:  public static int sum(int, int);
36: descriptor: (II)I
37: flags: (0x0009) ACC_PUBLIC, ACC_STATIC
38: Code:
39: stack=2, locals=2, args_size=2
40: 0: iload_0
41: 1: iload_1
42: 2: iadd
43: 3: ireturn
44: LineNumberTable:
45: line 5: 0

The first three lines are self-explanatory: the signature, the descriptor (II)I, and the flags, where 0x0009 = ACC_PUBLIC (0x0001) | ACC_STATIC (0x0008).
stack=2, locals=2, args_size=2
stack=2: the maximum depth of the operand stack while the method runs. In our case, both int arguments a and b must be pushed onto the stack before iadd can add them (we’ll get into the definition of a stack later).
locals=2: the number of local variable slots; here they hold the parameters a and b.
args_size=2: the number of method arguments, a and b.
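To see why stack=2 and locals=2 are enough, here is a toy interpreter for exactly the four instructions of sum(). It is a deliberately simplified model, not how a real JVM is implemented: TinyStackMachine, Op and run are hypothetical names, and only an int operand stack and an int local-variable array are modeled.

```java
// Toy interpreter for iload_0, iload_1, iadd and ireturn only.
public class TinyStackMachine {
    enum Op { ILOAD_0, ILOAD_1, IADD, IRETURN }

    static int run(Op[] code, int maxStack, int[] locals) {
        int[] stack = new int[maxStack]; // operand stack sized like "stack=2"
        int sp = 0;                      // index of the next free stack slot
        for (Op op : code) {
            switch (op) {
                case ILOAD_0: stack[sp++] = locals[0]; break; // push local slot 0 (a)
                case ILOAD_1: stack[sp++] = locals[1]; break; // push local slot 1 (b)
                case IADD: {                                  // pop two ints, push their sum
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    break;
                }
                case IRETURN: return stack[--sp];             // pop the result and return it
            }
        }
        throw new IllegalStateException("code ended without ireturn");
    }

    public static void main(String[] args) {
        Op[] sumCode = {Op.ILOAD_0, Op.ILOAD_1, Op.IADD, Op.IRETURN};
        int[] locals = {2, 3}; // "locals=2": a in slot 0, b in slot 1 (static, so no this)
        System.out.println(run(sumCode, 2, locals)); // 5
    }
}
```

The stack never holds more than two values (right after iload_1), which is exactly what stack=2 records.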

Now go ahead: change some lines, remove and add code, compile, run javap on the result, and look for the differences. Practice is the best way to understand theory :)

Pause time:


OK, let’s run the first experiment:

I’ve added class_sum(). The main differences are that class_sum() isn’t static and that it declares a new local variable, c. After compiling and running javap:

Now you can spot the differences:

  • no ACC_STATIC flag
  • locals=4: a, b and c, so which is the fourth? It’s this. Because class_sum() isn’t static, it receives a reference to the current object, this, as an implicit first argument in local slot 0. Python developers will find this familiar (think of self), and it’s important to keep in mind. No magic here ;)
  • The same goes for args_size=3: this, a and b.
11: line 10: 0

is just the source line number from my IDE (recorded in the LineNumberTable for debugging).
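The class_sum() source isn’t shown here. A hypothetical version consistent with the description (an instance method with parameters a and b and a new local variable c) and with locals=4 and args_size=3 could look like this; the body and layout are assumptions, so its LineNumberTable won’t necessarily point at line 10.

```java
package com.company;

public class ByteCodeExamples {
    public static int sum(int a, int b) {
        return a + b;
    }

    // Hypothetical experiment method: not static, so local slot 0 holds this.
    public int class_sum(int a, int b) {
        int c = a + b; // slots: 0 = this, 1 = a, 2 = b, 3 = c, hence locals=4
        return c;      // args_size=3 counts this, a and b
    }
}
```

With javac, this body typically compiles to iload_1, iload_2, iadd, istore_3, iload_3, ireturn: the loads now start at slot 1 because slot 0 is taken by this.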

Let’s look at the code section again.

Code:
stack=2, locals=2, args_size=2
0: iload_0
1: iload_1
2: iadd
3: ireturn
LineNumberTable:
line 5: 0

Here I want to show the same code compiled to x86-64 assembly rather than JVM bytecode (if you’ve never written C/C++/ASM code, you can skip ahead to part III, JVM runtime data structures).

For this purpose, I’ve written a simple C program with the same method:

And this is how the sum() assembly code looks (compiled with gcc -O):

_sum:
0000000100000fa0 pushq %rbp
0000000100000fa1 movq %rsp, %rbp
0000000100000fa4 leal (%rdi,%rsi), %eax
0000000100000fa7 popq %rbp
0000000100000fa8 retq
0000000100000fa9 nopl (%rax)

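The main difference that caught my attention: the JVM has no general-purpose registers. As a result, there are also no operations similar to mov (which is super popular in asm); instead, instructions like iload_0 and iadd work on an operand stack and local variable slots. The reason lies in the JVM runtime data structures. So, what does this mean?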

→ Cracking JVM code, part III (JVM runtime data structures)

Links:
The class File Format (Java Virtual Machine Specification, chapter 4)
Fields description
Run-Time Constant Pool
Attributes
