Program Compilation: From Source To Machine Code

The Process of Translating Your Code To CPU-Understood Byte Instructions

Razvan Badescu
Javarevisited
6 min read · Nov 25, 2023


To enable the CPU to execute a developer’s source code, the code undergoes a series of transformations tailored to the specific programming language in use.

Java Compilation Process

To facilitate a clearer understanding of the compilation process, let’s illustrate it below:

[Figure: Java Code Compilation Process]

Now, let’s walk through the stages of the Java compilation process:

  • Source Code Compilation (Java Compiler): The Java compiler translates the source code into platform-independent bytecode at compile time and stores it in .class files, typically on disk.
  • Bytecode Compilation Just-In-Time (JVM): As the program runs, the JVM loads classes from the .class files into memory on demand. It typically starts by interpreting the bytecode and translates the frequently executed ("hot") portions into machine code while the program is running. This on-the-fly translation, known as just-in-time (JIT) compilation, improves performance by applying optimizations such as instruction reordering and method inlining, guided by how the code actually behaves at runtime. The bytecode stays platform independent, while the generated machine code is optimized for the specific CPU.
  • Machine Code Execution (CPU): The machine code generated by JIT compilation consists of binary-encoded instructions that the hardware executes directly. It is the final, low-level form of the original high-level Java source code.
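
To make the ".class file" stage more concrete, here is a minimal sketch that reads the first bytes of a compiled class. Every .class file starts with the magic number 0xCAFEBABE, followed by the minor and major version of the class file format. The names ClassFileHeader, Header and parseHeader are hypothetical, chosen only for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ClassFileHeader {

    /** The first three fields of every .class file. */
    record Header(int magic, int minorVersion, int majorVersion) {}

    /** Parses the leading 8 bytes of a .class file (stored big-endian). */
    static Header parseHeader(byte[] classBytes) {
        ByteBuffer buffer = ByteBuffer.wrap(classBytes); // big-endian by default
        int magic = buffer.getInt();                     // 0xCAFEBABE for a valid class file
        int minor = buffer.getShort() & 0xFFFF;          // unsigned 16-bit value
        int major = buffer.getShort() & 0xFFFF;          // e.g. 61 for Java 17
        return new Header(magic, minor, major);
    }

    public static void main(String[] args) throws IOException {
        // Read this class's own compiled form from the classpath.
        try (InputStream in = ClassFileHeader.class.getResourceAsStream("ClassFileHeader.class")) {
            if (in == null) {
                throw new IOException("ClassFileHeader.class not found; compile with javac first");
            }
            Header header = parseHeader(in.readAllBytes());
            System.out.printf("magic=0x%08X minor=%d major=%d%n",
                    header.magic(), header.minorVersion(), header.majorVersion());
        }
    }
}
```

Compile it with javac ClassFileHeader.java and run it with java ClassFileHeader. The major version printed depends on the JDK (and any --release setting) used to compile it.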

Visualizing The Bytecode

The JDK ships with the “javap” tool, which disassembles .class files and prints their bytecode. Let’s explore this with an example program that adds two floating-point numbers:

Source Code:

public class Sum {
    public static void main(String[] args) {
        double a = Double.parseDouble(args[0]);
        double b = Double.parseDouble(args[1]);
        System.out.println("Result: %.1f + %.1f = %.1f".formatted(a, b, sum(a, b)));
    }

    private static Double sum(Double a, Double b) {
        return a + b;
    }
}

Bytecode:

  1. Compile the source code: javac Sum.java
  2. Optionally, run the program: java Sum 3 7
  3. Print the bytecode from the .class file generated at compile time:
    javap -c Sum
Compiled from "Sum.java"
public class Sum {
  public Sum();
    Code:
       0: aload_0
       1: invokespecial #1 // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: aload_0
       1: iconst_0
       2: aaload
       3: invokestatic #7 // Method java/lang/Double.parseDouble:(Ljava/lang/String;)D
       6: dstore_1
       7: aload_0
       8: iconst_1
       9: aaload
      10: invokestatic #7 // Method java/lang/Double.parseDouble:(Ljava/lang/String;)D
      13: dstore_3
      14: getstatic #13 // Field java/lang/System.out:Ljava/io/PrintStream;
      17: ldc #19 // String Result: %.1f + %.1f = %.1f
      19: iconst_3
      20: anewarray #2 // class java/lang/Object
      23: dup
      24: iconst_0
      25: dload_1
      26: invokestatic #21 // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
      29: aastore
      30: dup
      31: iconst_1
      32: dload_3
      33: invokestatic #21 // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
      36: aastore
      37: dup
      38: iconst_2
      39: dload_1
      40: invokestatic #21 // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
      43: dload_3
      44: invokestatic #21 // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
      47: invokestatic #25 // Method sum:(Ljava/lang/Double;Ljava/lang/Double;)Ljava/lang/Double;
      50: aastore
      51: invokevirtual #31 // Method java/lang/String.formatted:([Ljava/lang/Object;)Ljava/lang/String;
      54: invokevirtual #37 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      57: return
}
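
One detail the listing makes visible: the Double.valueOf calls at offsets 40 and 44 are there only because sum is declared with Double (boxed) parameters, so each primitive argument is wrapped in an object before the call. The variant below is a sketch of an alternative, not the article's original program. It declares sum with primitive double parameters so the arguments are passed unboxed. The class name SumPrimitive and the argument check are assumptions added for illustration.

```java
public class SumPrimitive {

    /** Adds two primitive doubles; no wrapper objects are involved in the call. */
    static double sum(double a, double b) {
        return a + b;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: java SumPrimitive <a> <b>");
            return;
        }
        double a = Double.parseDouble(args[0]);
        double b = Double.parseDouble(args[1]);
        // formatted(...) takes Object..., so a, b and the result are still
        // boxed once each when they are placed into the argument array.
        System.out.println("Result: %.1f + %.1f = %.1f".formatted(a, b, sum(a, b)));
    }
}
```

Running javap -c on this class lets you compare the two listings. Because javap hides private members by default, add the -p flag (javap -c -p Sum) to also see the bytecode of a private method such as the original sum.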

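Insight: javap stops at bytecode. It cannot show the machine code of a Java program, because the JVM only generates that at runtime. Machine code consists of binary-encoded instructions, sequences of 1s and 0s in which each bit pattern identifies a specific operation that the CPU decodes and executes. (HotSpot can print the JIT-generated assembly through diagnostic options such as -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, but this requires installing a separate disassembler plugin, hsdis.)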

Java Compilers

Java offers several compilers catering to different needs and preferences.

  • javac (Official Java Compiler): Bundled with the Java Development Kit (JDK), it’s the default compiler.
  • Eclipse Compiler for Java (ECJ): Integrated into the Eclipse IDE, it receives ongoing updates with each IDE release.
  • Ahead Of Time (AOT) Compiler: Translates the program into native machine code ahead of time, usually during the build or deployment phase, rather than while it runs. The resulting standalone binary can be executed natively by the underlying hardware.

While various compilers were developed by third parties or in open-source initiatives along the way, it’s noteworthy that the development of some has been discontinued. For current Java development, javac and ECJ stand out as the prominent choices.

JIT Compilation vs AOT Compilation

[Figure: Just In Time Compilation vs Ahead Of Time Compilation]

JIT Compilation:
In JIT compilation, the JVM reads portions of bytecode from the .class files and translates them into machine code on-the-fly as the Java program runs. This approach allows for adaptive optimizations based on runtime information, but it adds overhead early in the run, while the code is still warming up.

AOT Compilation:
On the other hand, AOT compilation translates the entire code ahead of time into native machine code, creating a standalone executable. This eliminates runtime translation overhead but may miss some runtime-specific optimizations. So, the CPU has the entire code available in a standalone executable form before execution begins.
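
The warm-up cost of JIT compilation can be observed with a small experiment. The sketch below times the same method over several rounds. On a JIT-enabled JVM the later rounds are often faster than the first ones, although the exact timings vary by machine and JVM and are not guaranteed. The names WarmupDemo, sumOfSquares and timeOneRound are hypothetical, and this is a rough illustration rather than a rigorous benchmark.

```java
public class WarmupDemo {

    // Written on every round so the JIT cannot discard the computation as unused.
    static volatile long sink;

    /** A small, frequently called workload: sums the squares of 0..n-1. */
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    /** Runs the workload once and returns the elapsed time in nanoseconds. */
    static long timeOneRound(int n) {
        long start = System.nanoTime();
        sink = sumOfSquares(n);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 10; round++) {
            System.out.printf("round %2d: %d ns%n", round, timeOneRound(1_000_000));
        }
    }
}
```

On HotSpot, running it with java -Xint WarmupDemo disables the JIT compiler and interprets everything, which gives a baseline for comparison. Adding -XX:+PrintCompilation instead logs methods as they are compiled.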

Java Virtual Machines

Java Virtual Machines (JVMs) play a crucial role in executing Java applications, and several notable JVMs cater to diverse needs:

  • HotSpot JVM: Developed by Oracle, it is the default JVM in Oracle’s JDK and OpenJDK distributions.
  • OpenJ9: Released by IBM in 2017, it’s known for its small footprint and efficient memory management. It finds frequent use in cloud environments.
  • GraalVM:
    – Released in 2018 by Oracle Labs, it provides high performance through Just-In-Time (JIT) compilation, optimizing Java app execution.
    – Its Native Image feature enables ahead-of-time (AOT) compilation of Java apps into native machine code binaries, ensuring faster startup times and reduced memory footprint. This makes it well-suited for microservices and serverless architectures.
    – Supports multiple languages, including Java, JavaScript, Python, and Ruby. Its polyglot capabilities let developers use the language best suited to each task within the same application.
    – Particularly useful in server-side apps, especially in a polyglot environment where improved startup and execution performance are crucial.
    – Also serves as a tool for research and experimentation, exploring new possibilities in language execution and runtime performance.
  • Zing JVM: Developed by Azul Systems, it has been around since the early 2000s. It focuses on low-latency, high-throughput performance for Java applications, particularly in enterprise settings.
  • Corretto JVM: Announced in 2018 by Amazon, Corretto JVM is Amazon’s distribution of OpenJDK with long-term support. It aims to provide a secure and stable environment for Java applications on Amazon Web Services (AWS).
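
To check which of these JVMs a program is actually running on, the standard system properties java.vm.name, java.vm.version and java.vm.vendor can be queried. The following is a minimal sketch. The class name WhichJvm and the method describeJvm are hypothetical, and the text printed depends entirely on the installed runtime.

```java
public class WhichJvm {

    /** Builds a one-line description of the running JVM from standard system properties. */
    static String describeJvm() {
        String name = System.getProperty("java.vm.name");
        String version = System.getProperty("java.vm.version");
        String vendor = System.getProperty("java.vm.vendor");
        return name + " " + version + " (" + vendor + ")";
    }

    public static void main(String[] args) {
        System.out.println("JVM:          " + describeJvm());
        System.out.println("Java version: " + System.getProperty("java.version"));
    }
}
```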

C and C++ Compilation Process

These languages are known for their low-level features and close-to-hardware capabilities. Compilers for C and C++ generate assembly language code as an intermediary step before producing machine code.

[Figure: C and C++ Code Compilation Process]
  • Compiler: Translates the preprocessed source code (the source after the preprocessor has expanded directives such as #include and #define) into assembly language code. This assembly code is specific to the target CPU architecture, such as x86 or ARM.
  • Assembly Code: Typically generated temporarily and passed directly to the assembler. The temporary assembly code may not be stored as a separate file unless explicitly requested for debugging purposes.
  • Assembler: Converts the assembly code into object files stored on disk.
  • Object Files: Contain the compiled machine code for each source file but they also include related information for the functions and data in that specific source file, like symbols and references.
  • Linker: Resolves references between different object files, ensuring that functions and data in one file can be properly linked to those in another. The final result is executable machine code ready to run as a complete program.
  • CPU: Reads the machine code from memory and executes the program.

Conclusion: A Glimpse into Code Compilation

Understanding how code is compiled is fundamental for developers working in languages such as Java, C, and C++. The transformation from high-level source code to machine-executable instructions follows a series of steps tailored to each language’s characteristics. Java compiles to bytecode and optimizes dynamically through Just-In-Time compilation, whereas C and C++ are compiled ahead of time, with assembly language as the key intermediate step. Together, these stages give you an end-to-end view of your code’s evolution.

Eager to understand how your CPU executes binary instructions and handles parallelism dictated by your programs? Dive into the technical exploration of the Anatomy Of A CPU.

