Updates on Class Initialization in GraalVM Native Image Generation

Christian Wimmer
Sep 12, 2019

tl;dr: Since GraalVM 19.0, application classes in native images are by default initialized at run time and no longer at image build time. Class initialization behavior can be configured using the options --initialize-at-build-time=... and --initialize-at-run-time=..., which take comma-separated lists of class names, package names, and package prefixes. To debug and understand class initialization problems, GraalVM 19.2 introduces the option -H:+TraceClassInitialization, which collects the stack trace that triggered class initialization during image generation and prints the stack traces in error messages.

The GraalVM native-image tool enables ahead-of-time (AOT) compilation of Java applications into native executables or shared libraries. While traditionally Java code is just-in-time (JIT) compiled at run time, AOT compilation has two main advantages: first, it improves the startup time since the code is already compiled into efficient machine code. Second, it reduces the memory footprint of Java applications since it eliminates the need to include infrastructure to load and optimize code at run time.

Native image generation opens up new optimization possibilities: parts of the application can be initialized at image build time, to avoid running the same initialization code over and over again at every application startup. The Feature API allows application code to be run before or during the static points-to analysis that finds the reachable classes, methods, and fields. Objects created at build time are available at run time in the so-called image heap.

A class initializer contains code that is executed before a class is used, i.e., before the first allocation, static method call, or static field access of the class. It is a convenient place to initialize static state once. A class initializer can be written explicitly as a static { ... } block in Java code. In addition, every initialization of a static field (regardless of whether the field is final or not) is implicitly converted to a class initializer. For example, when you declare a static field as static final Date TIME = new Date(); the assignment TIME = new Date() is inside the class initializer.
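To make the timing concrete, here is a minimal sketch that logs when a class initializer runs relative to the first use of the class. The class names InitLog, Config, and ClassInitDemo are made up for this illustration and do not appear in the original example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that records events in the order they happen.
class InitLog {
    static final List<String> EVENTS = new ArrayList<>();
}

class Config {
    // Field initializer: the compiler moves this assignment into the class initializer.
    static final long CREATED = System.currentTimeMillis();
    static int counter;

    // Explicit static block: part of the same class initializer, executed in textual order.
    static {
        counter = 42;
        InitLog.EVENTS.add("Config initialized");
    }
}

class ClassInitDemo {
    static List<String> run() {
        InitLog.EVENTS.add("before first use");
        int value = Config.counter; // first static field access triggers initialization of Config
        InitLog.EVENTS.add("after first use, counter=" + value);
        return InitLog.EVENTS;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

On a standard Java VM, the "Config initialized" entry appears between the other two entries: the initializer runs lazily at the first access, not when the program starts.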

Most class initialization code does not depend on any external input. Running such code at image build time reduces the startup time of an application without any observable impact on the application. The original design of GraalVM Native Image therefore initialized all classes by default during image generation, and allowed the developer to override the behavior when necessary. A blog article from last year described this behavior. However, this default resulted in a poor out-of-the-box experience: class initialization needed to be configured properly even when just trying out native image. For the GraalVM 19.0 release, we therefore flipped the default: all application classes are initialized at run time by default. Configuring class initialization is now an optimization problem (initializing a class at image build time makes the startup faster) and no longer a correctness problem.

Class initialization is configured using the two command line options --initialize-at-build-time and --initialize-at-run-time. Both take a comma-separated list of fully qualified class names, package names, or package prefixes. The options can be used multiple times on the command line and the order matters. For example, you can use the combination --initialize-at-build-time=my.library --initialize-at-run-time=my.library.package.MyClass to initialize all classes of all packages starting with my.library at image build time, with the exception of the class my.library.package.MyClass that is initialized at run time. Libraries can be shipped with these command line options in native-image.properties files.

If a class is marked for initialization at image build time, all superclasses are implicitly marked for initialization at image build time too. If a class is marked for initialization at run time, all subclasses are marked for initialization at run time too.
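These propagation rules follow from how Java itself initializes class hierarchies: initializing a class always initializes its superclass first. The sketch below illustrates that ordering on a standard Java VM. The names HierarchyLog, Base, Derived, and HierarchyInitDemo are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that records the order in which class initializers run.
class HierarchyLog {
    static final List<String> LOG = new ArrayList<>();
}

class Base {
    static {
        HierarchyLog.LOG.add("Base");
    }
}

class Derived extends Base {
    static int value = 1;

    static {
        HierarchyLog.LOG.add("Derived");
    }
}

class HierarchyInitDemo {
    static List<String> run() {
        // Accessing a static field of Derived initializes Derived,
        // which in turn requires Base to be initialized first.
        int ignored = Derived.value;
        return HierarchyLog.LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because Derived cannot be initialized without Base, a build-time Derived implies a build-time Base. In the other direction, a Base that must wait until run time forces Derived to wait as well.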

Many class initializers are simple. For example, the class initializer of a plain Java enum type only allocates the enum value instances, which does not depend on any external input. Running such a class initializer at image build time does not lead to any observable side effect (other than faster startup time). The native image generator analyzes each class initializer and automatically initializes such classes during image generation, regardless of the explicit configuration on the command line. In addition, all classes of the JDK are by default initialized during image generation, and we maintain the list of exceptions to this rule so that the JDK works correctly.

Example

Let’s revisit one of the examples from the previous blog article. It caches the startup time of an application in a static final field, and then prints the startup time and the current time. To run the example, JAVA_HOME must point to a GraalVM 19.2 release. Both the Community Edition and Enterprise Edition work. Note that after downloading a GraalVM release, the native-image tool needs to be installed using gu install native-image.
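The listing itself is not reproduced here, so the following is only a minimal sketch consistent with the description above. The class Startup and its field TIME are named in this article. The main class name HelloStartupTime and the output labels are assumptions. The article's classes live in the package org.graalvm.example; the package line is left as a comment so that the sketch compiles as a single standalone file.

```java
// package org.graalvm.example;  // the article's package; uncomment to match the package prefix used later

import java.util.Date;

class Startup {
    // Evaluated in the class initializer of Startup, i.e., whenever Startup is initialized.
    static final Date TIME = new Date();
}

// Hypothetical name for the main class of the example.
class HelloStartupTime {
    public static void main(String[] args) {
        System.out.println("Startup: " + Startup.TIME);
        System.out.println("Now:     " + new Date());
    }
}
```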

Compiling and running this example on a standard Java VM prints two time stamps that are the same or very close together:

The native image for this application now works correctly out-of-the-box, without any additional command line options:

The class initializers of all application classes, including the class Startup, are executed at run time. Therefore, executing the same binary an hour later still reports the correct startup time:

Now we want to improve the startup performance of our application and initialize classes during image generation. We do that by specifying that all classes with the package prefix org.graalvm.example should be initialized at image build time:

We already see that the startup time and current time are suspiciously far apart. And indeed, the startup time is now fixed in the executable. Running it again after an hour prints the same startup time:

We need to manually exclude the class Startup from initialization at build time and instead initialize it at run time to get a correctly working application again:

Tracing Class Initialization

Understanding and tracing why a class is initialized in Java is not easy. A class initializer can trigger recursive initialization of many other classes. First, the complete superclass hierarchy is initialized. Then some (but not all) interfaces that the class implements are initialized: only interfaces that define default methods are initialized, while interfaces without default methods can remain uninitialized even if interface methods are invoked. Then the actual class initializer is invoked. When it accesses static elements of other classes or allocates instances of other classes, initialization of those classes is triggered recursively too. The situation is complicated even more because class initialization can be cyclic (a class initializer can indirectly depend on itself) and class initialization can be initiated concurrently in multiple threads.
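The interface rule is the least intuitive part of this sequence, so here is a small sketch that makes it observable on a standard Java VM. All names (InterfaceInitLog, WithDefault, Plain, Impl, InterfaceInitDemo) are invented for the illustration. Each interface has a field whose initializer logs a message, which lets us see whether the interface was initialized.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that records which types were initialized, in order.
class InterfaceInitLog {
    static final List<String> LOG = new ArrayList<>();

    static String record(String name) {
        LOG.add(name);
        return name;
    }
}

interface WithDefault {
    // Interface fields are implicitly static final; this initializer runs when the interface is initialized.
    String MARK = InterfaceInitLog.record("WithDefault");

    default String greet() {
        return "hello";
    }
}

interface Plain {
    String MARK = InterfaceInitLog.record("Plain");

    void run();
}

class Impl implements WithDefault, Plain {
    static {
        InterfaceInitLog.record("Impl");
    }

    @Override
    public void run() {
    }
}

class InterfaceInitDemo {
    static List<String> run() {
        Plain p = new Impl(); // initializes Impl, and before that the interface with a default method
        p.run();              // invoking an interface method does not initialize Plain
        return List.copyOf(InterfaceInitLog.LOG);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The log contains WithDefault followed by Impl, but not Plain: the interface without default methods stays uninitialized until one of its fields is actually read.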

As a result, running initialization code during image build time (either via a Feature or by initializing some classes via --initialize-at-build-time) can trigger initialization of unexpected other classes. If such a class is marked for initialization at run time, this is a conflict that the native image tool cannot resolve, and so image generation fails.

This occurs if we modify our example a bit and cache the value of Startup.TIME also in the main class in the field CACHED_TIME:
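A minimal sketch of what this modification could look like is shown below. The names HelloCachedTime, CACHED_TIME, Startup, and TIME come from this article; the rest of the listing (output labels, the commented-out package line) is an assumption.

```java
// package org.graalvm.example;  // the article's package; uncomment to match the package prefix

import java.util.Date;

class Startup {
    static final Date TIME = new Date();
}

class HelloCachedTime {
    // This read of Startup.TIME is executed inside the class initializer of HelloCachedTime,
    // so initializing HelloCachedTime also initializes Startup.
    static final Date CACHED_TIME = Startup.TIME;

    public static void main(String[] args) {
        System.out.println("Startup: " + CACHED_TIME);
        System.out.println("Now:     " + new Date());
    }
}
```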

The native image build fails with an error:

In this simple example, it is easy to see that the class initializer of HelloCachedTime causes the problem: it runs at image build time because of --initialize-at-build-time=org.graalvm.example and triggers initialization of the class Startup because it reads the static field Startup.TIME. But in larger applications, these problems are difficult to debug. Therefore, we added a new option in GraalVM 19.2 to trace the reason for class initialization: -H:+TraceClassInitialization. When adding this option, the error message now explicitly names the class HelloCachedTime as the culprit of the problem:

There is no stack trace printed because the class initializers directly invoke each other. But we will later see a larger example where a helpful stack trace is included too. To solve the problem for our small example, we need to add the option --initialize-at-run-time=org.graalvm.example.HelloCachedTime.

Example: Tracing the Class Initialization of Netty

The Netty framework for efficient network I/O is the foundation of many modern Java applications, and is used, for example, by many microservice frameworks. We showed a while ago how Netty can be used in a native image, and many necessary substitutions and configuration files are now part of the Netty release. One configuration file that is part of the release specifies that all classes with the package prefix io.netty are initialized at image build time, with the exception of a few classes that must be initialized at run time because their class initializers depend on external state or allocate objects that cannot be in the image heap.

A simple “Hello, world” for Netty is in our repository https://github.com/cstancu/netty-native-demo. Running mvn clean package produces a single self-contained jar file with the application and the necessary parts of Netty. Building the native image is as easy as running $JAVA_HOME/bin/native-image -jar target/netty-svm-httpserver-full.jar because all the necessary options for the native image tool are already contained in configuration files.

We recently discovered that one more class needs to be initialized at run time: The class initializer of the class PooledByteBufAllocator queries the number of available processors and the maximum heap size. Running this class initializer at image build time queries and preserves the values seen at image build time, i.e., the processor count and the heap size of the image generation process. We need to initialize this class at run time using --initialize-at-run-time=io.netty.buffer.PooledByteBufAllocator. Unfortunately, just adding that parameter then leads to an error at image build time:
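The pattern behind this problem is easy to reproduce in isolation. The following sketch is not Netty's code; AllocatorDefaults is a hypothetical class that caches the same two environment values in static final fields, just as described above.

```java
// Hypothetical class, not part of Netty: it only mimics the pattern of caching
// environment-dependent values in a class initializer.
class AllocatorDefaults {
    // Both values are read once, when the class is initialized. If that happens at
    // image build time, they describe the build machine and the image generator process.
    static final int PROCESSORS = Runtime.getRuntime().availableProcessors();
    static final long MAX_HEAP_BYTES = Runtime.getRuntime().maxMemory();

    static String describe() {
        return "processors=" + PROCESSORS + ", maxHeapBytes=" + MAX_HEAP_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

On a standard Java VM the values always match the current machine, because the class initializer runs in the same process that uses the values. Only when initialization is moved to image build time can the cached values and the actual run-time environment diverge.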

The error message tells us that the class PooledByteBufAllocator got instantiated at image build time and the instance is reachable from the constructor of the class DefaultChannelConfig. Instantiating the class also means initialization of the class at image build time, which is in conflict with our explicit option to initialize the class at run time. But the error message does not yet tell us why the instantiation happened, only that it happened. The new option -H:+TraceClassInitialization fills this gap:

Now the error message tells us that the object was allocated by the class initializer of ByteBufAllocator. The stack trace shows that class initialization of the class ByteBufAllocator triggers initialization of the class ByteBufUtil and then PooledByteBufAllocator. So instead of just one class, we need to register three classes for initialization at run time: --initialize-at-run-time=io.netty.buffer.PooledByteBufAllocator,io.netty.buffer.ByteBufUtil,io.netty.buffer.ByteBufAllocator. With this option, the image building succeeds again and leads to the intended class initialization behavior.

Implementation Details of the Tracing

Java does not provide a standard way to trace class initialization and to produce the stack trace as shown above. Since the native image tool is just a Java application running on the Java HotSpot VM, modifying the VM to store the traces is also not a feasible option. Instead, we use bytecode instrumentation: the class initializer of every application class is instrumented to store the stack trace when it is invoked. For application objects, we store the stack trace for every object allocation. Since this adds noticeable time and memory overhead, especially when many application objects are allocated during image generation, the option TraceClassInitialization is not turned on by default.
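To give an idea of what the instrumented code does conceptually, here is a hand-written sketch of the effect. It is not the actual implementation in the native image generator: the class InitTracer, its method classInitialized, and the explicit call in the static block are all assumptions that stand in for the injected bytecode.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry; the real tool stores its traces internally.
class InitTracer {
    static final Map<String, StackTraceElement[]> TRACES = new ConcurrentHashMap<>();

    // Captures the stack trace of the thread that is running the class initializer.
    static void classInitialized(String className) {
        TRACES.putIfAbsent(className, new Throwable().getStackTrace());
    }
}

class Traced {
    static int value = 7;

    static {
        // Stands in for the call that instrumentation would insert into the class initializer.
        InitTracer.classInitialized("Traced");
    }
}

class TracingDemo {
    static StackTraceElement[] trigger() {
        int ignored = Traced.value; // first access triggers initialization of Traced
        return InitTracer.TRACES.get("Traced");
    }

    public static void main(String[] args) {
        for (StackTraceElement element : trigger()) {
            System.out.println("  at " + element);
        }
    }
}
```

The recorded trace contains the frame of the class initializer (reported with the method name <clinit>) followed by the frames of the code that triggered the initialization, which is the information needed to name a culprit in an error message.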

Summary

Changing the default class initialization behavior shortly before the GraalVM 19.0 release was an intrusive but, we believe, necessary change. The default now provides a good out-of-the-box experience for users, while frameworks such as Netty can improve startup time by providing command line options via a native-image.properties file. The new option -H:+TraceClassInitialization in GraalVM 19.2 simplifies debugging of problems that arise when initializing some classes at image build time and some classes at run time. Class initialization was also a significant topic in our talk this year at the JVM Language Summit. You can watch the recording of the talk.


GraalVM team blog - https://www.graalvm.org