Instant Netty Startup using GraalVM Native Image Generation

In this article we demonstrate how you can achieve instant startup for Netty, a non-blocking I/O Java networking framework. We do this by compiling the Netty application into a native executable with GraalVM. First we discuss why we think this is important, then we detail what steps are necessary to enable ahead-of-time compilation of Java bytecode, and finally we show how to do it on a sample Netty application. We show how to configure the GraalVM native image tooling to work with reflection, incomplete classpath issues, and unsafe memory access.

Note: This article has been updated to reflect the use of delayed class initialization, a feature introduced in GraalVM 1.0.0 RC6.

Why Create a Netty Native Image?

The GraalVM native-image tool enables ahead-of-time (AOT) compilation of Java applications into native executables or shared libraries. While traditionally Java code is just-in-time (JIT) compiled at run time, AOT compilation has two main advantages: First, it improves the start-up time since the code is already pre-compiled into efficient machine code. Second, it reduces the memory footprint of Java applications since it eliminates the need to include infrastructure to load and optimize code at run time. There are additional advantages such as more predictable performance and less total CPU usage.

We call the technology behind the GraalVM native-image tool Substrate virtual machine (VM) because in addition to your application code and its dependencies, the target executable contains components such as a garbage collector. To achieve the goal of a self-contained executable, the native-image tool performs a static analysis to find ahead-of-time the code that your application uses. This includes the parts of the JDK that your application uses, third party library code, and the VM code itself (which is written in Java too). Therefore, at run time the native executable only depends on your system’s native libraries, like libc, but even this dependency can be removed if you choose to fully link the native dependencies statically.

Building Netty based native images allows you to use a modern networking framework in your code with instant start-up and low memory footprint.

This all sounds good, but what’s the catch?

To efficiently compile a managed language like Java ahead-of-time, you need to carefully choose which of the language features you want to support. We employ a closed-world analysis to discover all reachable code. This means that all code needs to be available at compile time, so dynamic class loading at run time is not possible. The target executable does not contain machinery to load and compile Java bytecode that was not seen ahead of time. Adding those capabilities to Substrate VM would defeat the purpose of having a thin-layer VM. In addition, it means that the analysis needs to be able to reason about your code independently of runtime information, so features like reflection need special treatment. We detail some of these restrictions in this article and show how they can be addressed, as they apply to the Netty use case, but you can find a complete list of limitations in our GitHub repo.

First let us introduce an example application to support the discussion.

Say Hello!

Netty is an asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers & clients.

More concretely, Netty is an easy way to build high-performance network applications such as protocol servers and clients based on Java non-blocking I/O. It supports many protocols such as FTP, SMTP, HTTP(S) and WebSockets, and it gives you high throughput and low latency.

For the purpose of this demo we are using the HttpHelloWorldServer example from the Netty examples repository. It is a simple HTTP server that responds to each request with Hello World. When run on the JVM, the server needs more than half a second before it is ready to process requests. Building the application with the native-image tool produces a native executable that is less than 9 MB in size, is ready to accept requests in a matter of milliseconds, and uses about 7 times less memory.

If you would rather skip the discussion and want to see the example in action first, you can jump to the end of the article.

The application has three main components: a server bootstrap, a channel initializer, and a channel handler. The embedded code snippets are a simplified version. You can download a complete version that you can run from this GitHub repository.

The server bootstrap initializes the application and sets up the connection:

The channel initializer is invoked every time a new connection is opened and sets up the channel processing pipeline:

The channel handler reads the request and sends back the response:

When run, the application will prompt you with:

Open your web browser and navigate to http://127.0.0.1:8080/

Now let’s see what is needed for this to run as a native executable.

Making Netty Compatible with Native Image Generation

There are three main areas that need to be addressed to make Netty, and other libraries, compatible with native image generation. These are: reflection, unsafe memory access, and incomplete classpath. Each of them is detailed below.

Reflection

Reflection is a very powerful technique in Java. It allows a Java program to examine and modify its run-time behavior. While powerful, reflection should not be used indiscriminately. Even on the JVM it can impede some optimizations which can lead to performance overhead. In general, where possible, it is preferable to perform an operation without using reflection.

Netty uses reflection for extensibility and ease of configuration. It allows the user to configure an I/O channel by providing a class object, either one of the Netty built-in channel classes or a user-defined one. For this task it uses the ReflectiveChannelFactory:
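To make the idea concrete, here is a minimal sketch of a factory that instantiates a class through its no-parameter constructor. It is not Netty's ReflectiveChannelFactory: the names ReflectiveFactorySketch and DemoChannel are hypothetical, and the code only uses the standard java.lang.reflect API.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for a channel type; not a Netty class.
class DemoChannel {
    final String name = "demo-channel";
}

// Simplified factory that creates instances through the no-parameter constructor.
final class ReflectiveFactorySketch<T> {
    private final Constructor<? extends T> constructor;

    ReflectiveFactorySketch(Class<? extends T> clazz) {
        try {
            // Reflective lookup: a static analysis cannot see which constructor
            // is meant unless the class is registered for reflection.
            this.constructor = clazz.getDeclaredConstructor();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(
                    clazz.getName() + " has no no-parameter constructor", e);
        }
    }

    T newInstance() {
        try {
            return constructor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "Unable to create " + constructor.getDeclaringClass().getName(), e);
        }
    }
}

class ReflectiveFactoryDemo {
    public static void main(String[] args) {
        ReflectiveFactorySketch<DemoChannel> factory =
                new ReflectiveFactorySketch<>(DemoChannel.class);
        System.out.println(factory.newInstance().name);
    }
}
```

The constructor call is resolved from a Class object at run time, which is why the target class has to be declared to the image builder in advance.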

In our example the ServerBootstrap is initialized with a NioServerSocketChannel channel class:

We need to explicitly mark reflectively accessed classes, methods, and fields at build time. This enables the static analysis to reason about the dynamic behavior of reflective operations. We specify a reflection configuration file to the native-image command using -H:ReflectionConfigurationFiles=/path/to/reflectconfig. You can read more about reflection support here. For our example we need the following configuration:

The reflection configuration specifies that the NioServerSocketChannel should be made available for reflective instantiation using the no-parameter constructor.

We write this article with the assumption that the problematic code resides in a third party library that you cannot easily modify, e.g., a library that depends on Netty. In our small example, reflection could be easily avoided by writing our own channel factory:
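A reflection-free factory could look roughly like the sketch below. It is a standalone illustration rather than the code from the example repository: ChannelFactorySketch, BootstrapSketch, and PlainChannel are hypothetical names that only mimic the shape of a bootstrap accepting a factory instead of a class object.

```java
// Hypothetical factory interface, loosely modeled on a channel factory.
interface ChannelFactorySketch<T> {
    T newChannel();
}

// Hypothetical stand-in for a channel type; not a Netty class.
class PlainChannel {
    final String transport = "nio";
}

// Hypothetical bootstrap that takes a factory object instead of a Class.
final class BootstrapSketch<T> {
    private ChannelFactorySketch<? extends T> factory;

    BootstrapSketch<T> channelFactory(ChannelFactorySketch<? extends T> factory) {
        this.factory = factory;
        return this;
    }

    T bind() {
        if (factory == null) {
            throw new IllegalStateException("channel factory not set");
        }
        // Plain method call: the constructor is visible to the static analysis.
        return factory.newChannel();
    }
}

class DirectFactoryDemo {
    public static void main(String[] args) {
        PlainChannel channel = new BootstrapSketch<PlainChannel>()
                .channelFactory(PlainChannel::new) // constructor reference, no reflection
                .bind();
        System.out.println(channel.transport);
    }
}
```

Because the constructor reference is resolved by the compiler, the analysis can follow the call directly and no reflection configuration is needed for this class.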

The mechanism to register elements for reflective access is part of the SDK of GraalVM, released as part of the graal-sdk.jar.

Unsafe memory access

The Java class sun.misc.Unsafe is an API for low-level programming. It can be used to read and write data at arbitrary memory addresses. This API was originally intended to be only used by core Java classes. This intention is reflected in the API design: the only constructor is private and the only available instance is a private singleton. The default access to the singleton instance is through the Unsafe.getUnsafe() method, whose use is restricted to classes loaded by the bootstrap class loader, i.e., only Java core classes. However, although discouraged, there are ways to get access to the unsafe API from any application and many libraries do so. Furthermore, in an effort to encapsulate most internal APIs, starting with Java 9 the private nature of the sun.misc APIs is made more apparent by moving them to the jdk.unsupported module. For an in-depth discussion of sun.misc.Unsafe you can, for example, read this article. You should certainly believe the Chief Architect of the Java Platform Group at Oracle, who strongly discourages the use of sun.misc.Unsafe. But since many existing applications use it, we support it in native executables.

Unsafe memory access through the sun.misc.Unsafe API is allowed in native executables, but field offset, array base offset, array index scale, and array index shift values need to be re-computed. These values are usually computed in the static initializer of a class and stored in static final fields. Static initializers are executed during build time, i.e., when the native-image tool runs. This means that the static fields store the field offsets computed by the JVM. However, Substrate VM uses a different object layout than the JVM, so using the values directly would access wrong memory locations. That leads to undefined behavior at run time. If you are lucky, your application crashes with a segmentation fault; if you are unlucky, it just computes the wrong result.

We automatically re-compute most raw offset values computed in static initializers and stored in static final fields. The raw offset re-computation is straightforward offset arithmetic, but first the fields, arrays, etc. used in unsafe operations need to be detected. This is carried out by a static analysis at image build time. The analysis relies on the fact that most uses of unsafe operations follow a common pattern. It detects static initializers with the following patterns:

For object field offset:

For array base offset:

For array index scale:

For array index shift with the scale stored in a field:

For array index shift with the scale not stored:
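The five patterns listed above can be sketched together in one class. This is an illustration under assumptions, not the snippets from the original article: CounterCell and UnsafeOffsetPatterns are hypothetical names, the Unsafe instance is obtained through the commonly used theUnsafe field, and recent JDKs deprecate these methods, so the code compiles with warnings.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical class that only exists to provide a field for the offset lookup.
class CounterCell {
    volatile long value;
}

final class UnsafeOffsetPatterns {
    static final Unsafe UNSAFE = loadUnsafe();

    // Object field offset.
    static final long VALUE_OFFSET;
    // Array base offset.
    static final long BYTE_ARRAY_BASE_OFFSET = UNSAFE.arrayBaseOffset(byte[].class);
    // Array index scale.
    static final int LONG_ARRAY_INDEX_SCALE = UNSAFE.arrayIndexScale(long[].class);
    // Array index shift, computed from the scale stored in the field above.
    static final int LONG_ARRAY_INDEX_SHIFT;
    // Array index shift, computed from a scale kept only in a local variable.
    static final int BYTE_ARRAY_INDEX_SHIFT;

    static {
        try {
            VALUE_OFFSET = UNSAFE.objectFieldOffset(
                    CounterCell.class.getDeclaredField("value"));
        } catch (NoSuchFieldException e) {
            throw new ExceptionInInitializerError(e);
        }

        LONG_ARRAY_INDEX_SHIFT = shiftFor(LONG_ARRAY_INDEX_SCALE);

        int byteScale = UNSAFE.arrayIndexScale(byte[].class);
        BYTE_ARRAY_INDEX_SHIFT = shiftFor(byteScale);
    }

    // Converts a power-of-two scale into the equivalent shift amount.
    private static int shiftFor(int scale) {
        if ((scale & (scale - 1)) != 0) {
            throw new Error("array index scale is not a power of two");
        }
        return 31 - Integer.numberOfLeadingZeros(scale);
    }

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("field offset:     " + VALUE_OFFSET);
        System.out.println("byte[] base:      " + BYTE_ARRAY_BASE_OFFSET);
        System.out.println("long[] scale:     " + LONG_ARRAY_INDEX_SCALE);
        System.out.println("long[] shift:     " + LONG_ARRAY_INDEX_SHIFT);
        System.out.println("byte[] shift:     " + BYTE_ARRAY_INDEX_SHIFT);
    }
}
```

The printed offsets depend on the object layout of the VM that runs the code, which is exactly why values computed on the JVM at build time cannot be reused as-is in the native executable. Note that this sketch routes the shift computation through a helper method for brevity; whether the automatic detection still recognizes that shape is not something this example demonstrates.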

As long as your code is written this way, we can successfully detect and adapt all uses of unsafe operations. However, it happens that some code, even in the JDK, does not follow these common patterns. Although we could improve our analysis to include more patterns and use an inter-procedural analysis that can look beyond the code in the static initializers, we noticed that our current approach covers the majority of unsafe operation uses. For the rest of the cases the analysis will warn you that an automatic re-computation was not possible. For these cases we have a mechanism that allows you to manually adapt the unsafe operations.

For the small example in this article there were 94 instances of automatic re-computations for unsafe operations (this includes code in both JDK and the Netty library), and only 3 that did not follow the common code patterns and required manual configuration. Those 3 are in the Netty library: io.netty.util.internal.PlatformDependent0.ADDRESS_FIELD_OFFSET, io.netty.util.internal.CleanerJava6.CLEANER_FIELD_OFFSET and io.netty.util.internal.shaded.org.jctools.util.UnsafeRefArrayAccess.REF_ELEMENT_SHIFT.

Note: when we say “Netty library” we refer to all the code that comes packaged with Netty, which includes the shaded JCTools library. For this article we used Netty 4.1.24.Final, the latest stable version at the time of writing, which imports JCTools 2.1.1.

To manually recompute the value of raw offsets we use the Substrate VM substitution mechanism. More information about substitutions is in the next section. For now here is the configuration needed to manually adapt those 3 uses of unsafe operations:

To make native image generation for Netty work out of the box, these 3 instances of unsafe use could be refactored to follow the common code patterns.

Incomplete classpath

The static analysis scans the classpath and tries to load all referenced classes eagerly ahead-of-time. On the JVM you can execute programs with an incomplete classpath, i.e., your code might reference some classes that are not available at run time. This is not a problem as long as your code does not actually try to use that code. If you try to use a missing class, the JVM throws an exception. Sometimes applications rely on this behavior to test if an existing library is on the classpath or not. An attempt to load a class from said library is made and on failure the exception is caught and execution continues.

In Netty, the InternalLoggerFactory is written in such a way. The logging library is chosen depending on the logging framework that is available on the classpath:
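The probing pattern can be illustrated with a small standalone sketch. It does not reproduce Netty's InternalLoggerFactory: LoggerBackendProbe and the class name org.example.optional.FancyLoggerFactory are hypothetical, and Class.forName is used here only to mimic an attempt to use a library that may be missing.

```java
final class LoggerBackendProbe {
    // Hypothetical class name standing in for an optional logging library.
    static final String OPTIONAL_BACKEND = "org.example.optional.FancyLoggerFactory";
    // Fallback that is always available on a standard JDK.
    static final String DEFAULT_BACKEND = "java.util.logging";

    // Returns the candidate if it can be loaded, otherwise the JDK fallback.
    static String chooseBackend(String candidateClassName) {
        try {
            Class.forName(candidateClassName);
            return candidateClassName;
        } catch (ClassNotFoundException | LinkageError e) {
            // The library is not on the classpath: carry on with the default.
            return DEFAULT_BACKEND;
        }
    }

    public static void main(String[] args) {
        System.out.println("Selected backend: " + chooseBackend(OPTIONAL_BACKEND));
    }
}
```

On the JVM the failed lookup is an ordinary, recoverable event. The difficulty for ahead-of-time compilation arises when the code references the missing classes directly, because those references must be resolved while the image is built.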

For Substrate VM, this coding pattern is problematic since we eagerly load and parse all run-time reachable code. Trying to use this code with Substrate VM throws a java.lang.NoClassDefFoundError during bytecode parsing, i.e., at build time.

There are generally three approaches to cope with this kind of problem when building a native image: patch the problematic code using the substitution mechanism, run the native-image tool with the --report-unsupported-elements-at-runtime option, or modify the source code.

The first approach uses the substitution mechanism of Substrate VM: we can rewrite the InternalLoggerFactory.newDefaultFactory() method to always return JdkLoggerFactory.INSTANCE, the default logging configuration that is shipped with Netty:

The method annotated with @Substitute replaces the method with the same name and signature in the class specified by the annotation @TargetClass. The new implementation of newDefaultFactory() no longer references non-existing classes; when the method is called, it always returns the JdkLoggerFactory object.

The second approach uses the option --report-unsupported-elements-at-runtime for the native-image tool. It instructs the native image builder to ignore all type resolution problems at build time. These problems are then reported at run time by throwing an exception, similar to the behavior of the JVM. In the case of Netty the original logging instantiation code will successfully execute at run time, i.e., it will try to load the Slf4JLoggerFactory and Log4JLoggerFactory classes, fail with an exception, and use the JdkLoggerFactory. The option --report-unsupported-elements-at-runtime is good for prototyping because it allows you to build native executables without worrying about many issues at first. But we discourage using it in production: not using the option gives you the peace of mind that your native executable is complete and will not fail spuriously at run time.

The third approach is to modify the source code and move the problematic code to a static initializer. Static class initializers are executed during image building, on the Java VM executing the image builder. Therefore, code reachable only from static initializers is not subject to this restriction as it runs on a standard JVM. Running the static initializers during the build process allows us to initialize all classes and freeze the heap. The heap is then included in the executable to reduce the start up time. For the example in this article, this can be achieved by introducing a static field which causes all the logging initialization to be executed at image build time:

Then we can simply use the value of this field to configure the server:
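As a rough standalone analogue of this refactoring, the sketch below moves the probing into a class initializer and lets the server code read only the resulting constant. LoggingSetupSketch, ServerSketch, and the probed class name are hypothetical; the actual example stores a Netty logging object in the static field instead of a string.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LoggingSetupSketch {
    // Counts how often the probing code runs, for demonstration only.
    static final AtomicInteger PROBES = new AtomicInteger();

    // Computed once, in the class initializer. With native-image this would
    // happen at image build time and the result would land in the image heap.
    static final String BACKEND = probe();

    private static String probe() {
        PROBES.incrementAndGet();
        try {
            // Hypothetical optional library class.
            Class.forName("org.example.optional.FancyLoggerFactory");
            return "optional";
        } catch (ClassNotFoundException | LinkageError e) {
            return "jdk";
        }
    }
}

final class ServerSketch {
    final String backend;

    ServerSketch() {
        // Only reads the precomputed constant; no probing on this path.
        this.backend = LoggingSetupSketch.BACKEND;
    }

    public static void main(String[] args) {
        ServerSketch first = new ServerSketch();
        ServerSketch second = new ServerSketch();
        System.out.println(first.backend + " / " + second.backend
                + ", probes: " + LoggingSetupSketch.PROBES.get());
    }
}
```

The code that touches possibly missing classes is reachable only from the class initializer, so the run-time path of the server never needs to resolve them.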

At run time, the server is initialized with a constant that was computed ahead-of-time and included in the native image heap.

Delayed class initialization

All classes determined as reachable by the static analysis are initialized at image build time, by default. Therefore, each native image contains not just code, but also an initial heap that serves as the starting point of the Java heap at run time. This allows us to skip class initialization at run time, which is crucial for fast startup.

While executing class initializers during image generation has clear advantages, there are some situations where this approach is not feasible. For example, static initializers that start application threads that continue to run in the background of the application, load native libraries, open files or sockets, or allocate C memory cannot be run at image build time.

All these examples result in native resources created during image generation which are no longer available at image run time, and accessing them would lead to undefined behavior. To mitigate such issues, initializer execution can be delayed to run time for certain classes. The option --delay-class-initialization-to-runtime takes a comma-separated list of classes, and implicitly all of their subclasses, that are initialized at run time instead of image build time. You can read more about this feature in a separate article.

For our example application we need delayed class initialization for the io.netty.handler.codec.http.HttpObjectEncoder class because it adds a java.nio.DirectByteBuffer to the image heap. A direct ByteBuffer has a pointer to unmanaged C memory, and C memory from the image generator is not available at image run time.
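The kind of initializer that causes this can be sketched with the standard java.nio API. The class below is hypothetical and does not reproduce Netty's HttpObjectEncoder; it only shows a static field whose initializer allocates native memory.

```java
import java.nio.ByteBuffer;

// Hypothetical class with a static field backed by off-heap memory.
final class EncoderConstantsSketch {
    // Allocated in the class initializer. If this ran at image build time, the
    // buffer's native address would belong to the image generator process and
    // would be meaningless in the native executable.
    static final ByteBuffer CRLF = newCrlfBuffer();

    private static ByteBuffer newCrlfBuffer() {
        ByteBuffer buffer = ByteBuffer.allocateDirect(2);
        buffer.put((byte) '\r').put((byte) '\n');
        buffer.flip(); // make the two bytes readable
        return buffer;
    }

    public static void main(String[] args) {
        System.out.println("direct: " + CRLF.isDirect()
                + ", readable bytes: " + CRLF.remaining());
    }
}
```

Delaying the initialization of such a class to run time means the allocation happens inside the running executable, where the native memory is valid.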

Substitution mechanism

To overcome the restrictions that Substrate VM imposes we devised a mechanism that allows us to compile third party code without modifying the source code. This is especially important to be able to support an unmodified JDK. This mechanism is generically called substitutions because what it essentially does is substituting snippets of the target code with a version compatible with Substrate VM.

During the build process Substrate VM compiles Java bytecode via the Graal compiler. This enables us to apply the substitutions on the Graal IR level, i.e., the Graal graphs. Thus we do not have to deal with patching source code or rewriting bytecodes. The substitution mechanism can alias target classes, methods, fields, and alter them as we saw previously with the raw offset fields. The substitutions can either be specified via annotations or via a JSON configuration file. When specified via annotations the substitution classes need to be added to the classpath to be picked up by the image builder. The best place to start learning about the substitution mechanism is from the Java documentation for the com.oracle.svm.core.annotate.TargetClass annotation. It is important to note here that the substitution mechanism cannot alter static initializers since we execute the static initializers at image build time and don’t actually compile them. It can however alter the value of the static fields, i.e., the value that will be written in the executable image.

Unlike reflection support, the substitution mechanism is not part of the API. However you can still use it in your development environment as it is released as part of svm.jar. We make this distinction because, although powerful, we do not want people to rely on this mechanism. We think that a better way to deal with the native image generation restrictions in the long run is to adapt the source code. This allows developers to evolve their code without worrying about yet another dependency.

Now, let’s run the example!

The complete example is hosted on GitHub. To set up your development environment you first need to download GraalVM. Either the Community Edition or the Enterprise Edition works for the purpose of this example. The GraalVM download contains a full JVM plus a few other utilities like the native-image tool. Then you need to set your JAVA_HOME to point to GraalVM:

$ export JAVA_HOME=<graalvm-download-location>/graalvm-1.0.0-rc6

Then you need to add GraalVM to your path:

$ export PATH=$JAVA_HOME/bin:$PATH

Now you can run the native-image tool:

$ native-image --help
GraalVM native-image building tool
This tool can be used to generate an image that contains ahead-of-time compiled Java code.
...

Alternatively, you could build native-image from source following the quick start guide.

For compilation, native-image depends on the local toolchain, so please make sure that glibc-devel, zlib-devel (header files for the C library and zlib), and gcc are available on your system. On the OS on which this demo was tested, Ubuntu 16.04, the following command was required to install zlib-devel; the rest of the dependencies were installed out of the box:

$ sudo apt-get install zlib1g-dev

The example is built with Maven. But before we build we need to install svm.jar in the local Maven repository since the project’s pom.xml file depends on it. The svm.jar library contains all the code needed to compile the substitutions required for Netty.

$ mvn install:install-file -Dfile=${JAVA_HOME}/jre/lib/svm/builder/svm.jar -DgroupId=com.oracle.substratevm -DartifactId=svm -Dversion=GraalVM-1.0.0-rc6 -Dpackaging=jar

The other required library, graal-sdk.jar, is automatically added to the classpath when you run the javac command shipped with GraalVM.

Now you can build with:

$ mvn clean package

This will create a jar file with all dependencies embedded.

Netty on the Regular JVM

On HotSpot we can run as usual using the java command:

$ java -jar target/netty-svm-httpserver-full.jar

which should prompt:

Open your web browser and navigate to http://127.0.0.1:8080/

If you open the address in your browser you should be greeted with the familiar Hello World.

But how quickly is the server ready to process your requests? To answer this we introduce a System.exit(0) just after the message prompt:

After rebuilding the jar we use the time command to get the real time it takes the server to get ready:

$ time java -jar target/netty-svm-httpserver-full.jar
Open your web browser and navigate to http://127.0.0.1:8080/

real 0m0.544s
user 0m0.404s
sys 0m0.040s

To measure the memory footprint we use the /usr/bin/time command (which in bash is different than the time command):

$ /usr/bin/time -f "\nmaxRSS\t%MkB" java -jar target/netty-svm-httpserver-full.jar 
Open your web browser and navigate to http://127.0.0.1:8080/
maxRSS 64224kB

Netty with GraalVM Native

To build the native image we use the native-image tool:

$ native-image -jar target/netty-svm-httpserver-full.jar -H:ReflectionConfigurationResources=netty_reflection_config.json -H:Name=netty-svm-http-server --delay-class-initialization-to-runtime=io.netty.handler.codec.http.HttpObjectEncoder
Build on Server(pid: 29456, port: 26681)
classlist: 194.15 ms
(cap): 468.11 ms
setup: 626.51 ms
(typeflow): 3,709.95 ms
(objects): 2,402.43 ms
(features): 42.58 ms
analysis: 6,274.84 ms
universe: 141.76 ms
(parse): 310.85 ms
(inline): 658.15 ms
(compile): 1,782.15 ms
compile: 3,055.74 ms
image: 484.51 ms
write: 132.26 ms
[total]: 10,936.45 ms

This creates an executable file that is less than 9 MB in size:

$ ls -Gg --block-size=k netty-svm-http-server
-rwxrwxr-x 1 8330K May 8 23:33 netty-svm-http-server

We can now run the executable:

$ ./netty-svm-http-server
Open your web browser and navigate to http://127.0.0.1:8080/

To measure the start-up time we insert System.exit(0) again and rebuild both the jar and the native image. Then we can run it again with the time command:

$ time ./netty-svm-http-server
Open your web browser and navigate to http://127.0.0.1:8080/

real 0m0.008s
user 0m0.008s
sys 0m0.000s

We can observe a significant reduction in start-up time: from more than half a second on the JVM to a few milliseconds for the native executable.

Again, we measure the memory footprint using the /usr/bin/time command:

$ /usr/bin/time -f "\nmaxRSS\t%MkB" ./netty-svm-http-server 
Open your web browser and navigate to http://127.0.0.1:8080/
maxRSS 8712kB

We can observe that the native executable uses about 7 times less maximum RAM when compared with the JVM.

Conclusions

In this article we looked at a small Netty application and used GraalVM to build a native executable of this application. We presented the restrictions imposed by the native-image tool, explained why GraalVM currently has them, and showed how to overcome them. We showed how to configure the GraalVM native-image tool to work with reflection, incomplete classpath issues, and unsafe memory access. As a result, we got a native image of a Netty app that starts instantly.

While we are actively working on lifting some of the restrictions for native images, there is a limit to how far we can go if we want to preserve the fast startup and low footprint benefits. In the meantime, we welcome library developers and maintainers to try and make their code compatible with GraalVM native image tooling, while we evolve it into a flexible, easy to use platform.