Improving performance of GraalVM native images with profile-guided optimizations

Jaroslav Tulach
Aug 29, 2019 · 8 min read

The GraalVM Native Image tool rightfully attracts a lot of attention as it offers significant improvements in terms of startup speed and overall memory usage. However, if you create some benchmarks to evaluate peak performance, you may observe that the native image doesn’t always offer better throughput as well.

In this article we look at the pros and cons of GraalVM Native Image. We show an easy way of generating PGO profiles for native images, introduced in GraalVM 19.2, that significantly improves the throughput of generated native images for known workloads.

A guide on how to use profile-guided optimizations for GraalVM native images.

Benefits of Native Image

Native Image takes the bytecode of your application and compiles it to a native executable ahead of time. As a result, one can program in any JVM language (Java, Scala, Kotlin) and get a single, self-contained executable file as output. A single file has many benefits. It can be easily copied from one system to another just by itself: it contains all the application code as well as the necessary runtime support, such as the garbage collector. A single file gets loaded and is ready to run; there is no need to search for various JARs, properties and other miscellaneous files and wait for them to open, load and initialize. The file generated by Native Image gives us instant startup. In addition, the Native Image tool is able to capture a snapshot of the application's memory: you can bring your system into a ready-to-run state, and when the generated native executable is started, it continues exactly from where it was. This eliminates repetitive initialization and makes startup even faster.

Another benefit of ahead-of-time compilation is lower memory consumption. A typical JVM keeps an enormous amount of metadata in memory in addition to the JIT-generated native code. This metadata is needed to be able to de-optimize at almost any moment. Nothing like that is needed in the case of Native Image: the generated code covers all the possible code paths and never de-optimizes. The native code is known to be sufficient, so all the metadata can be dropped when the native executable is generated.

In addition to all the above goodies, Native Image retains the most important aspects of a JVM. One can use a language of one’s own choice, be it Java, Scala, Kotlin, etc. One can benefit from all the development tools available for the JVM. One can rely on the strong concurrency guarantees of a JVM, and one doesn’t need to care about garbage collection. The rich JVM ecosystem, full of useful libraries, tools and frameworks, is waiting to be compiled ahead of time.

Trade-offs of Native Image

Obviously, the native executable can only run on a single platform. If you generate the image for 64-bit Linux, it only runs on Linux. If you generate it for Mac, it runs on Mac. If the executable is generated for Windows, it is going to run only on Windows. Portability is restricted compared to a classical JAR file. Another limitation is caused by the missing metadata at run time. The previous section mentioned missing metadata as a benefit, but it also has its cost. Since by default a native image doesn’t retain information about classes and methods, one’s ability to perform reflection is limited. Reflection is still possible, but it has to be configured and compiled into the native executable. As there are many Java frameworks that rely on reflective access, getting them to run on Native Image may require additional configuration. Yet another restriction comes from the fact that the Native Image runtime may not support all features of Java. Running the Swing UI toolkit may not be possible as it is too dynamic. On the other hand, Native Image has successfully executed Javac, Netty, Micronaut, Helidon and Fn Project, all large and nontrivial applications running on top of the JDK.
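
To make the reflection limitation concrete, here is a small hypothetical example (Greeter and ReflectionDemo are made-up names, not part of any GraalVM API). On a regular JVM the class is located by name at run time. In a native image, a class whose name is only known at run time generally has to be listed in the reflection configuration when the image is built, otherwise the lookup fails.

```java
import java.lang.reflect.Method;

// Hypothetical class that is only ever reached through reflection.
class Greeter {
    public String greet() {
        return "hello";
    }
}

class ReflectionDemo {
    // Looks up a class and a no-argument method purely by name and invokes it.
    // Nothing here tells an ahead-of-time compiler which class will be needed.
    static String callByName(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            Object instance = type.getDeclaredConstructor().newInstance();
            Method method = type.getMethod(methodName);
            return (String) method.invoke(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflective call failed: " + className, e);
        }
    }

    public static void main(String[] args) {
        // Example: java ReflectionDemo Greeter greet
        System.out.println(callByName(args[0], args[1]));
    }
}
```

Because the class name arrives as a command-line argument, a static analysis of the program cannot tell that Greeter is used at all, which is why such classes need to be declared explicitly.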

The last drawback associated with ahead-of-time compilation is speed. What? I thought Native Image starts faster! Well, it does start significantly faster than a similar JVM application, but in the end, when the application runs for a long time, the just-in-time compiler can actually outperform the AOT one. As the helidon.io team puts it:

“On the other hand, everything is always a tradeoff. Long running applications on traditional JVMs are still demonstrating better performance than GraalVM native executables due to runtime optimization. The key word here is long-running; for short-running applications like serverless functions, native executables have a performance advantage. So, you need to decide yourself between fast startup time and small size (and the additional step of building the native executable) versus better performance for long-running applications.”

Now we are getting to the main topic of this post. Let’s take a look at why the peak performance of AOT-compiled code is lower, and then let’s speed it up!

There is no Free Lunch!

With ahead-of-time compilation there is no need for initial interpretation of the bytecode. There is no need for deoptimizations, and there is no support for random reflection poking around your classes. As a result, a short-lived application starts faster as a native image and uses less memory overall. The benefits are huge; however, everything comes at some cost. There is no free lunch. Or is there?

Improving Peak Performance of Native Image

Shape.java

The program introduces the Shape interface and its four implementations: Circle, Square, Rectangle and Triangle. The base interface defines an area() method, and each of the geometric classes overrides it with its own implementation, suitable for its shape. Those who know how object-oriented languages are implemented can already smell the problem. Right, if we create an array of shapes and go through it, the code will have to be ready for virtual method dispatch. Let's do it:
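
The listing itself is not reproduced in this text, so here is a minimal sketch of what such a declaration might look like. Nesting the classes inside Shape is inferred from the Shape$Circle.class file names that appear in the build output later on; the field names and constructor signatures are assumptions.

```java
// Hypothetical sketch of the Shape hierarchy; field names and constructors are assumptions.
interface Shape {
    // Every shape knows how to compute its own area.
    double area();

    final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        @Override public double area() { return Math.PI * radius * radius; }
    }

    final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        @Override public double area() { return side * side; }
    }

    final class Rectangle implements Shape {
        final double width;
        final double height;
        Rectangle(double width, double height) { this.width = width; this.height = height; }
        @Override public double area() { return width * height; }
    }

    final class Triangle implements Shape {
        final double base;
        final double height;
        Triangle(double base, double height) { this.base = base; this.height = height; }
        @Override public double area() { return 0.5 * base * height; }
    }
}
```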

computeArea method
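
Only the name of the method survives here, so the following is a guess at its shape: a loop that sums area() over an array. The Shape interface is repeated in minimal form so that the snippet compiles on its own, and the parameter name all is an assumption.

```java
// Hypothetical sketch; the interface is repeated so the snippet is self-contained.
interface Shape {
    double area();

    // Sums the areas of all shapes. The call shape.area() is an interface call:
    // which implementation runs depends on the runtime class of each element.
    static double computeArea(Shape[] all) {
        double sum = 0;
        for (Shape shape : all) {
            sum += shape.area();
        }
        return sum;
    }
}
```

Since the interface has a single abstract method, it can be tried quickly with lambdas, for example Shape.computeArea(new Shape[] { () -> 2.0, () -> 3.5 }), which adds the two values up.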

The array of shapes can contain instances of any implementation, and as such the call shape.area() has to be able to reach any of the actual methods. That's usually done with a virtual method table associated with each geometric class: the code finds out that the current shape is, say, a Circle, looks up the actual implementation of Circle.area() and calls it. Doing this requires a bit of computation. To demonstrate that, let's generate a huge array of random objects and measure how much time invoking the computeArea method takes:

the main method which generates shapes and measures time
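
The original main method is not reproduced here either. Below is a complete single-file sketch that is consistent with the command lines used in the rest of the article, under the assumption that the arguments mean: number of shapes, random seed, number of measured rounds, and then the kinds of shapes to generate. The helper names create and generate are hypothetical, and the timings it prints will not match the numbers quoted in this post.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Random;

// Hypothetical reconstruction of Shape.java; argument order and helper names are assumptions.
interface Shape {
    double area();

    final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        @Override public double area() { return Math.PI * r * r; }
    }

    final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        @Override public double area() { return side * side; }
    }

    final class Rectangle implements Shape {
        final double a, b;
        Rectangle(double a, double b) { this.a = a; this.b = b; }
        @Override public double area() { return a * b; }
    }

    final class Triangle implements Shape {
        final double base, height;
        Triangle(double base, double height) { this.base = base; this.height = height; }
        @Override public double area() { return 0.5 * base * height; }
    }

    static double computeArea(Shape[] all) {
        double sum = 0;
        for (Shape shape : all) {
            sum += shape.area();
        }
        return sum;
    }

    // Creates one shape of the requested kind with random dimensions in [1, 2).
    static Shape create(String kind, Random rnd) {
        double a = 1 + rnd.nextDouble();
        double b = 1 + rnd.nextDouble();
        switch (kind) {
            case "circle": return new Circle(a);
            case "square": return new Square(a);
            case "rectangle": return new Rectangle(a, b);
            case "triangle": return new Triangle(a, b);
            default: throw new IllegalArgumentException("unknown shape: " + kind);
        }
    }

    // Fills an array with shapes picked at random from the given kinds.
    static Shape[] generate(int count, long seed, String[] kinds) {
        Random rnd = new Random(seed);
        Shape[] all = new Shape[count];
        for (int i = 0; i < count; i++) {
            all[i] = create(kinds[rnd.nextInt(kinds.length)], rnd);
        }
        return all;
    }

    static void main(String[] args) {
        int count = Integer.parseInt(args[0]);
        long seed = Long.parseLong(args[1]);
        int rounds = Integer.parseInt(args[2]);
        String[] kinds = Arrays.copyOfRange(args, 3, args.length);

        Shape[] all = generate(count, seed, kinds);
        double total = 0;
        long lastNanos = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            total += computeArea(all);
            lastNanos = System.nanoTime() - start;
        }
        // Use the result so the summation cannot be discarded as dead code.
        if (Double.isNaN(total)) {
            throw new IllegalStateException("unexpected NaN");
        }
        System.out.printf(Locale.ROOT, "last round %.3f ms.%n", lastNanos / 1e6);
    }
}
```

The time is printed with a fractional part because a single pass over a small array can take well under a millisecond; the original program may well have measured something heavier per round.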

If you put all the above code into a file named Shape.java (do it in an empty directory), you can compile it with GraalVM's Native Image tool. To get started, download GraalVM Enterprise Edition as well as the GraalVM Enterprise Edition Native Image tool. Unpack GraalVM and use its gu tool to install the bin/native-image utility (gu install --file native-image-installable-svm-svmee-*-19.2.0.jar). Then you can:

$ /graalvm-ee-19.2.0/bin/javac Shape.java
$ /graalvm-ee-19.2.0/bin/native-image Shape
$ ls -1
graalvm-ee-19.2.0
shape
'Shape$Circle.class'
'Shape$Rectangle.class'
'Shape$Square.class'
'Shape$Triangle.class'
Shape.class
Shape.java

A shape executable has been generated. When you run it, it is going to be completely standalone, start fast, require little memory, but it won't be optimized. Try it:

$ ./shape 15000 43243223423 30 square rectangle
last round 35 ms.
$ ./shape 15000 43243223423 30 triangle circle
last round 34 ms

The actual execution time may vary depending on the speed of your computer. The absolute values do not matter much, we just want to make the execution faster. Let’s train our program to be ready for square and rectangle. To do so we need to capture the data about the actual program execution. Let’s thus generate the PGO data.

$ /graalvm-ee-19.2.0/bin/java -Dgraal.PGOInstrument=shape.iprof Shape 15000 43243223423 130 square rectangle

The shape.iprof file is generated once the execution is over. If you inspect its content, you will find a reference to Shape$Square, but no reference to Shape$Circle. Of course - we've been training the program for squares and rectangles, not circles! The fact that Shape$Circle is missing in the shape.iprof file signals that the training was successful. Let's now use the data and regenerate our native image:

$ /graalvm-ee-19.2.0/bin/native-image --pgo=shape.iprof Shape
$ ./shape 15000 43243223423 30 square rectangle
last round 25 ms.

Speedup! Instead of 35ms we can now execute the trained program in 25ms. Just by training it, recording the compiler decisions and using them to guide the compilation, we have sped up our program by almost 30%.

Note that this result is still not exactly on par with running with a warmed-up JIT compiler. If we run the same code with a JIT compiler, we still see better results.

$ java Shape 15000 43243223423 130 square rectangle
last round 17 ms.

We’re working on enabling better optimizations in the GraalVM compiler used ahead-of-time, so the performance of native images should improve further in the future.

If you’re wondering whether the PGO numbers translate well to real-world applications, you can try profile-guided optimizations on a larger project, for example the Micronaut demo application for GraalVM. In our initial tests PGO shows good results there. We plan to expand on this topic in further articles.

Of course, the speed up from PGO is only visible when the real workload mimics the one that we’ve been training for. If the program input diverges and the execution gets into the non-optimized paths, it can actually be even slower than without any profiles:

$ ./shape 15000 43243223423 30 triangle circle
last round 49 ms.
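
One way to picture why the untrained input can end up slower is to write by hand, in plain Java, roughly what a compiler could do with a profile that only ever saw squares and rectangles. This is a conceptual illustration under that assumption, not actual GraalVM output; the method name computeAreaTrained and the field names are hypothetical.

```java
// Conceptual illustration only: hand-written Java, not code produced by GraalVM.
interface Shape {
    double area();

    final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        @Override public double area() { return side * side; }
    }

    final class Rectangle implements Shape {
        final double a, b;
        Rectangle(double a, double b) { this.a = a; this.b = b; }
        @Override public double area() { return a * b; }
    }

    final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        @Override public double area() { return Math.PI * r * r; }
    }

    // Generic version: every element goes through a virtual call.
    static double computeArea(Shape[] all) {
        double sum = 0;
        for (Shape shape : all) {
            sum += shape.area();
        }
        return sum;
    }

    // "Trained" version: the types seen during profiling are tested first and
    // their area() bodies are written inline; everything else pays for the
    // failed type checks and then still needs the virtual call.
    static double computeAreaTrained(Shape[] all) {
        double sum = 0;
        for (Shape shape : all) {
            if (shape instanceof Square) {
                Square s = (Square) shape;
                sum += s.side * s.side;
            } else if (shape instanceof Rectangle) {
                Rectangle r = (Rectangle) shape;
                sum += r.a * r.b;
            } else {
                sum += shape.area(); // slow path for unprofiled types
            }
        }
        return sum;
    }
}
```

Both methods return the same result for any input; only the cost differs. Squares and rectangles take the short inlined path, while circles and triangles fall through every check first.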

Should something like that happen, it is time to re-profile your application, gather new PGO data and recompile. Note that prior to 19.2.0 one needed to create a special instrumented native image of the program to collect the profile information. Collecting the profile by running the application on the JVM, without preparing an instrumented native image, is much simpler.

Conclusions

GraalVM team blog - https://www.graalvm.org
