GraalVM 21.3 is here: Java 17, Native Image performance updates and more 🚀

Alina Yurenko
Published in graalvm
Oct 19, 2021

Today we are releasing GraalVM 21.3! It brings a lot of great features that have been long anticipated by the community, and in this blog post we’ll talk about some of the most interesting and promising of them.

Get Updated

21.3 is the last release of the year, which means that the GraalVM Community Edition 21.3 builds will keep receiving updates for the next 12 months. So if you were considering upgrading your GraalVM version, now is a great time to do so!

Download GraalVM 21.3 Community

Download 21.3 Enterprise

You can also see what’s new in our release stream recording:

https://www.youtube.com/watch?v=Tsc2Io9DJsE

Along with the JDK 11-based builds, GraalVM 21.3 is also available for the recently released Java 17. This means that in addition to the new Java 17 features, such as pattern matching for switch, sealed classes, and platform updates, you also get access to everything that has accumulated in the platform since Java 11. So go ahead and give it a try! To learn more about making the most of the Java 17 features with GraalVM and Micronaut, watch this Developer Live session by Graeme Rocher.
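For instance, here is a small self-contained sketch combining two of those Java 17 features, sealed classes and pattern matching for switch (the latter is a preview feature in Java 17, so it must be compiled and run with --enable-preview); the classes and names are hypothetical:

sealed interface Shape permits Circle, Square {}
record Circle(double radius) implements Shape {}
record Square(double side) implements Shape {}

public class Java17Demo {
    // Pattern matching for switch: each case binds a typed pattern variable,
    // and the switch over a sealed type is checked for exhaustiveness.
    static double area(Shape shape) {
        return switch (shape) {
            case Circle c -> Math.PI * c.radius() * c.radius();
            case Square s -> s.side() * s.side();
        };
    }

    public static void main(String[] args) {
        System.out.println(area(new Circle(2)));  // ~12.57
        System.out.println(area(new Square(3)));  // 9.0
    }
}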

As of this release, we also have new Enterprise container images and a new set of Community container images! To get the Enterprise images from the Oracle Container Registry, use the following command (assuming you are already logged in to OCR via docker login and have accepted the license):

docker pull container-registry.oracle.com/graalvm/enterprise-8:jre-headless

If you want Community container images, you can get them from GitHub:

docker pull ghcr.io/graalvm/graalvm-ce:latest

You can also install the latest version of GraalVM via the “GraalVM Tools for Java” extension for VS Code.

Compiler Updates

As with every release, the Graal compiler updates bring new performance optimizations and promote previously experimental features. For example, the Strip Mining optimization for non-counted loops is now enabled by default. With this optimization, more non-counted loops are converted to counted loops, making them subject to further optimizations such as vectorization and partial unrolling. This results in up to 15% speedup for workloads that make heavy use of non-counted loops.
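As a rough illustration (a hypothetical example, not taken from an actual benchmark), the first loop below is counted, because its trip count follows from the induction variable, while the second is non-counted, because it exits on a data-dependent condition; loops of the second shape are what this optimization targets:

class LoopExamples {
    // A counted loop: the trip count is known from the bounds of the induction variable i,
    // so the compiler can apply optimizations such as vectorization and partial unrolling.
    static long sumAll(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // A non-counted loop: the exit depends on the data itself (a sentinel value),
    // so the number of iterations is not known up front.
    static long sumUntilSentinel(int[] data) {
        long sum = 0;
        int i = 0;
        while (data[i] != -1) {
            sum += data[i];
            i++;
        }
        return sum;
    }
}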

Enhanced Automated Vectorization is now also enabled by default, which on certain workloads (such as math-heavy ML workloads) makes GraalVM Enterprise up to 40% faster than OpenJDK.

One of the new optimizations added in 21.3 is the Infeasible Path Correlation optimization, intended to eliminate infeasible paths.

Look at the following example:

class A { ... }
class B extends A { ... }

if (x > 5) {
    if (y instanceof A) {
        throw Error();
    }
}
loop {
    int foo = <expensive computation>;
    if (x > 10) {
        if (y instanceof B) {
            bar(foo);
        }
    }
}

The bar(foo) call will only be executed if x > 10 && y instanceof B. However, that condition implies x > 5 && y instanceof A, which means throw Error() would already have been executed before the loop. That is, the call to bar can never actually be executed. Graal now recognizes this and eliminates both the call to bar and the computation of foo if it is only used as an argument to bar.
This optimization is enabled in GraalVM Enterprise by default and can be disabled with -Dgraal.InfeasiblePathCorrelation=false.

Additionally, GraalVM Enterprise adds support for Constant Blinding to defend against JIT spraying attacks. The constant blinding phase encrypts user-supplied constants in the code with a randomly generated key, so that the attacker cannot rely on the immediate value being present in executable memory. Constant blinding is an experimental feature disabled by default; enable it with -Dgraal.BlindConstants=true.

This release of GraalVM also introduces a new tool for capturing and analyzing profiles of Java programs: proftool. It is intended to provide machine-level details about the execution to aid JIT performance analysis. At the moment it contains a JVMTI agent for capturing all the assembly generated by the JVM, a parser for Linux perf output, and a parser for HotSpot LogCompilation information. Combining these components on a single command line allows the perf profile information to be attributed to the JIT-compiled code. Profile collection currently only supports Linux perf, but once the data is captured, the profile can be viewed anywhere. We’ll talk more about this tool in a follow-up blog post.

Native Image

21.3 introduces conditional reflection configuration for Native Image. Configuration file entries that are needed at image build time for Reflection, JNI, class path resources, and Dynamic Proxy objects can now be made conditional on the reachability of a class. With conditional configuration, a configuration entry is applied only if a provided condition is satisfied. The only currently supported condition is typeReachable, which enables the configuration entry if the specified type is reachable through other code. For example, to support reflective access to sun.misc.Unsafe.theUnsafe only when io.netty.util.internal.PlatformDependent0 is reachable, the configuration should look like:

{
  "condition": {
    "typeReachable": "io.netty.util.internal.PlatformDependent0"
  },
  "name": "sun.misc.Unsafe",
  "fields": [
    { "name": "theUnsafe" }
  ]
}

This enables more precise configuration, which can reduce the size of a native executable. We’ll talk about conditional reflection in more detail in a follow-up blog post.
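For context, the reflective access covered by the entry above looks roughly like the following (a simplified sketch in the spirit of Netty's PlatformDependent0, not its actual code):

import java.lang.reflect.Field;

public class UnsafeLookup {
    // With the conditional entry above, the metadata for this reflective access is only
    // included in the image when io.netty.util.internal.PlatformDependent0 is reachable.
    static Object lookupUnsafe() throws ReflectiveOperationException {
        Field theUnsafe = Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        return theUnsafe.get(null);
    }
}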

Another update in Native Image has been especially anticipated by the community: as our recent poll shows, the #1 thing developers would like to see improved in Native Image is build times. 21.3 builds on our consistent work on improving build times and memory requirements over the past few releases. In fact, already for 21.2 we saw improvement reports from the community mentioning 20% faster build times and ~9% lower memory usage, and 21.3 takes those improvements even further. We’ve measured the changes in image size and image build time for the Spring petclinic-jdbc application:

Native Image build time and image size improvements

If you are interested in following Spring Native’s progress, take a look at the recent talk from Sébastien Deleuze at Spring One, which covers their latest features, performance numbers and GA plans.

Also, in Native Image, small methods are now inlined before static analysis, which improves the precision of the analysis, enables more constant folding, and reduces surprising results. For example, previously, code that accessed a static final field directly was optimized differently from code that accessed the same field via an accessor method:

void foo() {
    if (MyConfiguration.WINDOWS) {
        // The static analysis does not see this code as reachable on non-Windows platforms.
    }
}

static boolean isWindows() {
    return MyConfiguration.WINDOWS;
}

void bar() {
    if (isWindows()) {
        // Previously, the static analysis marked this code as reachable because the method
        // invocation prevented constant folding before the analysis. With method inlining
        // before static analysis, this code is no longer seen as reachable on non-Windows platforms.
    }
}

Also, 21.3 changes how reflection metadata is stored: Native Image now distinguishes between reflection methods that are only queried and methods that are actually invoked. This distinction can further reduce image size; for example, webmvc-tomcat gets a 9% smaller image. To mark methods that should only be used for querying metadata, prefix the corresponding reflection configuration entries with query, or list the methods under queriedMethods. For example:

{
  "name": "org.graalvm.Example",
  "queryAllDeclaredConstructors": true,
  "queriedMethods": [{ "name": "queriedOnlyMethod", "parameterTypes": [] }]
}

The Native Image agent also makes this distinction between queried and invoked methods and outputs the appropriate configuration.
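To illustrate the difference, here is a hypothetical sketch (assuming queriedOnlyMethod is registered only for querying, as in the configuration above): looking up the method and reading its metadata works, while reflectively invoking it is not covered by the registration.

import java.lang.reflect.Method;

public class QueriedReflectionExample {
    public static void main(String[] args) throws Exception {
        Class<?> clazz = Class.forName("org.graalvm.Example");

        // Querying metadata works, because the method is registered as queried.
        Method m = clazz.getDeclaredMethod("queriedOnlyMethod");
        System.out.println(m.getName() + " has " + m.getParameterCount() + " parameters");

        // Invoking the method reflectively is not covered by the queried-only registration,
        // so in a native executable this call is expected to fail at run time:
        // m.invoke(clazz.getDeclaredConstructor().newInstance());
    }
}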

Another great update in 21.3 is initial support for the Java Platform Module System. The Native Image generator now accepts the module-related options known from the java launcher, such as -m (or --module), -p (or --module-path), --add-exports, and --add-opens. When a module-related argument like -m or -p is used, the image generator itself runs as a set of Java modules. This is a major change to how the image generator works, and there is still ongoing development to support more aspects of the module system. Please provide feedback and bug reports when you try out this new feature.

For example, you can build an image from two Java modules: base-module.jar and app-module.jar, where module sample.app contains the main class:

native-image --module-path base-module.jar:app-module.jar --module sample.app
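For reference, the application module's descriptor could look something like this (a hypothetical sketch; sample.base is an assumed name for the module in base-module.jar):

// module-info.java in app-module.jar
module sample.app {
    requires sample.base;   // assumed module name of base-module.jar
}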

21.3 brings significant improvements to the peak performance of GraalVM Enterprise Native Image. With profile-guided optimizations and the G1 GC, native executables achieve peak performance on par with OpenJDK, in addition to great startup performance and low resource usage:

Performance of GraalVM Enterprise Native Image with PGO and G1 is on par with OpenJDK

Another great update related to Native Image performance is a new policy for the Serial GC to reduce application runtime memory footprint. The new policy enables survivor spaces for the young generation, a feature that has been present in the codebase for a while but was not enabled. In addition, a full GC no longer scans the whole image heap for root pointers, but only parts of the image heap that have been written to. The new policy is not enabled by default (this is planned for the next release), but can be enabled using -H:InitialCollectionPolicy=Adaptive. Please test the new policy and report any regressions so that we can address them before the next release.
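For example, a build with the new policy could look like this (myapp.jar stands in for a hypothetical application jar):

native-image -H:InitialCollectionPolicy=Adaptive -jar myapp.jar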

Native Image in this release also supports Java 17. Among other features, it includes JDK Flight Recorder (JFR) support for JDK 17, contributed by Red Hat.

Polyglot Runtime and Embedding

In this release, we significantly improved the usability and precision of the built-in CPU sampler tool. The CPU sampler works with all polyglot languages and comes as a built-in command-line tool in all launchers. In particular, the following aspects were improved:

  • Increased precision of the sampling output by using the new guest language safepoint mechanism. The sampling output now shows inlined methods in addition to compilation units by default.
  • Simplified the default sampling histogram output and added the --cpusampler.ShowTiers option to show the time spent in each optimization tier.
  • Added an SVG flamegraph output format to the CPU sampler. To enable it, use the new --cpusampler.Output=flamegraph option.
CPU sampler flamegraph

This example flamegraph was produced by running the following command:

js --cpusampler --cpusampler.Output=flamegraph --cpusampler.OutputFile=out.svg primes.js

GraalVM Enterprise includes improvements in the stability and performance of the code snapshotting feature, called auxiliary engine caching. Engine Caching is intended to eliminate the warmup of Truffle guest language programs, which comes from operations like loading, parsing, profiling and compilation. Within a single OS process, the work performed during warmup can be shared with in-process engine caching. Auxiliary engine caching builds upon this mechanism but adds the capability to persist a code snapshot with ASTs and optimized machine code to disk. This way, even for the first execution of the guest application the warmup can be significantly reduced. The application does not need to be modified for this feature to work. You will soon learn more about this feature in a blog post.

For polyglot embedders, this update offers a new capability to share values between contexts. Previously, value sharing was only supported for primitives and host values, and failed with an error for any guest value; this restriction has been lifted in this release. Value sharing can also be disabled entirely with the new Context.Builder.allowValueSharing(boolean) method. Turning sharing off can be helpful when strict safety is required and passing a value from one context to another should be treated as an error. This feature resolves the long-standing issue #631 and has already been warmly welcomed by the community.
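Here is a minimal sketch of what this looks like from the embedder's side (assuming a JavaScript runtime is available; the guest code is illustrative):

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

public class ValueSharingExample {
    public static void main(String[] args) {
        // Sharing is enabled by default: a guest value created in one context
        // can now be passed to another context instead of failing with an error.
        try (Context first = Context.create("js");
             Context second = Context.create("js")) {
            Value array = first.eval("js", "[1, 2, 3]");
            Value length = second.eval("js", "a => a.length").execute(array);
            System.out.println(length.asInt()); // 3
        }

        // For strict isolation, sharing can be switched off; passing a value
        // across contexts is then reported as an error instead of being migrated.
        try (Context isolated = Context.newBuilder("js")
                .allowValueSharing(false)
                .build()) {
            isolated.eval("js", "1 + 1");
        }
    }
}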

Another feature included in 21.3 for workloads requiring strict safety is value scoping for guest-to-host callbacks. With value scoping configured, any guest value passed to a host method is automatically released when the method returns. This allows embedders to ensure that values do not outlive the host method call; any access to a scoped value after the method has returned results in an error. Value scoping can be enabled using the HostAccess.SCOPED preset when building a context.
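Here is a minimal sketch (assuming a JavaScript runtime; the EventHandler class and the guest snippet are hypothetical): the Value passed into the exported host method is only valid for the duration of the call.

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.HostAccess;
import org.graalvm.polyglot.Value;

public class ValueScopingExample {
    public static class EventHandler {
        @HostAccess.Export
        public void onEvent(Value payload) {
            // The scoped value is valid while this method is executing...
            System.out.println(payload.getMember("message").asString());
            // ...but it is released when the method returns, so storing it in a field
            // and accessing it later would result in an error.
        }
    }

    public static void main(String[] args) {
        try (Context context = Context.newBuilder("js")
                .allowHostAccess(HostAccess.SCOPED)
                .build()) {
            context.getBindings("js").putMember("handler", new EventHandler());
            context.eval("js", "handler.onEvent({message: 'hello from JavaScript'})");
        }
    }
}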

For language implementers, this release adds a new API for implementing static objects. The new Static Object Model API represents the layout of objects that, once defined, do not change the number and type of their properties. It is particularly well suited for, but not limited to, implementing the object model of static programming languages. In addition, this release brings significantly faster language context lookups on all platforms.

JavaScript

In 21.3, the Node.js runtime was updated to version 14.17.6 to include new features and security fixes. As usual, we implemented several new ECMAScript proposals, such as Error Cause, Import Assertions, and Accessible Object.prototype.hasOwnProperty (Object.hasOwn). To try them, use the corresponding flags (e.g., --js.error-cause or --js.import-assertions). We also improved GraalVM’s WebAssembly support with several completeness and performance fixes, including the JavaScript BigInt to WebAssembly i64 integration.

Python

A major Python improvement in 21.3 is binary support for HPy native extensions, which makes it possible to run the same HPy binary packages as CPython does, without having to recompile them.

Another notable improvement is support for the multiprocessing module, which supports the orchestration of tasks across multiple Python contexts on a multi-core machine. For example, to calculate pair-wise Jaccard similarity indices across a corpus of literature you can now scale up to take advantage of all your cores. Using a multiprocessing pool with 4 workers can triple the performance of this workload. This feature was highly requested by the community and we are looking forward to getting feedback.

Another improvement in Python is a socket module rewritten using native code, allowing a much wider range of network use cases to work than before. This includes message-oriented socket calls (recvmsg/sendmsg), Unix domain sockets, and proper interaction of sockets opened from Python with native extensions. We also added support for the ctypes module, enabling more native extensions that use the ctypes API to run. As an example, the Wand package for interfacing with ImageMagick from Python now works with GraalVM 21.3.

Ruby

To speed up workloads that make heavy use of regular expressions, TruffleRuby now uses TRegex. TRegex is a generic regular expression engine that uses the GraalVM compiler and the Truffle API to execute regular expressions efficiently. TRegex provides an implementation of ECMAScript regular expressions, a subset of Python regular expressions, and now Ruby regular expressions as well. A distinguishing feature of TRegex is that it compiles regular expressions into finite-state automata, which makes the performance of searching for a match predictable (linear in the size of the input).

By using TRegex by default, TruffleRuby provides large speedups for matching regular expressions. You can see how TruffleRuby with TRegex is up to 9x faster on larger Regexp benchmarks:

TruffleRuby Regexp performance with TRegex
Performance gains on the example of the Shopify Storefront Renderer app (Kevin Menard, VMM 2021 session)

To learn more about how TRegex works under the hood, you can watch this RubyKaigi session from Benoit Daloze and Josef Haider.

Among other improvements, TruffleRuby added fully integrated support for foreign object traits (arrays, hashes, iterables, etc.) in polyglot mode, which now behave like their Ruby counterparts, and support for transparent inner contexts as used by the ExecJS gem.

FastR

In GraalVM’s R implementation, FastR, we continue to focus on compatibility. In particular, we upgraded from PCRE to PCRE2 version 10.37; to learn more about what this change may mean for you, check the GNU-R changelog (section “Migration to PCRE2”). Additionally, we improved the configuration for compiling some packages (e.g., Maps), along with more usability improvements and bug fixes.

Java on Truffle

Java on Truffle in this release offers both guest and host support for Java 17. We have also significantly improved startup performance. As an illustration, we compare the time taken by the first iteration of some popular benchmarks between 21.0.0 and 21.3.0 (we are measuring first iteration time, so lower is better):

Startup performance of Java on Truffle

To learn about other changes, take a look at the release notes or the project changelog.

WebAssembly

Over the past few releases we’ve been working on improving performance of GraalVM’s WebAssembly runtime, resulting in notable changes:

Performance improvements in 21.3

Additionally, the interaction between JavaScript and WebAssembly is now based on the WASM Embedding Interface. To learn about other changes, take a look at the release notes or the project changelog.

LLVM Runtime

The LLVM toolchain for compiling C/C++ was updated to version 12.0.1. We also improved the modularization of the codebase around “managed mode” and “native mode” code. This reduces static build dependencies between the two modes while making the managed mode codebase more robust by removing all unmanaged memory accesses.

Tools

With this release, the GraalVM Tools for Micronaut extension added support for Kubernetes: you can now deploy, run, and debug Micronaut applications in a Kubernetes cluster directly from Visual Studio Code. The following quick actions for Micronaut are enabled:

Quick actions in the GraalVM Tools for Micronaut extension

When you invoke the “Micronaut: Deploy to Kubernetes” action, your Java project is packaged, a Docker image is built and pushed to the container registry, a deployment is created in the Kubernetes cluster, and a local port is forwarded to the application running in a pod.

After deployment you can debug this project from within the VS Code GraalVM Extension Pack. To do so, connect the Kubernetes extension to your cluster using “Set Kubeconfig”, select the node you are developing, and invoke the Debug action (“Attach using Java 8+”). This attaches the debugger to the remote Kubernetes pod using kubectl port forwarding.

Additionally, Native Image debugging in VS Code, enabled by the GraalVM Tools for Java extension, was significantly improved. You can now attach the debugger to Native Image processes and step through the image’s “real” code. To attach the debugger to a native executable, add a configuration to launch.json: select “Native Image: Attach to Process” from the configuration completion in the launch.json file, which generates an “Attach to Native Image” configuration. When this configuration is selected and executed, you’ll see a list of running processes; select the one that corresponds to your native executable:

Attaching debugger to a native executable

GraalVM extensions have also been updated to work with Java 17.

Conclusion

With all updates that went into this release, now is a great time to update your GraalVM version — you can get new builds for both GraalVM Enterprise and Community.

We are grateful to the community for all the feedback, suggestions, and contributions that went into this release. If you have additional feedback on this release, or suggestions for features that you would like to see in future releases, please share them with us on Slack, GitHub, or Twitter.

— the GraalVM team
