Isolates and Compressed References: More Flexible and Efficient Memory Management via GraalVM

tl;dr: GraalVM native images now support isolates (multiple independent VM instances in the same process) and compressed references (32-bit references to Java objects on 64-bit architectures). This reduces memory footprint and makes it possible to strictly separate memory for, e.g., web requests from different users.

Introduction

Many people wonder why both Java and VirtualBox are called “virtual machines”, since they do such different things. One common theme is that they both try to achieve “write-once, run-anywhere” for your programs, and they both try to hide the underlying machine from the program, but in very different ways. However, typically the main reason that people use VMs like VirtualBox (let’s call them OS-level VMs) is that they want to more efficiently use physical hardware resources by packing multiple applications on the same server while isolating them from interference from each other. Newer versions of OS-level VMs like Docker make virtualization more efficient by sharing the OS between multiple applications.

GraalVM provides yet another kind of virtualization, which we call “language-level” virtualization, that allows multiple languages to run in the same process (or thread). It provides another level of “write-once, run-anywhere” by allowing a library written in one language to be called directly from another language without performance penalties. The news today is that GraalVM now also provides a way to use hardware resources even more efficiently by allowing multiple applications to share the language runtime (e.g., the JVM). This can be important in cloud environments or other environments where packing more tenants per server can directly reduce infrastructure costs. Over time, we expect GraalVM to continue to blur the line between Java-style and OS-level virtualization.

Towards that end, we’re now introducing a new virtualization feature in GraalVM called isolates. A GraalVM isolate is a disjoint heap that allows multiple tasks in the same VM instance to run independently. In a traditional Java application server, all of the tasks share the same memory heap, and if one task uses a lot of memory it can trigger garbage collection (GC), slowing down the other tasks sharing that heap. Isolates are provably disjoint, so each isolate can be garbage-collected independently (or just destroyed before any GC is needed). Isolates are a great tool for managing application multitenancy, or just to break down a single monolithic application into manageable microservices.

Isolates also enable another optimization, which is compressed pointers. Since the isolate heap is typically intended to be used just by one task (and not a huge set of tasks that a Java application server might have to support), it’s likely that an isolate is small and will not need 64-bit pointers to address all the memory it contains. Most data structures in object-oriented languages use more space for pointers than they do for primitive data, so making them smaller can have a large impact on the application footprint.

While that all sounds great, there are some important limitations with isolates to be aware of. First, they are a low-level feature designed to be used by a higher level technology that manages tasks (like a database or a serverless cloud). In the database, for example, we will use an isolate for the Graal compiler to put its own objects and another isolate for the application data. Second, isolates don’t yet include features like snapshots that one would expect from a full-fledged virtualization technology. Finally, isolates and compressed pointers are only available when using Substrate VM — they don’t apply when running GraalVM embedded in the Java HotSpot VM (which has its own ideas about heap management). Isolates are available in any edition of GraalVM, but compressed pointers are only available in the Enterprise Edition.

Isolates are a powerful tool for building services in a way that reduces memory footprint and also has positive effects on other metrics such as maximum latency and throughput. They can be used in combination with the GraalVM native-image tool where the services are written in Java or Scala and compiled ahead-of-time (which reduces both memory footprint and startup time). A previous article presents more details about this approach. Alternatively, isolates can be used to manage dynamic languages built on the Truffle APIs (e.g., JavaScript, Python, etc.), which is done when we embed GraalVM in the database.

We expect the most common use of isolates to be building a multi-tenant server, where a separate isolate is used for each request. After the request processing is done, the isolate can simply be discarded without performing any garbage collection. This approach is best for applications that process events in parallel, like a web server. This article goes through the details of how to build that kind of application using isolates.

Isolates

Isolates provide multiple independent VM instances in the same process. Creating an isolate creates a new heap, with the image heap as the starting point. This means that all the initialization that is done during image generation is immediately available in every isolate. All isolates share the same ahead-of-time compiled code, i.e., there is no separate static analysis and separate compilation per isolate. Since code is immutable, the sharing of this code is desirable.

Because each isolate has a separate heap, it is not possible to have direct Java object references between two isolates. This is a restriction and a benefit at the same time: The application developer needs to ensure that the object graph is completely partitioned, and it is, for example, not possible to have a global cache that is accessible from all isolates. But the isolation allows garbage collection to happen independently in each isolate, i.e., without stopping or influencing other isolates. And all memory allocated by an isolate is freed automatically when the isolate is torn down, without the need for a garbage collection.

The memory isolation also gives security guarantees: objects from one isolate (maybe associated with a particular user) cannot be erroneously accessed by another isolate (associated with a different user). Bugs that allow information to leak between users via, e.g., static fields or caches maintained by libraries, are prevented by the isolation.

The following figure shows two isolates in a process. Each isolate has its own copy of the image heap (efficiently managed using a copy-on-write mapping to reduce the memory footprint) and its own run-time heap where newly allocated objects are placed.

We provide two kinds of APIs to create and manage isolates: a Java API and a C API. The Java API is intended for Java-only applications (or applications written in Scala or Kotlin) that want to use multiple isolates. The C API is intended to integrate Java (or Scala, Kotlin) code into a pre-existing C application and manage isolates from C code. We first have a look at the Java API, and later on give a brief overview of the C API. Again, note that both the Java and the C API work only for native images, not on Java VMs such as the Java HotSpot VM.

When building an executable application with a Java main() method, a default isolate is created automatically before the main() method is invoked. When building a shared library that is integrated with C code, no isolate is created automatically, i.e., the initial isolate has to be created using the C API.

Java API for Isolates

The API introduces two opaque pointer types: Isolate and IsolateThread. Even though they look like Java interfaces, they are actually treated as machine-word-sized values, i.e., they are not Java objects. The details are out of scope for this article, just think of these values as void* pointers in C.

Isolate is the main descriptor for an isolate. If you have this value, you have full access to the isolate. Every thread that is attached to the isolate is represented by an IsolateThread. This is the type that you will use most frequently: when you want to invoke a method of an isolate, you need to provide an IsolateThread.

The class Isolates contains the API to manage the isolates. Here are some important methods for controlling the life cycle of an isolate:

IsolateThread createIsolate(CreateIsolateParameters params);
void tearDownIsolate(IsolateThread thread);

The method createIsolate() creates and initializes a new independent VM instance. The Java heap of the new isolate consists only of the image heap, i.e., objects of the calling isolate are not available in the new isolate. The current thread is attached to the new isolate, and the IsolateThread descriptor is returned to the calling isolate. We can now invoke methods in the new isolate; more on this later. The method tearDownIsolate() discards an isolate. Among other things, it frees all memory associated with the isolate by returning it to the operating system; no garbage collection is necessary for that.

There are also additional functions to attach additional threads, to detach threads, to check if a thread is already attached, and to convert between Isolate and IsolateThread. Look at the API specification for details.

It is now time to introduce our running example for this article. We are using the Netty web server to respond to web requests to plot functions. The function is provided by the user in the HTTP request. The HTTP response is a Scalable Vector Graphics (SVG) object. We use exp4j for expression evaluation and SVGGraphics2D for SVG file rendering. Both expression evaluation and rendering allocate temporary Java data structures, and we know all of these objects are unreachable after the request has been processed. But a traditional Java VM still needs an expensive garbage collection to eventually discard these temporary objects. The HTTP requests are also coming from different users. Without a complete code review of all libraries that we depend on, we do not know if there are any global data structures or caches that could allow one user to observe properties of other users' expressions. Isolates solve both problems. In the request handler, we create a new isolate for each request and tear down the isolate after expression evaluation:

Now we need to add the invocation of the actual rendering function. This is a little bit more complicated than a normal Java method invocation: We need to leave the Netty isolate (the default isolate that was created automatically at application startup) and enter the rendering isolate (the new isolate that we explicitly created). And since the heaps of the two isolates are completely separated, we cannot pass Java objects directly. The argument function string is an object in the Netty isolate, so the rendering isolate cannot access it. We first need to copy the string into the rendering isolate. The same holds for the return value: we cannot return a ByteBuffer Java object directly. We can only pass handles to Java objects. A "handle" is an opaque indirection to a Java object. The object that the handle refers to can only be accessed in the isolate in which the object and handle were created.
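To make the idea of a handle concrete, here is a minimal plain-Java model. It is not the GraalVM handle API: the class HandleTable and its methods are hypothetical names chosen for illustration, and each "isolate" is modeled as nothing more than its own table. The sketch only shows why a handle is meaningless outside the isolate that created it.

```java
import java.util.HashMap;
import java.util.Map;

/** Conceptual model only: a per-isolate table that maps opaque handles to objects. */
final class HandleTable {
    private final Map<Long, Object> objects = new HashMap<>();
    private long nextHandle = 1; // 0 is reserved as the "null handle" in this model

    /** Registers an object and returns an opaque handle that is valid only for this table. */
    long create(Object obj) {
        long handle = nextHandle++;
        objects.put(handle, obj);
        return handle;
    }

    /** Resolves a handle; fails if this table has no entry for it. */
    Object get(long handle) {
        Object obj = objects.get(handle);
        if (obj == null) {
            throw new IllegalArgumentException("unknown handle: " + handle);
        }
        return obj;
    }

    /** Releases the handle so that the object is no longer kept alive by the table. */
    void destroy(long handle) {
        objects.remove(handle);
    }

    public static void main(String[] args) {
        HandleTable nettyIsolate = new HandleTable();
        HandleTable renderingIsolate = new HandleTable();

        // "Copy" the string into the rendering isolate and keep only a handle to the copy.
        String function = "sin(x)";
        long functionHandle = renderingIsolate.create(new String(function));

        // The handle resolves in the isolate that created it ...
        System.out.println(renderingIsolate.get(functionHandle));

        // ... but it is just a number, so it means nothing to another isolate.
        try {
            nettyIsolate.get(functionHandle);
        } catch (IllegalArgumentException e) {
            System.out.println("handle is not valid in the Netty isolate");
        }
    }
}
```

Because a handle is only a number, nothing in its value says which isolate it belongs to. That bookkeeping is left to the programmer, which is why each handle in the example is explicitly assigned to one isolate.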

In summary, our rendering method is defined as

The annotation @CEntryPoint marks the method as an isolate-transition method. You could also invoke this method directly from C code (that is where the name of the annotation comes from). The parameter renderingContext is the isolate that is entered when the function is called. This is denoted by the parameter annotation @CEntryPoint.IsolateThreadContext. We also pass in the nettyContext as a parameter, since later on we will need to call back to the Netty isolate. But there is no special meaning attached to that parameter.

The functionHandle parameter and the return value have the type ObjectHandle. For each handle, we have to know which isolate it is valid in. We define the functionHandle to be a handle for the rendering isolate, and the return value to be a handle for the Netty isolate. This is a conscious decision that we made.

For the complete implementation of the example, please have a look at the commented source code in the GitHub repository.

Entering an isolate executes code that sets up internal registers and performs an internal thread state transition into the “active” mode. Leaving an isolate executes code that catches and reports all exceptions and performs an internal thread state transition into “inactive” mode. At the call site of plotAsSVG(), the following happens:

  • Before the call, the Netty isolate is active and the rendering isolate is inactive.
  • Leave the Netty isolate. Now both isolates are inactive.
  • Enter the rendering isolate.
  • plotAsSVG() is executed in the rendering isolate.
  • Leave the rendering isolate. Now both isolates are inactive.
  • Enter the Netty isolate.
  • The code after the call is executed in the Netty isolate.
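The sequence above can be modeled in a few lines of plain Java. This is only a bookkeeping sketch, not how Substrate VM implements transitions: the class IsolateTransitions and the isolate names are hypothetical, and "active" is tracked as a simple field per thread.

```java
/** Conceptual model: tracks which isolate, if any, is active on one thread. */
final class IsolateTransitions {
    private String active; // name of the isolate active on this thread, or null if none

    IsolateTransitions(String initiallyActive) {
        this.active = initiallyActive;
    }

    /** Entering is only legal while no isolate is active on this thread. */
    void enter(String isolate) {
        if (active != null) {
            throw new IllegalStateException("must leave " + active + " first");
        }
        active = isolate;
    }

    /** Leaving is only legal for the isolate that is currently active. */
    void leave(String isolate) {
        if (!isolate.equals(active)) {
            throw new IllegalStateException(isolate + " is not active");
        }
        active = null;
    }

    String active() {
        return active;
    }

    /** Leave the caller, enter the callee, run the body, then transition back. */
    void call(String caller, String callee, Runnable body) {
        leave(caller);
        enter(callee);
        try {
            body.run();
        } finally {
            leave(callee);
            enter(caller);
        }
    }

    public static void main(String[] args) {
        IsolateTransitions thread = new IsolateTransitions("netty");
        thread.call("netty", "rendering",
                () -> System.out.println("plotAsSVG() runs in: " + thread.active()));
        System.out.println("after the call: " + thread.active());
    }
}
```

Nested calls work the same way in this model: a body running in the rendering isolate can itself use call("rendering", "netty", ...) to model a callback into the Netty isolate.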

As you can see, in any given thread at most one isolate is active at any given point in time. The following figure visualizes the stacks. Assume that there are two rendering isolates active in two different request handler threads:

All threads start out in C code for the thread start routines (pthreads functions on Linux). The Netty isolate starts all the request handler threads, so frames that belong to the Netty isolate are on all request handler threads. The last frame is for the method plotAsSVGInIsolate(). Each transition (entering or leaving an isolate) results in a transition frame. The first frame after the transition frame in the rendering isolate is for the method plotAsSVG().

You can switch between isolates multiple times on a stack. In our example, the rendering isolate calls back to the Netty isolate (to allocate the result ByteBuffer instance; this code is omitted from this article). The following listing shows the GDB stack trace when stopped in the method createByteBuffer() that allocates the ByteBuffer instance for the plot result:

In the stack trace, you see multiple stack frames that enter an isolate (#1, #4, and #38) and that leave an isolate (#5 and #2). They are shown with synthetic function names.

Impact of Isolates on Memory Footprint

How do isolates impact the memory footprint of our application? To evaluate that, we send the request to plot a function repeatedly, and print the resident memory set size after each request. We use the Linux command pmap -x processid to query the memory size. The following figure shows the results with and without isolates for 50 requests:

The first data point is the memory footprint after the startup of Netty, before serving any request. Every request allocates about 1.8 MByte of Java objects. Without isolates, these objects are not freed immediately, but fill the young generation of the heap. When the young generation is full, the garbage collector runs and frees the objects. For this benchmark, we fixed the young generation size to 80 MByte. This limit is reached at request #39. After the garbage collection during this request, the young generation starts out empty again and is filled up. As an optimization, the memory is not returned to the operating system immediately, therefore the resident memory set size remains high but does not grow linearly as before. But approximately every 40 requests, a garbage collection is necessary.

With isolates, the temporary objects allocated during rendering are freed immediately when the isolate is deleted, without a garbage collection. Therefore, the resident memory set size remains low without any garbage collection overhead, and many more requests can be handled without a garbage collection.

Pre-Initialization of Objects using the Image Heap

Each isolate has its own copy of the image heap. Remember that the image heap is prepared at build time, i.e., during image generation. We can use this functionality to avoid the execution of initialization code at run time. In our example, the function rendering code uses an instance of the class SVGGraphics2D for rendering. In a traditional Java execution, this object cannot be a singleton: multiple requests can be handled at the same time in different threads, i.e., multiple instances are in use at the same time and all instances are in the same heap, and it is necessary to allocate a new instance at run time:

In our isolate-based model, each rendering is performed in a separate isolate. This means that only a single instance of SVGGraphics2D exists per isolate. We can therefore have a singleton instance, and we can allocate and initialize the singleton during image generation. The isolate starts up with the instance already being present on the image heap, and no allocation and initialization at run time is necessary. Note that in this case SVGGraphics2D is not a large data structure so the savings are modest, but there are many use cases where you can prepare much larger data structures during image generation.

A simple way to achieve the pre-allocation during image generation is to perform the allocation in the class initializer and use a static final field:
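A minimal sketch of this pattern follows. So that it runs without third-party libraries, it uses a hypothetical Canvas class in place of SVGGraphics2D; the class names and the 800x600 size are assumptions for illustration only.

```java
/** Holds the singleton that is allocated once, in the class initializer. */
final class PlotterSingleton {
    // On a regular JVM this runs on first use of the class. In a native image,
    // whether the initializer runs at build time (so that the object lands in the
    // image heap) depends on the GraalVM version and its class-initialization settings.
    static final Canvas CANVAS = new Canvas(800, 600);

    private PlotterSingleton() {
    }

    public static void main(String[] args) {
        Canvas first = PlotterSingleton.CANVAS;
        Canvas second = PlotterSingleton.CANVAS;
        // Both reads see the same pre-allocated instance; nothing is allocated per request.
        System.out.println("same instance: " + (first == second));
        System.out.println("size: " + first.width + "x" + first.height);
    }
}

/** Hypothetical stand-in for a renderer object such as SVGGraphics2D. */
final class Canvas {
    final int width;
    final int height;

    Canvas(int width, int height) {
        this.width = width;
        this.height = height;
    }
}
```

Sharing one mutable renderer like this is only safe because each isolate handles a single request. In a shared heap with concurrent requests it would be a data race.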

But in order to make the pre-initialization more explicit, we also offer a mechanism called “image singletons” where you can register singleton objects during image generation, and access them at run time. The initialization happens in a so-called “feature” that runs during image generation:

The fully qualified class name of the feature implementation class must be provided to the native-image tool using the option --features=com.oracle.svm.nettyplot.PlotterSingletonFeature.

Running the Example

The example is based on the Netty example that we introduced in an earlier article. The complete source code of this example is available in the GraalVM-demos repository on GitHub in the folder native-netty-plot. To run it, you need GraalVM 1.0 RC9 or a later version. GraalVM comes in two variants: the open-source Community Edition and the commercial Enterprise Edition (which you can download for free for evaluation purposes). Isolates are available in both editions, but the compressed references introduced later in this article are only available in the Enterprise Edition. Therefore, we recommend using the Enterprise Edition to run this example.

Since the earlier article on Netty, we were able to improve the developer experience and greatly simplify the build instructions. We assume that you installed GraalVM 1.0 RC9 in your home directory and cloned the example repository. In the native-netty-plot directory of the example (it contains the pom.xml file), you can now build the example using

$ mvn package

This builds a single .jar file that contains the example and all its dependencies in target/netty-plot-0.1-jar-with-dependencies.jar. Then you can build the native image for this application:

$ ~/graalvm-ee-1.0.0-rc9/bin/native-image -jar target/netty-plot-0.1-jar-with-dependencies.jar

Several options for the native-image tool are automatically taken from a native-image.properties file that is packaged in the .jar file in a subdirectory of META-INF/native-image/.

Now you can start the web server:

$ ./netty-plot

Finally, we can open our browser and request the rendering of a function by browsing to http://127.0.0.1:8080/?function=abs((x-31.4)sin(x-pi/2))&xmin=0&xmax=31.4:

C API for Isolates

The same functionality of the Java class Isolates is also available as a C API. This allows you to embed Java code in existing C applications. In this scenario, the C code manages the life cycle of isolates: The C code creates an isolate, invokes a Java method in the isolate (a Java method annotated with @CEntryPoint), and in the end tears down the isolate. The C API is exposed automatically when you build a shared library, i.e., when you execute native-image with the option --shared. In addition to the shared library for your Java code, the native image tool generates a C header file with type definitions and function prototypes: the type graal_isolate_t for isolate descriptors and the type graal_isolatethread_t for thread descriptors; and functions such as graal_create_isolate to create an isolate and graal_tear_down_isolate to destroy an isolate.

The GraalVM repository on GitHub contains an extensive example on how to use this API and the C interface for native images in general. You can access the C code for the example here, and the Java code for the example here.

Isolate Implementation Details

We implemented isolates with two performance goals in mind:

  1. Creating new isolates must be fast and have a low memory overhead, so that many isolates can be created for short-running tasks.
  2. Low or no impact on peak performance, i.e., on the speed of code running inside an isolate.

To achieve these goals, we changed how references to Java objects are handled: instead of using the absolute memory address of an object, we now use references that are relative to the start of the image heap. This means that memory accesses, like loading a field from an object or an array element, need an indirection: the heap start needs to be added to the reference before the memory access. To make this as fast as possible, the heap start is always available in a fixed register (we use the register r14 on x64 architectures). Note that in many cases, the addition can be folded into the memory access instruction on the x86 architecture, avoiding an explicit arithmetic operation.

With all object references being relative to the start of the image heap, the image heap that is prepared during image generation and part of the native executable can be memory-mapped multiple times in the address space. This allows replicating the image heap without copying (fast isolate creation) and copy-on-write sharing of the image heap (low memory overhead).

The following steps are performed to create a new isolate:

  1. Reserve a contiguous memory range from the operating system that is large enough to hold the image heap.
  2. Locate the image heap in our executable file on disk by examining the memory mappings created by the operating system’s loader. The result of this step can be cached so that this step can be skipped when creating further isolates.
  3. Create a read-only mapping of the image heap from disk to the beginning of the memory range that we reserved earlier.
  4. Mark the partition of the mapping that contains the writable objects as copy-on-write.
  5. Set the designated heap-base register so that it contains the start address of the reserved memory range, which now has the mapping of the image heap at its beginning.
  6. Attach the current thread to the isolate, creating a thread-specific execution context and adding the thread to the per-isolate list of attached threads.

Note that references between Java objects in the image heap do not need any relocation when the isolate is created, because these references, like any other references between Java objects, are relative to the heap start.
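The effect of heap-relative references can be shown with a small plain-Java model. It makes several simplifying assumptions: memory is a long array indexed by word rather than by byte, the image heap is copied rather than memory-mapped, and the class RelativeReferences and its four-word heap layout are made up for illustration.

```java
/** Conceptual model: heap-relative references stay valid wherever the image heap is placed. */
final class RelativeReferences {
    // A tiny "image heap" of 4 words. Word 0 holds a reference to the object at
    // word 2, stored as an offset from the heap start rather than an absolute address.
    static final long[] IMAGE_HEAP = {2, 0, 42, 0};

    /** Places a copy of the image heap at the given base index of the address space. */
    static void map(long[] addressSpace, int heapBase) {
        System.arraycopy(IMAGE_HEAP, 0, addressSpace, heapBase, IMAGE_HEAP.length);
    }

    /** Loads a word: absolute address = heap base + heap-relative reference. */
    static long load(long[] addressSpace, int heapBase, long reference) {
        return addressSpace[(int) (heapBase + reference)];
    }

    public static void main(String[] args) {
        long[] addressSpace = new long[64];
        map(addressSpace, 8);  // image heap of a first isolate
        map(addressSpace, 32); // image heap of a second isolate at a different base

        for (int heapBase : new int[] {8, 32}) {
            long reference = load(addressSpace, heapBase, 0);     // read the field at word 0
            long value = load(addressSpace, heapBase, reference); // follow the reference
            System.out.println("heap base " + heapBase + " -> " + value);
        }
    }
}
```

The stored reference (2) is identical in both copies. Only the heap base differs, and it plays the role of the value kept in the heap-base register.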

An isolate can start new threads, or existing threads can be attached to an isolate. There is an n:m relationship between threads and isolates: one thread can be attached to multiple isolates, and one isolate can have multiple threads attached.

In order to tear down the isolate later, all the isolate’s remaining threads are interrupted, which triggers an exception in each thread that can be handled and passed on for a clean shutdown. Once all threads have ended, the isolate is disposed by returning its entire memory range to the operating system.

Compressed References

Memory footprint is important: it directly affects the price of running your software in the cloud. The copy-on-write sharing of the image heap for isolates can already give you some memory footprint improvements. But especially in reference-heavy managed languages like Java, the full 64 bits for each reference add up to a significant overhead. As long as you run with a modest heap size, 32 bits are sufficient for references because they allow a maximum addressable heap size of 32 GByte (2³⁵ Byte). The additional 3 bits of addressing range over a plain 32-bit reference are due to the usual 8-byte object alignment, i.e., it is not necessary to store the lowest-order 3 bits of a reference.
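The arithmetic behind the 32 GByte limit can be sketched as follows. This is an illustration of the encoding described above, assuming a shift of 3 bits for 8-byte alignment; the class and method names are hypothetical and do not correspond to Substrate VM internals.

```java
/** Conceptual sketch of 32-bit compressed references with 8-byte object alignment. */
final class CompressedReferences {
    static final int ALIGNMENT_SHIFT = 3; // 8-byte alignment: the low 3 bits are always zero

    /** Largest heap addressable with a 32-bit reference plus the 3 implied bits. */
    static long maxHeapBytes() {
        return 1L << (32 + ALIGNMENT_SHIFT);
    }

    /** Compresses a heap-relative byte offset into a 32-bit value. */
    static int compress(long offset) {
        if (offset < 0 || offset >= maxHeapBytes() || (offset & 7) != 0) {
            throw new IllegalArgumentException("offset not encodable: " + offset);
        }
        return (int) (offset >>> ALIGNMENT_SHIFT);
    }

    /** Recovers the absolute address from the heap base and a compressed reference. */
    static long uncompress(long heapBase, int compressed) {
        return heapBase + (Integer.toUnsignedLong(compressed) << ALIGNMENT_SHIFT);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + (maxHeapBytes() >> 30) + " GByte");

        long heapBase = 0x7f0000000000L;     // hypothetical heap start address
        long offset = maxHeapBytes() - 8;    // last 8-byte-aligned slot in the heap
        int compressed = compress(offset);
        System.out.println("round trip ok: " + (uncompress(heapBase, compressed) == heapBase + offset));
    }
}
```

The shift and the addition of the heap base in uncompress() are the two operations that can often be folded into a single x86 memory operand of the form base + index * 8.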

Many reference compression and uncompression operations can be folded into the x86 memory addressing modes. The Java HotSpot VM has supported compressed references for a long time, so the techniques are well understood. But the Java HotSpot VM does not support an initial image heap, and can therefore use a constant as the base of the compressed references. Only the introduction of isolates allows us to support compressed references in native images: because all memory accesses are now already relative to the beginning of the image heap (which we store in the register r14), it is straightforward to use only 32-bit references relative to the beginning of the heap instead of the full 64-bit references.

The implementation of compressed references requires only a small modification to step 1 listed above for creating a new isolate: instead of reserving a memory range from the operating system that is large enough to hold just the image heap, we now reserve an address range that covers the maximum heap size. By default, we reserve a full 32 GByte of address space. This memory is neither required nor expected to be backed by physical memory. Only the parts of the address space that are actually occupied by allocated objects are committed memory.

Memory for Java objects must be allocated in the reserved contiguous memory range. This requires us to manage that range on our own, and request from the operating system that subranges be backed by physical memory, as well as returning unused subranges backed by physical memory to the operating system.

How much memory do we save with compressed references in our example? The request handler prints the size of the Java heap for the rendering isolate after isolate creation and before tear down. Here is the output with compressed references:

Rendering isolate initial memory usage: 4114 KByte
Rendering isolate final memory usage: 5910 KByte

The initial memory size is the size of the image heap. But remember that this memory is copy-on-write, so only a part of it needs to be committed by the OS. The difference of about 1796 KByte between the initial and the final memory usage is the amount allocated during rendering.

Here are the numbers without compressed references:

Rendering isolate initial memory usage: 4895 KByte
Rendering isolate final memory usage: 6936 KByte

The size of the image heap is larger, and the amount of memory allocated during rendering increases to 2041 KByte.
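As a worked check of these figures, the following sketch recomputes the allocation per request and the relative savings from the four printed values. The printed values are rounded to whole KByte, so the derived differences may be off by one. The class and method names are hypothetical.

```java
import java.util.Locale;

/** Worked arithmetic on the heap sizes printed above (all values in KByte). */
final class FootprintComparison {
    /** Memory allocated during rendering: final usage minus initial usage. */
    static long allocated(long initialKByte, long finalKByte) {
        return finalKByte - initialKByte;
    }

    /** Relative saving, in percent, of the compressed value over the uncompressed one. */
    static double savingPercent(long uncompressed, long compressed) {
        return 100.0 * (uncompressed - compressed) / uncompressed;
    }

    public static void main(String[] args) {
        long compressedAlloc = allocated(4114, 5910);
        long uncompressedAlloc = allocated(4895, 6936);

        System.out.println("Allocated with compressed references:    " + compressedAlloc + " KByte");
        System.out.println("Allocated without compressed references: " + uncompressedAlloc + " KByte");
        System.out.println(String.format(Locale.ROOT, "Image heap saving: %.1f%%",
                savingPercent(4895, 4114)));
        System.out.println(String.format(Locale.ROOT, "Allocation saving: %.1f%%",
                savingPercent(uncompressedAlloc, compressedAlloc)));
    }
}
```

For this example, compressed references shrink the image heap by roughly 16% and the memory allocated per request by roughly 12%.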

Caveats

Isolates and compressed references are a new feature in Substrate VM. We hope you give them a try and tell us about your experience. Compressed references are enabled automatically, but you can disable them using -H:-UseCompressedReferences.

Isolates are supported on both Linux and macOS, but our macOS implementation is less optimized. Creating a new isolate always copies the image heap on macOS, as opposed to mapping it from the image file as is done on Linux, and therefore takes a bit longer. We are working on full support for macOS too, so stay tuned!

Summary

In this article, we looked at two advanced features of GraalVM native images: more flexible memory management with isolates and a reduced memory footprint with compressed references. Both are available in the recent release of GraalVM, so if you want to experiment with them, grab the binaries from the website and give it a go. Note that these are advanced features, so if you have not tried building native images before, their true benefits might be less obvious. But if you want to isolate parts of the memory of your application, implement a computation that produces a lot of objects which you can release as a whole, or split your application memory into individual chunks based on some external limits, isolates can be there to save the day. Compressed references give you a smaller memory footprint at no cost, which is particularly exciting for platforms where memory footprint is crucial.