The many ways of polyglot programming with GraalVM

Published in graalvm · 15 min read · Nov 5, 2020

Introduction

My name is Michael Simons and I work as a software engineer for Neo4j. Neo4j is the creator of the graph database of the same name. The main focus of my work is our object mapping framework (Neo4j-OGM) and Spring Data Neo4j. You can find my personal blog at info.michael-simons.eu and me on Twitter as @rotnroll666.

Thanks to Oleg for the invitation to write on the GraalVM blog, and to Neo4j and Michael Hunger for supporting this work.

For me, the best resources for learning about GraalVM and its ecosystem are the reference manual and the community support on Slack. The manual is outstanding and pretty exhaustive.

First contact

My first contact with GraalVM dates back to 2017. I met Jaroslav at JCrete 2017 and he was super enthusiastic about Graal. I would be lying if I said I understood everything that Jaroslav presented back then, but it already looked cool.

Fast forward a couple of years to 2019: Suddenly native compilation of Java programs becomes a big thing. New frameworks like Quarkus and Micronaut are created to take advantage of GraalVM native-image and the SubstrateVM. native-image is the part of GraalVM that is responsible for building native executables out of Java programs. It uses the Graal compiler in ahead-of-time (AOT) mode for compilation. SubstrateVM is a stripped-down runtime. Those programs start blazingly fast and often have a reduced memory footprint.

I was in the right place at the right time and met Sanne, who explained the concepts of Quarkus to me. Sanne also asked me about Neo4j and our database driver, and whether I saw a way to make the driver compatible with GraalVM native image and even contribute to Quarkus.

Armed with the information from Sanne I was able to make that happen. The Neo4j Java Driver is now compatible with GraalVM native image, including SSL support. I was also able to provide native support for Spring Data Neo4j.

I tried to summarize all the work with native-image in this post about the tooling available to create native GraalVM images.

We will hear about native-image later in this post.

Examples

All examples used in the post are available in full on my GitHub profile:

They have been tested to not leak any resources like dangling transactions or connections, but they are of course only proofs of concept.

Truffle Language Implementation Framework

The Truffle framework (or just Truffle) is the underlying framework providing the polyglot programming experience on GraalVM. If you just want to do polyglot programming on GraalVM, you will hardly notice it. You can run embedded languages on GraalVM or use the Graal launchers for a specific supported language with the polyglot option set for that. You will probably use the Polyglot API living under org.graalvm.polyglot or the corresponding namespace in your language extension and that’s about it.

If you want to just dig a little deeper, it is enough to understand that Truffle is a pure Java library that allows language interpreters to use the GraalVM compiler as a just-in-time compiler for the target language. This is done through annotated methods and of course the Truffle Java API. By having access to Truffle, a Ruby application, for example, can run on the same JVM as a Java application. Also, a host JVM-based language and a guest language can directly interoperate with each other and pass data back and forth in the same memory space. If you want to implement your own languages on top of GraalVM, you should consider reading more about Truffle here.

In order to provide foreign polyglot values in the languages implemented within Truffle, the so-called polyglot interoperability protocol has been developed. This interoperability protocol consists of a set of standardized messages that every language implements and uses for foreign polyglot values. The protocol allows GraalVM to support interoperability between any combination of languages without requiring them to know of each other.

I will use the term “host language” for the language from which a polyglot context is initialized and “target language” for the language being called from the host. The target language itself can also call other supported languages, thus the whole system becomes poly-polyglot.

Running polyglot applications

You need to have GraalVM installed. Use the provided downloads for your system or SDKMan! if available.

Additional languages need to be installed with gu, the GraalVM Component Updater.

My system looks like this:

➜  echo $GRAALVM_HOME
/Library/Java/JavaVirtualMachines/graalvm-ce-java11-20.1.0/Contents/Home
➜  echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/graalvm-ce-java11-20.1.0/Contents/Home
➜  java --version
openjdk 11.0.7 2020-04-14
OpenJDK Runtime Environment GraalVM CE 20.1.0 (build 11.0.7+10-jvmci-20.1-b02)
OpenJDK 64-Bit Server VM GraalVM CE 20.1.0 (build 11.0.7+10-jvmci-20.1-b02, mixed mode, sharing)
➜  $GRAALVM_HOME/bin/gu list
ComponentId     Version    Component name      Origin
---------------------------------------------------------
graalvm         20.1.0     GraalVM Core
R               20.1.0     FastR               github.com
llvm-toolchain  20.1.0     LLVM.org toolchain  github.com
native-image    20.1.0     Native Image        github.com
python          20.1.0     Graal.Python        github.com
ruby            20.1.0     TruffleRuby         github.com

gu must be in the path. Additional languages like JavaScript or Ruby as well as the native-image tool can be installed as follows: gu install ruby or gu install native-image.

Thanks to single-file source-code programs possible with Java 11, the following Java program:

import org.graalvm.polyglot.*;

class Polyglot {
    public static void main(String[] args) {
        Context polyglot = Context.create();
        Value result = polyglot.eval("js", 
            "[10,10,20,2].reduce((a,v) => a +v)");
        System.out.println(result.asInt());
    }
}

can be run on GraalVM with java Polyglot.java and happily prints 42, executing the embedded JavaScript.

There are native launchers for other host languages. They need to be run with the polyglot options like js --polyglot --jvm in the case of JS when you want to call other target languages than the host language.

Scenarios

The need for doing polyglot programming often boils down to the fact that something is missing in the language you actually want to solve your problem in. For example, a language can be especially good at doing analytics or have superb libraries for doing so, like the R language, but is missing a library to connect to your favorite Graph database.

Sometimes a dynamic script language makes developing much easier than a compiled language like Java. That’s often the case for scripted stored procedures in a database. Oracle Database has PL/SQL for example which makes dropping in a new function to the database really easy. I would love to have something like it in the form of Ruby or JavaScript inside Neo4j.

Depending on the use case above the question is as follows: Do I bring a library from language X into language Y or do I bring another language into my host runtime?

Bringing a Java library to a supported target language

That’s one of the first things I was confronted with when working for Neo4j with GraalVM apart from native image: “So we don’t have an R driver yet, can I connect to Neo4j from R nevertheless?” Yes, you can.

It’s possible to use Neo4j Java driver from Ruby, R, or Python to connect to the Neo4j database

In the supported languages, the GraalVM polyglot context provides a “Java” construct that gives access to Java classes. Instances of Java objects are then used with the syntax of the language the script is written in. Truffle takes care of converting values correctly.

In the case of R it looks like this (example taken from here). The script should use the Neo4j Java Driver to connect to a database running on the local host and execute a Graph query.

graphDatabase <- java.type('org.neo4j.driver.GraphDatabase')
authTokens <- java.type('org.neo4j.driver.AuthTokens')
config <- java.type('org.neo4j.driver.Config')

# This is a call to the static factory method named `driver`
driver <- graphDatabase$driver(
    'bolt://localhost:7687',
    authTokens$basic('neo4j', 'secret'),
    config$builder()
        $withMaxConnectionPoolSize(1)
        $build()
)

findConnections <- function (driver) {
    query <- '
        MATCH (:Person {name:$name})
          -[:ACTED_IN]->(m)<-[:ACTED_IN]-(coAct)
        RETURN DISTINCT coAct
    '

    session <- driver$session()
    # The R list (which behaves like an associative array) is
    # automatically converted to a Java Map
    records <- session$run(query, list(name="Tom Hanks"))$list()

    coActors <- list()
    i <- 1
    for (record in records) {
        coActors[[i]] <- record$get('coAct')$get('name')$asString()
        i <- i + 1
    }

    session$close()
    return(coActors)
}

connections <- findConnections(driver)

for (connection in connections) {
    print(connection)
}

driver$close()

The JavaScript, Ruby and Python examples look very similar. Apart from having to deal with how to initialize the driver, one can stay in the ecosystem one is already used to.

Hosting another language in Java

Neo4j can be extended with custom stored procedures. They must be written in Java, and Neo4j must be restarted to install or upgrade them. Wouldn’t it be nice to be able to use scripts instead?

Running custom stored procedures written in JavaScript in the Neo4j database running on GraalVM

With the GraalVM SDK in place (org.graalvm.sdk:graal-sdk) it is actually super easy to do this and the Neo4j stored procedure doesn’t look that much different than Polyglot.java above:

import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.procedure.Context;
import org.neo4j.procedure.Description;
import org.neo4j.procedure.Name;
import org.neo4j.procedure.Procedure;

public class ExecuteJavaScript {

    @Context
    public GraphDatabaseService db;

    @Procedure(value = "scripts.execute")
    @Description("Executes the script at the given URL.")
    public void execute(
        @Name("scriptUrl") String scriptUrl
    ) throws IOException {

        // Read the script source from the given file URL
        var script = Files.readString(Path.of(URI.create(scriptUrl)));
        try (var context = org.graalvm.polyglot.Context.newBuilder()
            .allowAllAccess(true).build()
        ) {
            // Expose the database service to the target language
            var bindings = context.getPolyglotBindings();
            bindings.putMember("db", db);

            context.eval("js", script);
        }
    }
}

You see a bit of Neo4j code here. Especially relevant for our purpose is the injected GraphDatabaseService db. This service provides access to the Neo4j API: running Cypher or finding and traversing nodes. That service is put into the polyglot bindings with bindings.putMember("db", db) so that it can be accessed from target languages. Again, Truffle takes care of converting this complex thing in such a way that it can be accessed from the target language.

An example script that is called through that procedure might look like this:

const collectors = Java.type('java.util.stream.Collectors')

function findConnections(to) {
    const query = `
        MATCH (:Person {name:$name})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActor)
        RETURN DISTINCT coActor`

    const db = Polyglot.import("db")
    const tx = db.beginTx()

    const names = tx.execute(query, {name: to})
        .stream()
        .map(r => r.get('coActor').getProperty('name'))
        .collect(collectors.toList())
    tx.close()

    return names
}

names = findConnections('Tom Hanks')
names.forEach(name => console.log(name))

const db = Polyglot.import("db") shows how to access the service we have put into the bindings earlier.

How to call this in Neo4j? With a Cypher statement: CALL scripts.execute('file:///path/to/script.js') (Cypher is our query language).
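The procedure turns the scriptUrl argument into a path with Path.of(URI.create(scriptUrl)), so as far as I can tell only file: URLs will work; other schemes have no file system provider installed by default. Here is a minimal sketch in plain Java that mirrors just this lookup. The class ScriptUrlSketch, the method readScript and the temporary script file are made up for illustration and are not part of the repository.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ScriptUrlSketch {

    // Mirrors the lookup in the procedure: URL string -> URI -> Path -> content
    static String readScript(String scriptUrl) throws IOException {
        return Files.readString(Path.of(URI.create(scriptUrl)));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical script file, created only for this demonstration
        Path script = Files.createTempFile("hello", ".js");
        Files.writeString(script, "console.log('Hello from a script')");

        // toUri() yields a file: URL of the form the procedure accepts
        String scriptUrl = script.toUri().toString();
        System.out.println(scriptUrl);
        System.out.println(readScript(scriptUrl));

        // A scheme without an installed file system provider is rejected
        // before any content is read
        try {
            readScript("https://example.com/script.js");
        } catch (FileSystemNotFoundException e) {
            System.out.println("Not a usable script URL: " + e.getMessage());
        }

        Files.delete(script);
    }
}
```

If scripts should be loaded from somewhere other than the local file system, the reading part of the procedure would have to be replaced with something else.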

This repository contains the full code.

Bringing Java over to C

The full example of this section can be found here: neo4j-java-driver-native-lib.

While it is absolutely possible to use GraalVM polyglot from C directly, you might want to create a shared library and thus combine the polyglot approach with native-image. Check out the GraalVM reference manual about the polyglot API for C.

We have heard a lot about native-image and most of the time, it is about creating actual executables. But with the command line switch --shared the tool is able to create shared C libraries which opens up a whole new world of polyglot interaction.

You can take anything that runs on GraalVM native image — either Java or one of the supported target languages — and build a library that can be used in C and C# programs or anything that allows foreign-function-interfaces (FFI).

How does that work? Again, I’m working on this from a Neo4j perspective. Say you want to be able to call a function that takes authentication details to a Neo4j server and a query and just print the result.

It would look like this:

import org.graalvm.nativeimage.IsolateThread;
import org.graalvm.nativeimage.c.CContext;
import org.graalvm.nativeimage.c.function.CEntryPoint;
import org.graalvm.nativeimage.c.type.CCharPointer;

public final class DriverNativeLib {

    @CEntryPoint(name = "execute_query_and_print_results")
    public static long executeQueryAndPrintResults(
        IsolateThread isolate,
        CCharPointer uri, CCharPointer password,
        CCharPointer query
    ) {
        // Some interaction with Neo4j
        return 4711L;
    }
}

The imports come from org.graalvm.sdk:graal-sdk again. The @CEntryPoint annotation defines an entry point into the shared library that will be generated by native-image.

The build process generates the following files:

driver-native-lib-0.0.1-SNAPSHOT.jar
generated-sources
graal_isolate.h
graal_isolate_dynamic.h
libneo4j.dylib
libneo4j.h
libneo4j_dynamic.h

Calling this from a C program is pretty simple:

// Tested with
//   Apple clang version 12.0.0 (clang-1200.0.32.2)
//   Target: x86_64-apple-darwin19.6.0
// Compile from the project root with
//   gcc -Wall -Ltarget -Itarget target/libneo4j.dylib src/main/c/executeQueryAndPrintResults.c -o target/executeQueryAndPrintResults
// And run as
//   target/executeQueryAndPrintResults

#include <stdio.h>
#include <stdlib.h>
#include "node.h"
#include "libneo4j.h"

int main(void) {

    graal_isolate_t *isolate = NULL;
    graal_isolatethread_t *thread = NULL;

    // Passing NULL as parameters creates the isolate with default settings
    int ret = graal_create_isolate(NULL, &isolate, &thread);
    if (ret != 0) {
        fprintf(stderr, "graal_create_isolate: %d\n", ret);
        exit(1);
    }

    int count = execute_query_and_print_results(thread, "bolt://localhost:7687", "secret", "MATCH (m:Movie) RETURN m");
    fprintf(stdout, "Number of movies printed: %d\n", count);

    if (graal_detach_thread(thread) != 0) {
        fprintf(stderr, "graal_detach_thread error\n");
        return 1;
    }
    return 0;
}

And just like that, I am able to call into Java from a C program via a shared C library. The documentation about the C-API of GraalVM can be found under Native Image C API. It is a bit sparse compared to the rest, but it does the job.

The source code repository linked at the beginning of this section also contains code to use the generated library from standard Ruby. Why is this useful when there is already a GraalVM Ruby? Because of the freedom of choice. Some people might want to stick to standard Ruby for various reasons and cannot use the “standard” polyglot means of GraalVM here.

Thanks to some great support from Aleksandar of the GraalVM team, I was able to dust off my C knowledge from university. Printing results from the JVM doesn’t make too much sense. I would like to return data from it to the C world.

For this, I define a C struct like this:

typedef struct c_node_struct {
    long id;
    char *label;
    char *name;
} c_node;

This is of course far from complete, but does the job. We need to get this into the JVM of course. This is done again via the GraalVM-SDK and one additional library, the SubstrateVM parts of Graal (living under org.graalvm.nativeimage:svm):

package org.neo4j.examples.drivernative;

import java.util.Collections;
import java.util.List;

import org.graalvm.nativeimage.c.CContext;
import org.graalvm.nativeimage.c.struct.CField;
import org.graalvm.nativeimage.c.struct.CPointerTo;
import org.graalvm.nativeimage.c.struct.CStruct;
import org.graalvm.nativeimage.c.type.CCharPointer;
import org.graalvm.word.PointerBase;

import com.oracle.svm.core.c.ProjectHeaderFile;

@CContext(DriverNativeLib.CInterfaceTutorialDirectives.class)
public final class DriverNativeLib {

    static class CInterfaceTutorialDirectives implements CContext.Directives {

        @Override
        public List<String> getHeaderFiles() {
            return Collections.singletonList(
                ProjectHeaderFile.resolve(
                    "org.neo4j.examples.drivernative", "node.h")
            );
        }
    }

    @CStruct("c_node")
    interface CNodePointer extends PointerBase {

        @CField("id")
        void setId(long id);

        @CField("label")
        CCharPointer getLabel();

        @CField("label")
        void setLabel(CCharPointer label);

        @CField("name")
        CCharPointer getName();

        @CField("name")
        void setName(CCharPointer name);

        // Treats the receiver as the start of an array of structs
        CNodePointer addressOf(int index);
    }

    @CPointerTo(CNodePointer.class)
    interface CNodePointerPointer extends PointerBase {
        void write(CNodePointer value);
    }

    private DriverNativeLib() {
    }
}

We see a class that defines a context for the shared image (via @CContext). It uses directives to import the header file for the struct. Based on this and the marker interface PointerBase, we can define the Java counterpart to our struct. We don’t have to implement these interfaces ourselves. Unsurprisingly, we find pointers and pointers to pointers, the latter for dealing with arrays the C way.
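To get a feeling for what an accessor like addressOf(int index) stands for, here is a minimal sketch in plain Java. It is not how SubstrateVM implements it, only the pointer arithmetic behind the idea. It assumes a 64-bit LP64 platform (Linux, macOS), where c_node would occupy 24 bytes: an 8-byte long followed by two 8-byte pointers. The class CNodeLayoutSketch and everything in it is made up for illustration, and a direct ByteBuffer stands in for the unmanaged memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class CNodeLayoutSketch {

    // Assumed layout of c_node on an LP64 platform (an assumption, not
    // something read from the generated headers)
    static final int ID_OFFSET = 0;     // long id
    static final int LABEL_OFFSET = 8;  // char *label
    static final int NAME_OFFSET = 16;  // char *name
    static final int STRUCT_SIZE = 24;  // sizeof(c_node)

    // The idea behind addressOf(index): base + index * sizeof(struct)
    static int offsetOf(int index) {
        return index * STRUCT_SIZE;
    }

    static void setId(ByteBuffer memory, int index, long id) {
        memory.putLong(offsetOf(index) + ID_OFFSET, id);
    }

    static long getId(ByteBuffer memory, int index) {
        return memory.getLong(offsetOf(index) + ID_OFFSET);
    }

    public static void main(String[] args) {
        // Stand-in for calloc(3 * sizeof(c_node)); direct buffers start zeroed
        ByteBuffer memory = ByteBuffer.allocateDirect(3 * STRUCT_SIZE)
            .order(ByteOrder.nativeOrder());

        setId(memory, 0, 4711L);
        setId(memory, 2, 42L);

        for (int i = 0; i < 3; i++) {
            System.out.println("node " + i + " at offset " + offsetOf(i)
                + " has id " + getId(memory, i));
        }
    }
}
```

With the real @CStruct interface none of these offsets are written by hand; they are derived from the header file during the image build, which is the reason the header has to be declared in the directives.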

How to use this? Well, it definitely feels like polyglot programming. Polyglot in the sense that we need to do things in Java I thought I left in the C-world:

public final class DriverNativeLib {

    @CEntryPoint(name = "execute_query_and_get_nodes")
    protected static int executeQueryAndGetNodes(
        IsolateThread thread, CCharPointer uri,
        CCharPointer password, CCharPointer query,
        CNodePointerPointer out
    ) {

        // Some magic to connect to Neo4j
        // more magic to retrieve nodes

        List<Node> nodes = Collections.emptyList();

        // Allocate one c_node per result, outside the Java heap
        CNodePointer returnedNodes = UnmanagedMemory.calloc(
            nodes.size() * SizeOf.get(CNodePointer.class));

        int cnt = 0;
        for (Node node : nodes) {

            CNodePointer cNode = returnedNodes.addressOf(cnt++);
            cNode.setId(node.id());

            // Even more magic to retrieve those things from the node
            String firstLabel = "getLabel";
            String nameAttribute = "getName";

            cNode.setLabel(toCCharPointer(firstLabel));
            cNode.setName(toCCharPointer(nameAttribute));
        }
        out.write(returnedNodes);
        return cnt;
    }

    private static CCharPointer toCCharPointer(String string) {
        byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
        CCharPointer charPointer = UnmanagedMemory.calloc(
            (bytes.length + 1) * SizeOf.get(CCharPointer.class));

        for (int i = 0; i < bytes.length; ++i) {
            charPointer.write(i, bytes[i]);
        }
        // Terminate the string the C way
        charPointer.write(bytes.length, (byte) 0);
        return charPointer;
    }

    @CEntryPoint(name = "free_results")
    protected static void freeResults(
        IsolateThread thread, CNodePointer results,
        int numResults
    ) {
        for (int i = 0; i < numResults; ++i) {
            UnmanagedMemory.free(results.addressOf(i).getLabel());
            UnmanagedMemory.free(results.addressOf(i).getName());
        }
        UnmanagedMemory.free(results);
    }
}

After we establish a connection to Neo4j (which is cool but irrelevant to this example; check out the repo, though), we must allocate memory: UnmanagedMemory.calloc(nodes.size() * SizeOf.get(CNodePointer.class)). Unmanaged memory, just like this, and pretty much the same way you would do it in C for an array of a given type. The same applies to the strings we are going to return (toCCharPointer creates sparkling integers; kudos to my ex-colleague Chris Vest for this beautiful nickname for pointers).

Of course all that memory must be freed afterwards and that’s where freeResults comes in.
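The string handling in toCCharPointer can be tried out without native-image. The following is a minimal sketch in plain Java that uses a direct ByteBuffer as a stand-in for unmanaged memory; the class CStringSketch and its two methods are made up for illustration. It shows the two things that matter on the C side: the string is encoded as UTF-8 bytes, and one extra byte is reserved for the terminating 0.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class CStringSketch {

    // Encodes the value as UTF-8 and appends the terminating 0 byte,
    // the same steps toCCharPointer performs on unmanaged memory
    static ByteBuffer toCString(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length + 1);
        buffer.put(bytes);
        buffer.put((byte) 0);
        buffer.flip();
        return buffer;
    }

    // Reads bytes up to (not including) the first 0 byte, like C would
    static String fromCString(ByteBuffer buffer) {
        int length = 0;
        while (buffer.get(length) != 0) {
            length++;
        }
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = buffer.get(i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer name = toCString("Tom Hanks");
        System.out.println("Bytes including terminator: " + name.remaining());
        System.out.println("Read back: " + fromCString(name));
    }
}
```

The big difference to the real thing is ownership: the buffer in this sketch is cleaned up by the JVM, while every pointer handed out by UnmanagedMemory.calloc has to be released explicitly.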

Calling this — the ceremony for creating the thread isolation omitted — looks like this:

// Prepare an output pointer
c_node *nodes;
int numResults = execute_query_and_get_nodes(
    thread, "bolt://localhost:7687", "secret",
    "MATCH (tom:Person {name: \"Tom Hanks\"})-[:ACTED_IN]->(tomHanksMovies) RETURN tom, tomHanksMovies",
    &nodes);

int i;
for (i = 0; i < numResults; i++) {
    fprintf(stdout, "(%ld:%s name:%s) \n", nodes[i].id, nodes[i].label, nodes[i].name);
}

free_results(thread, nodes, numResults);

Quintessence

I presented three ways of polyglot programming on GraalVM. Two of them use GraalVM’s polyglot features from a host or start language and branch out from there into a target language. In the first example the host language was R, and the repository contains examples for Ruby, JavaScript and Python, too. All of those can be compiled into native executables as well. While the supported feature range may vary, calling Java-based libraries works neatly out of the box. In the second example, the host language was Java, and the host passed an arbitrarily complex type into the target language. The object was a service that gives full access to the Neo4j graph database, and the target language was JavaScript.

Bear in mind that both scenarios can be poly-polyglot. A target language can call other supported languages, too.

I am fully convinced that both scenarios described here work great and fulfil needs that are not far-fetched but legitimate real-world use cases: In the first example I can use a library that is probably not available in every language or has its most complete feature set on Java. The Truffle framework does a fantastic job converting collections, maps and struct-like data types into something the host language can understand and work with.

The second example allows scripting languages in software systems running on the JVM with ease. Oracle Database has had PL/SQL for a long time and now uses the embedded GraalVM to allow user-defined functions in many languages (for comparison, look at “Bringing Modern Programming Languages to the Oracle Database with GraalVM”). This is exactly the polyglot scenario I showed with my Neo4j script procedures.

A shared library of the Neo4j Java driver built with GraalVM native image can be used from other languages like C or Ruby to connect to the Neo4j database.

The third example demonstrates how a Java library can be turned into a shared C library exposing entry points accessible directly from C or from any language that can use FFI, like Ruby or Rust. The effort to do this is higher than in the previous two examples.

My knowledge of C, all the manual wrangling with pointers and what not is best described as rusty. Nevertheless, I was able to create at least a running and functional proof of concept.

While I like the possibilities, I would recommend first checking whether your team has not only the need to do this but also the knowledge to drive it beyond a pure PoC.

If you can do without a shared library based on Java directly, you will be pleasantly surprised that option one or two will also work from C as a start language. With that option, you could develop a C-library directly in C, calling Java or whatever you like, and publish that.

The building blocks of GraalVM feel a lot like Lego bricks and behave much the same way as I want our Spring Data Neo4j stack to behave or how my friend Michael Hunger thinks about our graph model: They are composable in many different ways and the sum of them provides the actual value.

Whichever way you go, my experience with Graal and the many things related to it has been nothing but stellar over the last 3 years.

While I didn’t understand much when I first heard about it at JCrete 2017, I have seen constant development of tooling, documentation and infrastructure since then. GraalVM is an amazing technology, and it opens amazing JVM libraries up to a whole new range of ecosystems. It is definitely more than just Java programs that now start as fast as native programs. I guess that something like Graal will have an impact that is not yet fully visible and that may be as big as that of the JVM itself 25 years ago.

I would be happy if you checked out the repositories linked at the beginning of this post. If they are helpful, let me know. Maybe this post also gets you into the world of Neo4j. If so, you can head over to our publication: neo4j.

Thanks for your time, happy coding and until the next time.