This is a guest blog post by Franziska Geiger. She shares the experience of working on the GraalVM Python implementation during her recent internship with the Oracle Labs GraalVM team in Zurich.
The GraalVM Python implementation is an open source project hosted by Oracle Labs. The project implements a Python interpreter in Java using the Truffle API to run on GraalVM. Right now, this implementation doesn’t support all Python functionality, but it can be freely extended by contributors on GitHub. Contributing to the project requires a basic understanding of the GraalVM ecosystem as well as of the project’s components and architecture. Gathering that knowledge by crawling through the code is a time-consuming and overwhelming task for beginners. This article walks through the project step by step and provides an easy entry point for digging deeper into the code. The post explains the concepts behind how Python on GraalVM works, what its main components are, and how to start contributing. It also explains useful commands for debugging and testing that make for a smoother and more efficient coding experience.
The GraalVM Python implementation
This is a completely new implementation of Python. The goal is to create a fast Python implementation integrated with the GraalVM compiler. The GraalVM compiler is implemented in Java. To implement new languages on GraalVM, a framework called Truffle is used. Truffle provides an annotation based DSL to easily specify an optimizable abstract syntax tree of a language. The Python implementation uses multiple other GraalVM projects as can be seen in the following graphic. Every square represents an individual project.
GraalVM is the equivalent of a HotSpot virtual machine that uses an alternative compiler, the GraalVM compiler, alongside the normal Java compiler. This is the core component for all GraalVM projects. The just-in-time compiler uses advanced techniques to improve the performance of Java code. On top of it, the Truffle API makes it easier to execute other languages on GraalVM by interpreting a language from an abstract syntax tree (AST). Truffle provides an easy way to define the AST nodes and automatically delivers the AST to GraalVM. For detailed information about the two projects check this blog post.
Python, too, is implemented using Truffle ASTs to interpret Python code. Besides the Python AST, the Python project also heavily depends on the Sulong project, used to interpret LLVM bitcode. This output format of the clang compiler is used to run Python libraries that contain C Extensions. These modules are parsed and run using Sulong, and they interact with the GraalVM Python implementation through a custom C API implementation that uses Truffle language interop.
All these projects (the GraalVM compiler, Truffle, Sulong, and Python) use mx as their build tool. mx loads third-party artifacts, manages dependencies on other projects, and invokes the compilers in the right order.
Set up the environment
To get started with the GraalVM Python implementation, its parent project has to be set up and the GraalVM source code must be available. Before starting to work on Python, check out the mx repository and add it to your PATH, then check out GraalVM and set up the VM as described in its README.
The Python component can be cloned from GitHub:
$ git clone https://github.com/graalvm/graalpython.git
Check out the project in the same base directory in which the GraalVM repository is checked out. You will also need to download a JVMCI-enabled JDK, which is available from https://github.com/graalvm/openjdk8-jvmci-builder/releases. Set your JAVA_HOME environment variable to its path.
To set up the repository for usage in an IDE use the following command:
$ mx ideinit
This will initialize all necessary configurations for Eclipse, IntelliJ, and NetBeans. After that you can open the project in your IDE and build it there, or build it on the command line with:
$ mx build
If everything is set up correctly, the command will finish successfully and the Python implementation is ready to use. The first build can take a few minutes, because all the projects it depends on have to be built as well. However, mx automatically does incremental builds, so subsequent builds are quicker.
To run the GraalVM Python implementation, use the command provided by mx, which starts a standard Python REPL:
$ mx python
Python 3.7.4 (Mon Aug 19 17:46:44 CEST 2019)
[Interpreted, Java 1.8.0_222] on linux
Type "help", "copyright", "credits" or "license" for more information.
Please note: This Python implementation is in the very early stages, and can run little more than basic benchmarks at this point.
The command acts like the normal Python interpreter and can also execute Python scripts, as far as they are currently supported. To make the executable easier to use, we can create a virtual environment. The environment wraps the launch script of the Python implementation and calls its main entry class, GraalPythonMain, with the required flags and dependencies.
To create and activate a virtual environment call this:
$ mx python -m venv <dir-to-venv>
$ source <dir-to-venv>/bin/activate
Now you can use Python from the console without the mx command. Also, the python command now launches the GraalVM Python implementation. You can deactivate the virtual environment again when you are done with it.
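A quick way to confirm which interpreter the python command actually launches is to ask the interpreter itself. The following is a minimal sketch that relies only on the standard sys module; the helper name describe_interpreter is made up for this illustration, and the exact implementation name that GraalVM Python reports is not stated in this post, so the script simply prints whatever it finds.

```python
import sys


def describe_interpreter():
    """Return (implementation name, version string) of the running interpreter."""
    # sys.implementation.name is "cpython" on the reference interpreter;
    # other implementations report their own name here.
    name = sys.implementation.name
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return name, version


if __name__ == "__main__":
    impl_name, impl_version = describe_interpreter()
    print("Running on {} {}".format(impl_name, impl_version))
```

Running the script once inside the activated environment and once outside of it should show two different implementation names if the environment is set up as described.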
As a last step, libraries can be installed. At the time of this writing, pip is not functional in the GraalVM Python implementation. Instead, it provides an alternative tool called ginstall. To see the libraries that are tested and known to work to an extent, type:
$ python -m ginstall list
To install, for example, the NumPy package, call this:
$ python -m ginstall numpy
That’s it, now the GraalVM Python implementation is ready to use and coding can begin.
GraalVM Python implementation components
To work on the project, it’s necessary to understand how it works. The following graphic shows the processing steps and base components of the implementation:
To execute Python code on GraalVM, the plain sources have to be parsed and interpreted. Parsing is done with ANTLR, a parser generator. The parser has to translate the code string into an abstract syntax tree consisting of Truffle nodes. This translation is implemented in the parser package. The nodes the parser creates represent different syntactical elements and operations of the language grammar, for instance an IfNode or an ArithmeticOperationNode. These nodes are defined in the nodes package.
Python also comes with a lot of built-in modules, for instance to handle collections or operating system operations. These operations are also implemented as Truffle nodes, but they are not generated as part of parsing; instead they are instantiated at runtime. They are defined in the builtins package and often use lower-level helper nodes from the nodes package to implement their methods. Not all Python builtin methods are implemented yet, so you won’t find all Python builtin modules available.
The runtime package contains classes for handling the state of the interpreter and of execution objects. It contains, for example, the base language class PythonLanguage, as well as other objects such as different storage objects or exception types.
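The parse-then-interpret pipeline described above can be modeled in a few lines. This is only a sketch of the idea, not the project’s code: Python’s own ast module stands in for the ANTLR-based parser, plain functions stand in for Truffle nodes, and the names execute and run are made up for the illustration. It assumes Python 3.8 or newer, where literals are parsed as ast.Constant.

```python
import ast
import operator

# Map AST operator types to the functions that implement them.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def execute(node):
    """Recursively evaluate a tiny subset of expression nodes."""
    if isinstance(node, ast.Expression):
        return execute(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp):
        # Comparable to an arithmetic node with two child nodes.
        op = _BINARY_OPS[type(node.op)]
        return op(execute(node.left), execute(node.right))
    if isinstance(node, ast.IfExp):
        # Comparable to an if node: condition, then-branch, else-branch.
        branch = node.body if execute(node.test) else node.orelse
        return execute(branch)
    raise NotImplementedError(type(node).__name__)


def run(source):
    """Parse an expression string into an AST and interpret it."""
    tree = ast.parse(source, mode="eval")
    return execute(tree)


if __name__ == "__main__":
    print(run("1 + 2 * 3"))
    print(run("10 if 0 else 4 - 1"))
```

Each branch in execute plays the role that a node class plays in the nodes package: it knows how to evaluate one syntactical element and delegates to its children for everything else.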
Writing a builtin method
The easiest way to enter the GraalVM Python world is to write a new method of a builtin module. Python provides a large set of builtin modules, and not all of their functionality is implemented for GraalVM right now. The usual way of finding missing builtin methods is to execute existing Python scripts or modules and see which unimplemented methods they hit. If a method is missing, an error similar to this is thrown:
AttributeError: 'module' object has no attribute 'eq'
By reading the source code that caused the error, we learn which module this method was expected on. The builtin method that has to be implemented is the equals method of the operator module, operator.eq(a, b). To get a better understanding of what the method is supposed to do, check out the official Python documentation or the corresponding CPython implementation.
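To make the expected behavior concrete, the snippet below exercises operator.eq on the reference interpreter. operator.eq(a, b) is documented as equivalent to a == b; the Version class is a made-up example type that shows how user-defined __eq__ methods take part in the comparison.

```python
import operator


class Version:
    """Hypothetical example type with its own equality rule."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        if not isinstance(other, Version):
            # Let Python try the other operand's __eq__ instead.
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)


results = [
    operator.eq(1, 1),                          # same ints
    operator.eq(1, 1.0),                        # int and float compare by value
    operator.eq("a", "b"),                      # different strings
    operator.eq(Version(3, 7), Version(3, 7)),  # dispatches to Version.__eq__
    operator.eq(Version(3, 7), "3.7"),          # neither side knows the other
]

if __name__ == "__main__":
    print(results)
```

The last call is the interesting one for an implementer: when both operands return NotImplemented, Python falls back to an identity comparison, so the result is False rather than an error.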
GraalVM Python implements each builtin module in a separate Java class. The class for the operator module is already defined. It is called OperatorModuleBuiltins and is located in the builtins.modules package. In addition to the builtin modules there are also builtin objects, which are implemented in the second subfolder of the builtins package: builtins.objects. Methods for different kinds of objects are implemented there, for instance the __getitem__ method of an array. To implement the missing method, we add a subclass of PythonBuiltinNode as an inner class of the OperatorModuleBuiltins class, like this:
This defines a new method in the operator module by using Truffle. The Truffle API is heavily annotation-based and automatically generates a lot of code at build time. The following annotations are used in the code sample above:
@Builtin: This annotation identifies the class as a new builtin method. It defines the method name, in this case “eq”, and the minimum number of positional arguments, which for the equals method is two. Since no maximum number of positional arguments is defined, the method takes no optional arguments.
@GenerateNodeFactory: This annotation tells Truffle to generate a factory for this node class, so that the node can be created automatically when the Truffle AST is built.
@Specialization: These annotations define the execution methods of the node. A node can have multiple specializations, each of which captures a different set of possible input types. The specializations have to be ordered from specific to general, so the last specialization should always cover the most general case. Usually this is the case in which all inputs have the type Object. If there is no matching specialization for a set of inputs, an UnsupportedSpecializationException is thrown. This tells the developer that there is an unhandled combination of input types that still needs to be covered. After a missing module method, this is the most frequent and the easiest to solve exception in GraalVM Python. The specialization methods are called from classes generated by Truffle, in this case EqNodeGen. The specialization mechanism gives the compiler more information about the inputs and provides faster ways to calculate the result for frequent cases. For equals, we write specializations for the primitive type cases, which can be calculated easily. The example above, however, is still missing a specialization for the most generic case. How many specializations to implement is up to the developer: the more specifically the functionality is implemented, the faster the compiled code can be. On the other hand, changing input types and a large number of specializations also cause a lot of deoptimization work for the compiler and therefore slow things down again.
@TruffleBoundary: The fourth specialization in the code example above is additionally annotated with @TruffleBoundary. This is necessary when certain Java library methods are used. Because Truffle later inlines the Java code of a node when the AST is compiled, we have to explicitly tell it not to do that for this specialization and to hand the method to the compiler as a single opaque Java call. This prevents the compilation unit from exploding, because some Java methods do a lot of work in the background. So a boundary is defined here in order to use the built-in Java method equals without oversizing the Truffle AST.
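The dispatch rule for specializations (try them in order from specific to general, fail loudly when none matches) can be modeled outside of Truffle. The sketch below is a plain Python model of that idea and not the Java node from this post: the guard and body functions, the table layout, and the UnsupportedSpecializationError class are all made up for the illustration. Like the Java example at this point, it deliberately has no generic case yet.

```python
class UnsupportedSpecializationError(TypeError):
    """Hypothetical stand-in for Truffle's UnsupportedSpecializationException."""


def _both(kind):
    """Build a guard that accepts two operands of exactly the given type."""
    def guard(left, right):
        # Exact type checks keep bool (a subclass of int) out of the int case.
        return type(left) is kind and type(right) is kind
    return guard


def _equal(left, right):
    return left == right


# Ordered from specific to general; the first matching guard wins.
_SPECIALIZATIONS = [
    ("bool", _both(bool), _equal),
    ("int", _both(int), _equal),
    ("float", _both(float), _equal),
    ("str", _both(str), _equal),
]


def eq(left, right):
    """Return (name of the chosen specialization, result)."""
    for name, guard, body in _SPECIALIZATIONS:
        if guard(left, right):
            return name, body(left, right)
    raise UnsupportedSpecializationError(
        "no specialization for ({}, {})".format(
            type(left).__name__, type(right).__name__))


if __name__ == "__main__":
    print(eq(1, 1))
    print(eq("a", "b"))
    try:
        eq([1], [1])
    except UnsupportedSpecializationError as error:
        print("unsupported:", error)
```

In Truffle the equivalent of this loop is generated into the *Gen class at build time, which is why the specializations themselves stay so short.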
The above implemented specializations cover different specific cases, but there is no option for the most general case. So we have to add another specialization at the bottom of the class:
This method handles inputs typed as Object and is therefore the most generic case. If the values are not of one of the previously defined simple types, we have to call the object’s __eq__ method, which every Python object provides. Therefore we define a child node of this node, which remains uninitialized until the generic case occurs. Every node can define any number of child nodes using the @Child annotation. An if node, for instance, has three child nodes: one condition node, one if-clause node, and one else-clause node. When the generic method is entered for the first time, a new node has to be added to the AST to dispatch the call to the __eq__ method. If the tree has already been compiled, the compiled code has to be deoptimized back to the interpreter so that the new node can be added. This is what the CompilerDirectives.transferToInterpreterAndInvalidate() method indicates. After that we can add the node, and the AST is executed again in its new state. When the method is called a second time, the child node is already available and we can skip the deoptimization.
The method is annotated with @Fallback instead of @Specialization, which means it covers the negation of all the previously specified cases. Many nodes are already implemented, so it is mainly a matter of finding the right node for the job. In this case the BinaryComparisonNode will do. Some nodes already provide a static create method for this purpose. The BinaryComparisonNode takes a method name and a reverse method name, which in the case of equals are exactly the same, __eq__, and the corresponding operator sign.
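The lazily created child node can also be modeled in plain Python. The sketch below is an illustration under stated assumptions, not the project’s implementation: GenericEqChild is a made-up stand-in for the comparison helper node, the invalidations counter only marks the spot where the Java node would call CompilerDirectives.transferToInterpreterAndInvalidate(), and the __eq__ lookup is simplified compared to full Python semantics (it ignores the priority rule for subclasses).

```python
class GenericEqChild:
    """Hypothetical helper that compares two objects via their __eq__ methods."""

    def execute(self, left, right):
        result = type(left).__eq__(left, right)
        if result is NotImplemented:
            # For equality, the reverse method is __eq__ as well.
            result = type(right).__eq__(right, left)
        if result is NotImplemented:
            # Neither operand knows the other: fall back to identity.
            result = left is right
        return result


class EqNode:
    """Model of a node with one fast path and a lazily created child."""

    def __init__(self):
        self._child = None
        self.invalidations = 0

    def execute(self, left, right):
        if type(left) is int and type(right) is int:
            return left == right  # fast path, no child needed
        return self._fallback(left, right)

    def _fallback(self, left, right):
        if self._child is None:
            # The Java node would deoptimize here before changing the tree.
            self.invalidations += 1
            self._child = GenericEqChild()
        return self._child.execute(left, right)


if __name__ == "__main__":
    node = EqNode()
    print(node.execute(1, 1), node.invalidations)
    print(node.execute([1], [1]), node.invalidations)
    print(node.execute("a", 3), node.invalidations)
```

The counter stays at one after the first generic call, which mirrors the point made above: the tree changes once, and every later call reuses the child without another deoptimization.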
To generate Truffle’s factories and specialization-resolving methods, we call mx build, which rebuilds the module with the new node classes. The build log should contain an entry for the compilation of the changed code.
Test and debug
When the module method is completely implemented, a test case can be added to the test suite. The existing test cases can be found in a test folder next to the src folder of the GraalVM Python implementation. Every module has a test_<module_name>.py file containing its test cases. The tests are written in Python, or in C for native code. To test the new method, add a test case to the file test_operator.py:
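A test for the new method could look like the following minimal sketch. It assumes that the test runner collects plain functions whose names start with test_ and treats a failing assert as a test failure; the function names and the AlwaysEqual helper class are made up for this illustration.

```python
import operator


def test_eq():
    # Primitive cases that the specializations cover.
    assert operator.eq(1, 1)
    assert not operator.eq(1, 2)
    assert operator.eq(1.5, 1.5)
    assert operator.eq("graal", "graal")
    assert not operator.eq("graal", "python")
    assert operator.eq(True, True)


def test_eq_calls_dunder_eq():
    # Generic case: the result comes from the object's own __eq__ method.
    class AlwaysEqual:
        def __eq__(self, other):
            return True

    assert operator.eq(AlwaysEqual(), object())


if __name__ == "__main__":
    test_eq()
    test_eq_calls_dunder_eq()
    print("all assertions passed")
```

Covering both the primitive cases and a user-defined __eq__ exercises the fast specializations as well as the generic fallback.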
To check whether the test passes, you can either run the whole test suite or pick a particular test file, in this case test_operator.py, with this command:
mx python3 graalpython/com.oracle.graal.python.test/src/graalpytest.py -v graalpython/com.oracle.graal.python.test/src/tests/test_operator.py
If there is an error in the code, it is useful to step through the program in a debugger to see what happens. GraalVM Python provides two ways to debug: either step through the Python code using Google Chrome’s inspector tool, or step through the AST nodes in Java using the IDE’s built-in debugger.
To debug Python code in Chrome, pass the flag --inspect to the GraalVM Python command. For instance, to debug a Python test file, execute:
mx python3 --inspect testfile.py
The command will open a debug server, which can then be accessed via Chrome. To do so, go to chrome://inspect in the browser. There should be an entry for GraalVM, which leads to the debug environment.
To debug the Java code, add the -d flag to the mx command:
mx -d python3 testfile.py
This makes it possible to step through the nodes implemented in Java. The debugger also shows the generated nodes, which are all suffixed with Gen. When debugging, it’s a good idea to focus on the non-generated nodes, because this is where the error will be. GraalVM Python also offers a number of other advanced options.
When the tests all pass, the code can be merged by opening a pull request to GraalPython master. Read more details and further description about it in the repo.
Writing C API methods
The code example above implements a Python-only method. But one of Python’s main benefits is its C extension modules. These modules also have components implemented in C/C++. To handle them, a C API is necessary, and the Sulong project is used to interpret the code. The following graphic shows how the different components work together:
To explain the process, we take NumPy as an example module. NumPy is written mostly in C, so besides its Python API it also contains a large amount of C code. The C code is compiled into bitcode files using the LLVM compiler clang. These bitcode files can then be interpreted by Sulong.
Because the NumPy C code needs to call back into Python, it uses the CPython C API, which GraalVM Python emulates. The code for emulating the API can be found in the cext package. The API header files are the same as for CPython, to stay source compatible. The corresponding C functions are partly implemented in the modules and src subdirectories. There are two ways to add a new C API function implementation to GraalVM Python. The first option is to implement the function directly in the C file. For this case, check out how CPython implements it. The second option is to implement the function in Python or Java, using the GraalVM Python polyglot API. To do so, the C function has to define a polyglot upcall to the other language.
The upcall needs an ID, which is usually the original function name. Then the actual call can be made using UPCALL_CEXT_O. Parameters that need to be wrapped or unwrapped have to be passed through the native_to_java function. The module that implements the function is python_cext. It is implemented partially in Python, using normal code in lib-graalpython, and partially in Java, using nodes for methods as shown in our example above.
The GraalVM implementation of Python is a relatively young project with an interesting approach to language implementation. The combination of C/C++ and Python code with Java as the implementation language makes it a versatile project that provides interesting challenges. Nevertheless, to become an expert everybody has to start as a beginner. This getting-started post gives an overview of the project’s concepts and walks through a concrete and complete contribution example. The descriptions should also give an insight into the project’s current state of development and where contributors can make an impact. Furthermore, the reader gets an intuition for how to use the provided APIs and where the trade-offs of the approach lie.
Editor’s note: This blog post was written by Franziska Geiger based on her recent experience of working on the GraalVM Python implementation within the GraalVM Internship Program.