HPy: binary compatibility and API evolution with Kiwisolver

Štěpán Šindelář
Published in graalvm · 7 min read · Sep 29, 2022

In our previous post we took a look at the process of porting from the CPython API to HPy, using the Matplotlib Python package as an example. In this post, we are going to look at binary compatibility and at evolving an API while maintaining it, this time using the Kiwisolver Python package as an example.

HPy is an alternative to the standard CPython C API for Python extensions. One of our previous blog posts gives a short introduction to HPy.

Kiwisolver provides a Python binding for an efficient C++ implementation of the Cassowary constraint-solving algorithm and it is a dependency of the Matplotlib package.

The process of porting Kiwisolver to HPy was the same as for Matplotlib. In this blog post we are going to focus only on the binary compatibility and API evolution aspects of HPy.

API vs ABI primer

When we talk about binary compatibility, it is important to understand the difference between API and ABI. If you are already familiar with those two terms, you can skip this section.

API, application programming interface, defines the interface for communication between two systems at a source code level. It consists of, for example, names of exported C functions and their argument types at the source code level, but not their concrete machine representation.

ABI, application binary interface, defines this interface at the binary level, i.e., at the machine code level. It consists of, for example, exact memory layout of used structures, or machine specific types of arguments of exported functions.

Let us consider this example: a new field is added at the beginning of the PyObject C structure. This does not change the API; existing Python extensions will continue to work(*), but only after recompilation. This is because the memory layout of the PyObject structure, more specifically its field offsets, has been hard-wired into the Python extension binary at compile time. This change has broken ABI compatibility.

The important implication of ABI stability is that existing binaries do not need to be recompiled if ABI stays the same or is changed in an ABI compatible manner. For example, if we add the new field at the end of the PyObject structure, then the offsets of the pre-existing PyObject structure fields do not change and we would not break the ABI compatibility.

Introduction

Unlike Matplotlib, Kiwisolver does not depend on the Numpy API, which is not yet ported to HPy; therefore, we could fully remove any legacy CPython C API calls from Kiwisolver and produce an HPy “universal” binary distribution capable of running on any Python implementation that supports the HPy API: CPython 3.8+, GraalPy, and PyPy. One binary, no recompilation, multiple Pythons!

What does this look like then? We will build our HPy port of Kiwisolver using the “universal” HPy ABI:

python setup.py --hpy-abi=universal build_ext

The command builds a binary called kiwisolver.hpy.so and also a file named kiwisolver.py. The latter takes care of loading the HPy extension, such that one can just import it by calling

import kiwisolver

Below is a simplified version of the code in kiwisolver.py:

The hpy.universal.load builtin comes from the HPy package, which is available on PyPI and provides the bridge between HPy and CPython. Other supported Python implementations, such as GraalPy or PyPy, come with their own implementation of the HPy interface that bridges the HPy API to their internals. In other words, an HPy implementation is part of their core, and the hpy.universal.load builtin is available without any external extensions.

How it works

Let us compare what is inside kiwisolver.hpy.so and kiwisolver.cpython-38d-x86_64-linux-gnu.so, which we would get with the standard Python C API. We will use the nm utility, which can list all the symbols in a shared library, with its -u option, which lists only the undefined symbols, i.e., the symbols our shared library requires from elsewhere.

$ nm -u ./kiwisolver.cpython-38d-x86_64-linux-gnu.so | grep Py
U PyBaseObject_Type
U PyBool_Type
U PyBytes_FromStringAndSize
U _Py_Dealloc
...

As we can see, the shared library requires many symbols provided by CPython, such as PyBytes_FromStringAndSize. What about the HPy binary:

$ nm -u ./kiwisolver.hpy.so | grep Py
$ # no output

The HPy universal binary does not depend on the CPython API. If we inspected all the undefined symbols, we would discover that there is nothing HPy specific either. How can HPy work then? The binary contains a function HPyInit_kiwisolver that takes as its first argument HPyContext*, a pointer to a structure with pointers to the implementations of the HPy functions. Python will look up the HPyInit_kiwisolver function by its name, create the HPyContext structure, and call HPyInit_kiwisolver with it.

Here is a simplified skeleton of what this looks like in code:

Note: in practice, HPyInit_kiwisolver is generated by the HPy_MODINIT macro, and it calls into init_kiwisolver_impl(HPyContext *ctx), which is implemented by the user and does the actual work.

Because it would be annoying to always have to write ctx->foo(ctx, ..arguments..), HPy header files provide simple inline C functions, such as HPy_IsTrue:

but because those functions are inlined into your extension, they are in fact not a binary dependency. The only binary dependency is the layout of the HPyContext structure!

On the other hand, if you build the HPy extension in the “CPython ABI mode”, HPy will include different implementations of those helper functions. For example, HPy_IsTrue becomes this simple function that just forwards to PyObject_IsTrue:

After the compiler is done there should be no traces of HPy_IsTrue and your extension would directly call PyObject_IsTrue just like with the CPython API. But let’s go back to the universal mode.

But The Times They Are A-changin’

What if a newer HPy version adds some new API, for example, if it adds function pointers at the end of the HPyContext structure? No problem! If the extension is expecting the previous "shorter" HPyContext version, it will not see any difference and continue to work as before. No recompilation necessary.

What if HPy needs to make a breaking change, e.g., change the semantics or signature of one of the API functions? For example, in an attempt to get closer to JavaScript, we’d like to change HPy_Absolute to also accept a string and attempt to convert the string to a number. However, older extensions may rely on HPy_Absolute raising an exception in such a case. Normally, a change like this would require a painful process of updating existing packages and then eventually actually changing HPy_Absolute. Packages that are not actively maintained and tested on pre-releases of CPython are out of luck.

Although this is not yet implemented, it is conceptually not a problem for HPy. We expect that every extension would expose, alongside the HPyInit_{name} function, another “pre-init” function, such as HPyPreInit_{name}. HPyPreInit_{name} is not going to take an HPyContext* argument, but it will communicate to the Python engine which HPyContext version the extension expects, and maybe a few other things, such as whether it supports sub-interpreters.

With this, the Python engine can conceptually have two instances of the HPyContext structure and can initialize two HPy extensions that have different HPy version expectations. For example, a legacy package expects the old version, while a new shiny package wants to get the absolute numeric value of strings, i.e., the new version. Here is a sketch of what the Python runtime would do, in C code:

Imagine that MyLegacyPackage is some 10-year-old Python package whose author has since disappeared and no one knows where its sources are, but you really, really want that new fancy package, which needs the new HPy version. No worries with this approach! We can run multiple versions of HPy within one process and load each package with a different HPy version!

Demo time

Enough of the theory, let’s see this in action. Remember we built the Kiwisolver HPy port in universal mode:

python setup.py --hpy-abi=universal build_ext

We can load the same Kiwisolver native extension in both CPython and GraalPy:

$ python -c 'import kiwisolver; print(kiwisolver.Solver())'
<kiwisolver.Solver object at 0x7f5ca447b400>
$ graalpy -c 'import kiwisolver; print(kiwisolver.Solver())'
<kiwisolver.Solver object at 0x2b12488>

How can you reproduce this at home? You will need CPython 3.8+, a local clone of the HPy port of Kiwisolver, and, for the GraalPy part, a GraalVM distribution.

First, install HPy for CPython. This should be done the same way as one would normally install other PyPI packages, such as NumPy. For example:

python -m pip install hpy

You will be building Kiwisolver from sources, so you need the setuptools package and the pytest package to run its tests. Those will be likely already installed on your system, but just in case:

python -m pip install setuptools pytest

You can now proceed to build Kiwisolver. Change the current working directory to the root of the Kiwisolver repository and run:

$ cd /the/local/clone/of/kiwisolver
$ python setup.py --hpy-abi=universal build_ext

Now, staying in the same working directory, you can run Kiwisolver tests:

$ PYTHONPATH=. python -m pytest py/tests

How can you run the same package with GraalVM Python, aka GraalPy? Decompress the GraalVM distribution somewhere on the disk. Let us say that $GRAALVM_HOME is the path to the decompressed directory. First, install GraalPy:

$ GRAALVM_HOME/bin/gu install Python

It is best practice to first create a venv for GraalPy, so that it does not interfere with your system Python installation. Then, install the pytest package (setuptools comes preinstalled with GraalPy). The pip tool works on GraalPy as it does on CPython.

$ GRAALVM_HOME/bin/graalpy -m venv /path/to/graalpy/venv
$ /path/to/graalpy/venv/bin/graalpy -m pip install pytest

Now you can simply run the same tests, using the same Kiwisolver build produced by CPython:

$ cd /the/local/clone/of/kiwisolver
$ PYTHONPATH=. /path/to/graalpy/venv/bin/graalpy -m pytest py/tests

As an exercise for the reader: you can try it the other way around. Build the package with GraalPy and run the result on CPython. You can even put PyPy into the mix! That’ll make a few more combinations to test!

Summary

We’ve explored how HPy was designed for binary compatibility and evolution: the HPyContext struct holds pointers to the functions that implement the API. The ABI of HPy manifests as the layout of the HPyContext struct and the types of its pointers. A concrete implementation of the HPyContext is passed to the extension entry points as an argument, which allows for great flexibility. For example, it is possible to load and run, in one process, two extensions requiring different HPyContext layouts or semantics.

This approach taken by HPy builds on the experience of other mature projects.

(*) Unless they do something very dangerous such as, for example, copy and paste the PyObject structure definition into their sources, or rely on its hard-coded size.
