0-copy your PyArrow array to rust

Niklas Molin
8 min read · Aug 23, 2020


pyarrow array to rust

There were two reasons why I started playing around with Arrow interoperability between Python and Rust.

Firstly, I think the Apache Arrow project looks both promising and interesting. Coming from a data background, I love the development of file-based formats and how much they have simplified interoperability between applications. It would be really nice to see a similar development on the compute/memory side of things, making it possible for us data engineers to move between languages and engines without overhead. If you have a resource-demanding custom task, you could just move over to a very efficient but still understandable language such as Rust, while still avoiding memory copies and other overheads.

Secondly, I am trying to teach myself Rust. I am a complete novice at both things, so everything I write below could be completely wrong. Furthermore, I don’t claim that any of this has any application in real life. But apart from practicing my writing skills, I also thought it could be helpful for the odd other person. There should be at least one more person out there trying to learn Rust while being curious about Arrow, right?

Scope

The first aim was to get a sense of how Python -> Rust interoperability works, and to see if we could get a zero-copy transfer of objects. As it turned out, the second part proved more difficult than anticipated. So I stopped once I had at least found a method of transferring a generic array, but before reaching a state where I had a full-fledged example application.

Getting started — basic capabilities

To get started we’ll need two basic capabilities. The first is quite obvious: we need to be able to call Rust functions from Python. The second is that we’ll need to be able to interpret C objects in Rust. The reason for the second part will be presented later on.

Calling Rust functions from Python — PyO3

There is a brilliant crate called PyO3 that provides all the capabilities we need and more. This lab only sends addresses and strings back and forth, but the crate contains many more features.
There are macros to define Python modules, and a single macro to expose one of your Rust functions as a Python function.
The arguments can be native primitive types when implementing the Rust functions, but the return value needs to be a PyO3 type.
One little gotcha is that the macros seem to require your Python module’s functions to be defined in the same Rust module. And to get your Python module available in Python, it needs to be in the Rust lib file.

Interpreting C objects in Rust

There are multiple crates that include fancier functionality for interoperability with “C-like” code, and the type definitions should probably make use of something like libc. But only pointer addresses are being shared in this lab, and that simply works with normal Rust types since they have the same memory layout. We can also get structs directly from C functions; we only need to annotate our Rust structs to have a C-like memory layout (#[repr(C)]).

There are a couple of examples included in main.rs. Running `make install_rust_binary` installs the Rust examples.

$ arrowlab get-struct
Getting a simple struct pointer with three ints x,y,z from c, the values should be 666,7727,2
The result: TestStruct { x: 666, y: 7727, z: 2 }
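The struct-over-a-pointer trick above can be illustrated from the Python side with a stdlib `ctypes` sketch. This is not the lab’s code; it is a hypothetical mirror of the same `TestStruct` layout, showing what a C-compatible struct looks like in memory and how an integer address is turned back into typed fields — essentially what the unsafe Rust side does when dereferencing the pointer.

```python
import ctypes

# Hypothetical mirror of the lab's TestStruct: three C ints laid out
# contiguously, matching what #[repr(C)] guarantees on the Rust side.
class TestStruct(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int),
                ("y", ctypes.c_int),
                ("z", ctypes.c_int)]

# Build a struct and take its raw address, as a C function would hand out.
native = TestStruct(666, 7727, 2)
addr = ctypes.addressof(native)

# "Dereference" the integer address back into a typed view of the
# same memory -- no copy, just a reinterpretation of the bytes.
view = TestStruct.from_address(addr)
print(view.x, view.y, view.z)  # -> 666 7727 2
```

Because both sides agree on the field order and sizes, the reinterpretation is safe here; with mismatched layouts it would silently read garbage, which is exactly why the Rust structs need #[repr(C)].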

Setup

Some of the descriptions here are kept very short. For more details, look into the Python-rust-arrow project, which also includes build and run details.

Possibilities

The Arrow project has a module for inter-process communication (Arrow IPC) that allows transfer of high-level objects out of the box and is supported by both Python and Rust.
It also has the Flight module for RPC calls, which could be used for the same purpose. Even though it’s perhaps more interesting, it isn’t covered here.
Since some time back there is also something called the C data interface. It is implemented for some languages, but it is also a spec for interoperability.

IPC

The IPC modules are really easy to use in both Python and Rust. You can both read and write high-level objects such as a RecordBatch in both languages.
In the following Python snippet, a Parquet file is read into a PyArrow table, which is then split into a list of RecordBatches. Those are then written to an IPC BufferOutputStream. That works beautifully.


import pyarrow as pa
import pyarrow.parquet as pq

# read a parquet file and serialize its batches to an IPC stream
def table_to_ipc_buffer(file_name):
    a = pq.read_table(file_name)
    print("read data, we have now instantiated the data once")
    batches = a.to_batches(max_chunksize=9999999999)
    print(f"created {len(batches)} batches, second time we instantiate the data")
    sink = pa.BufferOutputStream()
    writer = pa.ipc.new_stream(sink, batches[0].schema)
    for batch in batches:
        writer.write_batch(batch)
    print("wrote batches, third instantiation")
    writer.close()
    return sink.getvalue()

The only problem with this approach is that it introduces an additional copy of the data: the step where the batches are written to the stream.

To get the data to Rust we can simply convert the output stream to a Python bytes object; the output stream has a method called to_pybytes. The bytes object can be passed directly to Rust as a primitive type. That works, but it creates one more instantiation of the data. We can avoid this by getting the memory address of the output stream via its address property and sending that int over to Rust instead. That avoids the two extra copies. The drawback is that our Rust function that dereferences the pointer will be unsafe.
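The address-as-an-int handoff can be sketched with stdlib `ctypes` alone (no pyarrow, names illustrative): one side owns a buffer and exports nothing but its integer address; the other side reinterprets that int as a pointer and reads the same memory, which is what the unsafe Rust function does with the stream’s address.

```python
import ctypes

# Stand-in for the Arrow output stream's buffer: some bytes we own.
payload = bytes(range(8))
buf = (ctypes.c_ubyte * len(payload)).from_buffer_copy(payload)

# What we'd actually hand to the Rust side: a plain int holding the address.
address = ctypes.addressof(buf)

# The "Rust side": reinterpret the int as a pointer and read the
# underlying memory back out through it.
ptr = ctypes.cast(address, ctypes.POINTER(ctypes.c_ubyte))
recovered = bytes(ptr[i] for i in range(len(payload)))
assert recovered == payload
```

The catch is ownership: nothing about the bare int keeps `buf` alive, so the Python side must hold a reference for as long as the other side may dereference the address — the same lifetime hazard that makes the Rust function unsafe.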

The Rust code to convert it back to Rust Arrow objects:

The following images illustrate the memory usage. The timeline is simply nonsense: I’ve added 2-second sleeps to make the images easier to interpret. To run one of the examples and generate the plot:

mprof run --include-children alab ipc transfer-recordbatch --sleep 2
mprof plot

The test data is a Parquet file generated by running `alab generate-data`. That produces a ~11 MB Parquet file with three columns. We’ll only transfer the third column, which is a random float.

All the images (including the one at the end) share the same pattern at the start: a sharp increase due to loading libraries and reading the file, then, after a 2-second wait, the same increase again due to the conversion from pandas to RecordBatches. So the first 2.5 seconds can be regarded as a pre-step to the tests.

Byte array

From pointer

The last bump in this curve is the IPC process copying the data buffers. It should be possible to avoid, but I still haven’t looked into where in the process this happens.

C DATA INTERFACE

As far as I understand, the Rust implementation could implement the same array structure as the C++ implementation and we would be good to go, but it doesn’t (probably for good reasons). Nor does it implement a structure matching the C data interface, nor the ability to convert an array in the C data interface structure to a Rust Arrow array. This is one part where I might very possibly be very wrong.

In this lab, I just completed a POC to give myself a sense of what implementing the conversion would mean.
The structures in the C data interface define both a schema and a data struct. This solution handles only the data struct. It contains no generic mapping from the C data interface structure to the Rust implementation. Instead we test the transfer by only handling the data struct for a simple float array.

Getting the C data struct

Our first obstacle is that we need to convert the Python Arrow array to the basic C structure. There is an _export_to_c method on the PyArrow array that accepts a couple of arguments in the form of pointers to a data struct and a schema struct. So we need to create those structs, which takes a few lines of cffi (see the code in the project). There is another module in the project called pcffi.py, which is merely a copy of the C definitions from the Arrow project. It should have been linked instead.


from pcffi import ffi  # cffi definitions copied from the Arrow project

def get_array(obj):
    c_schema = ffi.new("struct ArrowSchema*")
    ptr_schema = int(ffi.cast("uintptr_t", c_schema))
    c_array = ffi.new("struct ArrowArray*")
    ptr_array = int(ffi.cast("uintptr_t", c_array))
    obj._export_to_c(ptr_array, ptr_schema)
    return ptr_array

Creating a Rust array

We already know how to get those C objects into Rust: we’ll define corresponding structures and dereference the raw pointer.

#[repr(C)]
#[derive(Debug)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    // the following are pointers in the C struct; we read them
    // as raw 64-bit addresses here
    pub buffers: i64,
    pub children: i64,
    pub dictionary: i64,
    // note: the C data interface struct also has trailing release
    // and private_data fields, omitted since we never read past
    // dictionary
}

The repr(C) annotation provides us with a struct that has a C-like memory layout. If we implemented the schema as well, we could create a generic mapper to Rust Arrow arrays. For the example of transferring a float array, we only need a subset of the data points.
The children and dictionary fields are only applicable for more complex array types.
The code below is a slightly shortened version. The buffer-definitions hack is there because the Rust Arrow buffers have memory alignment checks: the checks enforce 128-bit memory alignment on the buffers for x86_64, while the C arrays are only 64-bit aligned on my system. We work around that by manipulating the address and the offset in sync. The first buffer is always the null bitmap, which should be null here; hence we only need to consider the data buffer.
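One plausible reading of that address/offset manipulation can be sketched numerically (illustrative only, not the lab’s Rust code): align the base address down to the required boundary and push the difference into the element offset, so that base plus offset still points at the first real element.

```python
# Sketch of the alignment workaround described above (illustrative):
# the Rust Arrow buffer wants a 16-byte (128-bit) aligned base address,
# while the exported C buffer may only be 8-byte aligned. We align the
# base address *down* and compensate via the element offset, so that
# aligned_base + offset * item_size == original address.
def align_down(addr, item_size, alignment=16):
    aligned = addr & ~(alignment - 1)    # clear the low address bits
    extra_bytes = addr - aligned         # how far back we moved
    assert extra_bytes % item_size == 0  # must be a whole number of items
    return aligned, extra_bytes // item_size

base, offset = align_down(0x1008, item_size=8)
print(hex(base), offset)  # -> 0x1000 1
```

For an 8-byte float array whose data starts at 0x1008, presenting the buffer as starting at 0x1000 with an offset of one element keeps the Rust-side alignment check happy without copying any data.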

Implementing the generic version should provide us with a zero-copy way of transferring a Python Arrow array to a Rust Arrow array.
Running `mprof run --include-children alab cdata arrayfrompointer --sleep 2` produces the memory graph below.

Conclusion

I would so much love to have some tracking to see if anyone actually makes it down here. Either way, we’ve at least seen that it’s possible to move your PyArrow array to Rust with zero data copies. We don’t know exactly how to do it securely and nicely, and we still don’t know if we could tweak the IPC process to do it for us. But those are questions for another day.
