How to Create Performant Python Modules using Rust

John Waidhofer
7 min read · Dec 21, 2021


Python can be fast to write but slow to run. In scenarios where performance is important, developers write ultra fast Rust libraries and bind them to Python.

Background

Writing code in Python usually involves trade-offs between developer ergonomics and performance. While it is easy to express ideas in Python, the interpreter has to work harder to turn the developer’s intention into a running process. For most projects this isn’t an issue, since Python code is fast enough to get the job done. But when there is a lot of number crunching involved, as in graphics, cryptography, and machine learning, pure Python code will likely be significantly slower than the same logic written in a low-level language.

One way to get a boost in Python performance is through a Foreign Function Interface (FFI). FFIs take a low-level library, often written in C, and wrap it in an easy-to-use Python module. When the developer calls one of the module’s functions, all of the computation is done in C and the result is translated back to Python. That is how packages like NumPy can perform numerical transformations quickly on large batches of data: the data is managed and operated on by C code. So should we write a C library whenever we need high performance in Python? Probably not. Developing memory-safe programs in C is difficult and requires a fair amount of testing.

But thanks to the PyO3 library, Python FFIs can be safely written in the Rust programming language, which has the performance of C with guaranteed memory safety. To better understand how to build a Python package using Rust, we’ll walk through the process of creating a basic NumPy clone called NumRs. Once completed, we’ll test our library against equivalent code in Python and see how well it performs.

Setup

To get started, install the foundational tools for building the library: the Rust toolchain (rustc and Cargo, installed via rustup), a recent version of Python 3, and maturin, the build tool that compiles PyO3 projects and installs them into your Python environment (pip install maturin).

Create a new Rust library:

cargo new --lib numrs

Add the necessary Rust dependencies in Cargo.toml:

[package]
name = "numrs"
version = "0.1.0"
edition = "2021"

[lib]
name = "numrs"
crate-type = ["cdylib"]

[dependencies.pyo3]
version = "0.15.1"
features = ["extension-module"]

The PyO3 dependency generates a Python interface for our Rust code, handling all of the communication between the two languages behind a clean API. To start developing NumRs with PyO3, head over to src/lib.rs.

Hello Numerical World

Fundamentally, NumRs needs an Array type to hold numerical data. We can represent this array as a Rust struct:

// lib.rs
use pyo3::prelude::*;
#[pyclass]
struct Array(Vec<f64>);

The #[pyclass] macro binds this Rust struct to a Python class. Similarly, methods defined in an implementation block tagged with #[pymethods] are exposed to the Python interface. To begin using the Array class, we need to specify a constructor so that it can be instantiated. The constructor is a special function in the implementation block marked with the #[new] tag:

#[pymethods]
impl Array {
    #[new]
    fn new(arr: Vec<f64>) -> Self {
        Array(arr)
    }
}
// In Python: new_array = Array([1,2,3])

Now you can create a new array, but there is no way to see its contents because Python does not know how to print out the Array class. Specify a string representation by overriding the __str__ dunder method:

// Inside impl Array
fn __str__(&self) -> String {
    format!("{:?}", self.0)
}
// In Python: print(Array([1,2,3])) -> "[1.0, 2.0, 3.0]"

Finally, wrap the Rust code into a Python module using a function tagged with #[pymodule]:

// lib.rs
#[pymodule]
fn numrs(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Array>()?;
    Ok(())
}

With the basic code in place, we can build the library by running maturin develop on the command line at the root folder of the project (inside an activated Python virtual environment, which is where maturin installs the module). Open the Python REPL to test out the module:

>>> from numrs import Array
>>> my_array = Array([1,2,3])
>>> print(my_array)
[1.0, 2.0, 3.0]

🎉 Oh yeah! Your custom Rust code is now running in Python.

Statistics

Clustering using statistical aggregation

So far, Array doesn’t have any interesting capabilities beyond holding data. Let’s change that! Statistical aggregations are very helpful for summarizing data, so our array should be able to calculate the mean and median of its contents. Adding these functions to the #[pymethods] implementation block will allow us to call them directly on the Python Array class:

// Inside impl Array
fn mean(&self) -> f64 {
    if self.0.is_empty() {
        return 0.0;
    }
    self.0.iter().sum::<f64>() / self.0.len() as f64
}

fn median(&self) -> f64 {
    let mut sorted = self.0.clone();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    match sorted.len() {
        // Empty array
        0 => 0.0,
        // Even number of elements: average the two middle values
        len if len % 2 == 0 => (sorted[len / 2] + sorted[len / 2 - 1]) / 2.0,
        // Odd number of elements: take the middle value
        len => sorted[len / 2],
    }
}

Time to test! Run maturin develop and open the REPL:

>>> from numrs import Array
>>> arr = Array([1, 5, 12, 26])
>>> print(f"Mean: {arr.mean()}")      # Mean: 11.0
>>> print(f"Median: {arr.median()}")  # Median: 8.5

Arithmetic with Arrays

Performing arithmetic directly on arrays instead of writing loops is a super useful abstraction. As a proof of concept, we’ll implement addition and subtraction using the __add__ and __sub__ dunder methods. In both cases, we will perform the element-wise operation if both of the arrays are the same size, and return a new array as a result:

// Inside impl Array
fn __add__(lhs: PyRef<Array>, rhs: PyRef<Array>) -> PyResult<Array> {
    if lhs.0.len() != rhs.0.len() {
        return Err(PyIndexError::new_err(format!(
            "Arrays have differing lengths: {} vs {}",
            lhs.0.len(),
            rhs.0.len()
        )));
    }
    let arr = lhs.0.iter().zip(rhs.0.iter()).map(|(l, r)| l + r).collect();
    Ok(Array(arr))
}

fn __sub__(lhs: PyRef<Array>, rhs: PyRef<Array>) -> PyResult<Array> {
    if lhs.0.len() != rhs.0.len() {
        return Err(PyIndexError::new_err(format!(
            "Arrays have differing lengths: {} vs {}",
            lhs.0.len(),
            rhs.0.len()
        )));
    }
    let arr = lhs.0.iter().zip(rhs.0.iter()).map(|(l, r)| l - r).collect();
    Ok(Array(arr))
}

There are a few new data types involved in these methods:

  • PyRef is a safe pointer to an object controlled by Python. Passing around complex objects wrapped in PyRefs is very efficient, because the underlying object is not duplicated. Rust will ensure that these references are valid and that the object cannot be mutated by another function while a PyRef is held.
  • PyResult is a version of the Rust Result type. It allows the developer to raise Python errors when something goes wrong. In this case, we raise an error when the arrays are not the same length.
  • PyIndexError::new_err creates an IndexError Python exception with a custom error message; the import it needs is shown just below.
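
PyIndexError is not part of the PyO3 prelude, so for these methods to compile, one extra import is needed at the top of lib.rs:

// lib.rs
use pyo3::exceptions::PyIndexError;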

After compiling the project, Arrays will work with basic operators in Python:

>>> from numrs import Array
>>> x = Array([1., 2., 3.])
>>> y = Array([1., 1., 1.])
>>> print(x + y)
[2.0, 3.0, 4.0]
>>> print(x - y)
[0.0, 1.0, 2.0]

Indexing

Currently, an Array can store our data, but we can’t get individual values out of it. By implementing the __getitem__ function, the Array class will be indexable like any other Python list:

// Inside impl Array
fn __getitem__(&self, index: usize) -> PyResult<f64> {
    if index >= self.0.len() {
        return Err(PyIndexError::new_err("Index out of bounds"));
    }
    Ok(self.0[index])
}

While this allows developers to fetch values from the array, we need another function for updating a value by its index. Intuitively, there is a __setitem__ function for this purpose:

// Inside impl Array
fn __setitem__(&mut self, index: usize, val: f64) -> PyResult<()> {
    if index >= self.0.len() {
        return Err(PyIndexError::new_err("Index out of bounds"));
    }
    self.0[index] = val;
    Ok(())
}

Now all Array values can be accessed and updated! Let’s try it out:

>>> from numrs import Array
>>> my_array = Array([1])
>>> my_array[0]
1.0
>>> my_array[0] = 2.0
>>> my_array[0]
2.0

Parallelization

Let’s take a minute to improve performance. Rust is already fast for sequential operations, but its excellent multithreading support can make it even faster. The easiest way to take advantage of multithreading is through the rayon library, which can upgrade sequential iterators into parallel iterators. Start by adding the new dependency to Cargo.toml:

[dependencies]
rayon = "1.5.1"

Then add use rayon::prelude::* to your imports in lib.rs. To parallelize any iterator, just change the iter() call to par_iter(). The addition and subtraction dunder methods are good candidates for this optimization, as shown in the sketch below. The update allows the element-wise operations to run across several threads instead of sequentially on a single thread. When I tested the parallelized addition method with two one-million-element arrays on my MacBook Pro, the parallel version was 3.6x faster.
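
As a minimal sketch (assuming the __add__ method from earlier and the rayon import above), the parallelized addition only changes the two iterator calls:

// Inside impl Array: a sketch of the parallelized __add__,
// assuming `use rayon::prelude::*;` at the top of lib.rs
fn __add__(lhs: PyRef<Array>, rhs: PyRef<Array>) -> PyResult<Array> {
    if lhs.0.len() != rhs.0.len() {
        return Err(PyIndexError::new_err(format!(
            "Arrays have differing lengths: {} vs {}",
            lhs.0.len(),
            rhs.0.len()
        )));
    }
    // par_iter() splits the element-wise work across rayon's thread pool
    let arr = lhs.0.par_iter().zip(rhs.0.par_iter()).map(|(l, r)| l + r).collect();
    Ok(Array(arr))
}

The same one-line change applies to __sub__.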

Benchmarking

Comparing Rust to Python

The moment of truth: how does the performance of NumRs arrays compare to native Python lists? Let’s compare both approaches to element-wise addition using two arrays of one million elements each. timeit reports the total runtime of 1,000 runs of each candidate in seconds:

# benchmark.py
from timeit import timeit
from numrs import Array

list1 = [num for num in range(1_000_000)]
list2 = [num for num in range(1_000_000)]
arr1 = Array(list1)
arr2 = Array(list2)

add_arrays = lambda: arr1 + arr2
add_lists = lambda: [x + y for x, y in zip(list1, list2)]

times = {
    "list": timeit(add_lists, number=1000),
    "array": timeit(add_arrays, number=1000),
}
print(times)
print(f"numrs is {times['list'] / times['array']} times faster!")

And the results are 🥁🥁🥁🥁🥁🥁🥁🥁

{'list': 35.191460958, 'array': 6.572961165999999}
numrs is 5.3539736610700075 times faster!

Wow, our custom library left Python in the dust with a 5x performance improvement!

Conclusion

You now have the tools to build your own Rust-powered Python libraries! In computationally heavy use cases, you’ll be able to drastically improve the performance of your code while providing an intuitive Python interface for other developers. There are many more NumPy features that you could add to NumRs, including multi-dimensional arrays, masking, comparisons, and linear algebra functions; a small sketch of one such extension follows below. If you’d like to see any of those features built out, let me know and I can address them in a follow-on article.
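
As a taste of what a linear algebra extension might look like, here is a minimal, hypothetical sketch of a dot product method (not part of the code above), reusing the same patterns as __add__:

// Inside impl Array: a hypothetical dot product method (sketch only)
fn dot(&self, other: PyRef<Array>) -> PyResult<f64> {
    if self.0.len() != other.0.len() {
        return Err(PyIndexError::new_err("Arrays have differing lengths"));
    }
    // Multiply element-wise, then sum the products
    Ok(self.0.iter().zip(other.0.iter()).map(|(l, r)| l * r).sum())
}
// In Python: Array([1., 2.]).dot(Array([3., 4.])) -> 11.0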
