Python is a great programming language, but it can be slow at certain tasks. That’s why developers have long written C/C++ extensions and integrated them with Python to improve performance. However, writing these extensions is difficult because these low-level languages are not memory-safe and leave some behavior undefined, which tends to introduce memory-management bugs. Rust guarantees memory safety in safe code and so prevents this class of bugs.
Slow Python Scenario:
One of the many cases where Python is slow is building large strings. In Python, the string object is immutable. Any operation that appears to modify a string actually creates a new object in memory to represent the new value. This contrasts with languages like Perl, where a string variable can be modified in place. That’s why the common operation of constructing a long string out of several short segments is not very efficient in Python. Each time you append to the end of a string, the Python interpreter must, in general, allocate a new string object and copy the contents of both the existing string and the appended string into it. As the string under manipulation becomes large, this process can become increasingly slow.
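A minimal sketch of what immutability means in practice (the variable names are illustrative and not from the original code): item assignment on a string fails, and appending rebinds the name to a new value while any other reference keeps the old one.

```python
s = "abc"

# Strings do not support item assignment.
try:
    s[0] = "x"
    mutated = True
except TypeError:
    mutated = False

# A second name bound to the same string is unaffected by "+=",
# because "+=" produces a new value and rebinds only `s`.
alias = s
s += "d"

print(mutated)  # False
print(alias)    # abc
print(s)        # abcd
```

CPython can sometimes extend a string in place when no other reference to it exists, so the real cost of repeated appends varies; the visible behavior stays the same.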
Problem: Write a function which accepts a positive integer as argument and returns a string concatenating a series of integers from zero to that integer.
So let’s try solving the above problem in Python and see if we can improve the performance by extending it with Rust.
Method I: Naive appending
This is the most obvious approach: use the += operator to append each segment to the string.
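A minimal sketch of this approach. The function name `naive_concat` is hypothetical, and treating the upper bound as inclusive is an assumption about the problem statement.

```python
def naive_concat(n):
    """Concatenate the integers 0..n (inclusive, by assumption) with +=."""
    result = ""
    for i in range(n + 1):
        # Each += may allocate a new string and copy the old contents.
        result += str(i)
    return result


print(naive_concat(5))  # 012345
```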
Method II: Build a list of strings and then join them
This approach is commonly suggested as a very pythonic way to do string concatenation. First a list is built containing each of the component strings, then in a single join operation a string is constructed containing all of the list elements appended together.
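Roughly, this could look like the following. The name `join_list` is hypothetical, and the inclusive upper bound is an assumption.

```python
def join_list(n):
    """Collect the pieces in a list, then join them in one pass."""
    parts = []
    for i in range(n + 1):
        parts.append(str(i))
    # A single join builds the final string from all the pieces.
    return "".join(parts)


print(join_list(5))  # 012345
```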
Method III: List comprehensions
This version is extremely compact and still easy to read. Create a list of numbers using a list comprehension and then join them all together. This is just an abbreviated version of the previous approach, and it consumes roughly the same amount of memory.
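As a sketch, under the same assumptions (hypothetical function name, inclusive upper bound), the whole function fits on one line:

```python
def join_comprehension(n):
    """Build the list of pieces with a comprehension and join it."""
    return "".join([str(i) for i in range(n + 1)])


print(join_comprehension(5))  # 012345
```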
Let’s measure the performance of each of these three approaches and see which one wins. We are going to do this using the pytest-benchmark plugin.
Here are the results of the above benchmarks. The lower the value, the better the approach.
Just by looking at the Mean column, one can easily see that the list comprehension approach is the winner among the three.
After trying out a basic implementation of the above problem in Rust and doing some rough benchmarking with cargo bench, the results looked promising. So I decided to build the Rust implementation as a shared library using the rust-cpython project and call it from the Python program.
To achieve this, I created a Rust crate with the following src/lib.rs.
Building the above crate produced a .dylib file (the macOS dynamic library format), which needs to be renamed to .so before Python can import it.
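The rename step can be scripted. The helper below is a minimal sketch: `install_extension`, the crate name `example`, and the paths in the usage comment are all hypothetical. It also assumes the .so file name has to match the module name declared in the Rust crate, which is how Python locates an extension module's init function.

```python
import shutil
from pathlib import Path


def install_extension(built_lib: Path, module_name: str, dest_dir: Path) -> Path:
    """Copy a Cargo-built dynamic library to <module_name>.so in dest_dir."""
    target = dest_dir / f"{module_name}.so"
    # Copy rather than move, so the Cargo build output stays intact.
    shutil.copyfile(built_lib, target)
    return target


# Hypothetical usage after `cargo build --release` on macOS:
# install_extension(Path("target/release/libexample.dylib"), "example", Path("."))
```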
Then we ran the same benchmarks as before, this time including the Rust version.
This time the results are more interesting.
The Rust extension is definitely the winner. As you increase the number of iterations, the results get even more promising.
E.g., for iterations = 1000, the benchmark results are as follows:
You can find the code used in the post:
I am very new to Rust, but these results definitely inspire me to learn more of it. If you know a better implementation of the above problem in Rust, do let me know.
Distributing your Python module will require the Rust extension to be compiled on the target system, because architectures vary. Milksnake is an extension for setuptools that allows you to distribute dynamically linked libraries in Python wheels in the most portable way imaginable.