New: We’ve Started Using Rust in Qiskit for Better Performance
By Matthew Treinish, Senior Software Engineer at IBM Quantum
We recently started using the Rust programming language in addition to Python in Qiskit, which has led to significant performance improvements. While Python is still being used as the primary programming language for the majority of Qiskit, certain performance-critical sections of Qiskit are now implemented in Rust. This blog post will explain the details of this recent change and what it means for the Qiskit project as a whole:
• Why Rust?
• Impact to Users
• Performance Benefits
○ StochasticSwap
○ DenseLayout
○ Quantum Info Expectation Value Calculation
• How it works
• Looking forward
Why Rust?
We sometimes need to use a compiled language to get faster runtime performance for Qiskit’s performance-critical sections. We can typically do this by using an external Python library, which makes the separation of programming languages easier to manage. For example, we use numpy and scipy a great deal for accelerating linear algebra in Qiskit. Also, retworkx was created for Qiskit to accelerate graph algorithms; you can refer to my earlier retworkx blog post for an overview. However, not every algorithm used in Qiskit is well-suited for a standalone library, and some are tightly integrated with the Qiskit code. In these cases, we need to write some routines in a compiled language and package them as part of Qiskit directly (that is, as a Python extension module, also called a compiled extension).
Previously, we were doing this in Qiskit using Cython. Cython lets you write code in a Python-like syntax, which it translates into C or C++ source files that are then compiled as part of the normal Python package-build process. Cython is a great tool that allows you to easily mix Python and C/C++ code. However, we started to face limitations in situations where we needed to take more control over the compiled code. For example, when we wanted to leverage multithreading, we hit several issues around both implementation and portability/platform support. While this is theoretically manageable in Cython, it is difficult to maintain, so we decided it would be simpler to manage this directly in a compiled language and build an interface to Python on top of that.
This is where using Rust comes in. For those who aren’t familiar with Rust, the online book “The Rust Programming Language” contains a good summary:
Rust is a programming language that’s focused on safety, speed, and concurrency. Its design lets you create programs that have the performance and control of a low-level language, but with the powerful abstractions of a high-level language. These properties make Rust suitable for programmers who have experience in languages like C and are looking for a safer alternative, as well as those from languages like Python who are looking for ways to write code that performs better without sacrificing expressiveness.
By using Rust, we can write fast code and leverage all the advantages that the Rust programming language provides. While there are frameworks available to do a similar thing in other languages (for example pybind11 for C++, which is how Qiskit Aer is built), using Rust provides two key advantages: first, the integrated packaging and build system, and second, the memory safety built into the language.
The Rust packaging and build system centers around cargo, which wraps the Rust compiler and dependency management into a single tool. For integration with Python this makes things exceedingly easy as we don't have to worry about local environment differences and different compilers, dependency management, or any of the other complexity that normally comes with building a compiled library. The source code for the new internal Rust library that is now part of Qiskit lives in a self-contained src/ directory, and we build and install the Rust library as part of the normal Python install process. During a package build, Python calls cargo to build our Rust code and everything is handled for us. The Rust library gets compiled into a dynamic library file (with all the Rust dependencies statically linked) and installed as the qiskit._accelerate module in the Python package tree. We then have access to the Python API we expose from Rust directly from that module.
The second aspect, memory safety, is one of the key advantages of the Rust programming language. The Rust compiler checks that the code doesn’t contain memory safety bugs such as null pointers, dangling pointers, or data races. Because these checks happen at compile time, they don’t impact runtime performance. While this is generally good behavior, it is especially useful in our case, because part of the reason we’re using Rust now is to write multithreaded functions. One of the design tenets of the Rust programming language is fearless concurrency, the idea that you can write parallel code without having to worry about thread safety, as the compiler will check that for you and raise an error if you do something unsound. For Qiskit’s use case, Rust is a great fit because it means we can write multithreaded functions and not have to worry about an entire class of bugs at runtime.
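To make the idea of compiler-checked concurrency a little more concrete, here is a minimal sketch using only the Rust standard library. The function name square_in_parallel and its parameters are hypothetical and chosen purely for illustration; this is not code from Qiskit. Each thread receives its own non-overlapping mutable slice, so the compiler can verify that no two threads write to the same element.

```rust
use std::thread;

/// Square every element of `data` in place, splitting the work across up to
/// `num_threads` scoped threads. Hypothetical helper for illustration only.
pub fn square_in_parallel(data: &mut [f64], num_threads: usize) {
    let num_threads = num_threads.max(1);
    // Round up so every element lands in a chunk; `chunks_mut` panics on a
    // chunk size of zero, so keep it at least 1.
    let chunk_len = ((data.len() + num_threads - 1) / num_threads).max(1);
    thread::scope(|scope| {
        // `chunks_mut` hands out disjoint mutable slices, which is what lets
        // the borrow checker accept concurrent writes here.
        for chunk in data.chunks_mut(chunk_len) {
            scope.spawn(move || {
                for value in chunk.iter_mut() {
                    let v = *value;
                    *value = v * v;
                }
            });
        }
        // All spawned threads are joined automatically when the scope ends.
    });
}
```

If two of the spawned closures tried to mutate the same slice, the program would be rejected at compile time rather than misbehaving at runtime.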
Impact to Users
The switch to using Rust in Qiskit instead of Cython will not affect the vast majority of users. If you are just installing a released version of Qiskit you won’t notice much; the packages we publish for Qiskit on every release are precompiled and nothing will change. You can continue to use pip to install the latest version of Qiskit and everything should work as it has in the past.
However, if you’re building Qiskit from source for any reason, there are new requirements. For the most part you will just need to have a Rust compiler installed. The rustup installer makes this simple for most use cases. Once you have the Rust compiler installed, the normal pip-based workflow should be the same. While this is a new requirement, it does offer some additional options if you are building packages from source. For example, if you want to tune your package for maximum performance on your local hardware you can run:
RUSTFLAGS="-Ctarget-cpu=native" pip install --no-binary qiskit-terra qiskit-terra
which will build the Rust extension from source with native CPU tuning.
Performance Benefits
StochasticSwap
The StochasticSwap transpiler pass was the primary reason we started looking at Rust. StochasticSwap is the default routing pass for Qiskit's compiler at optimization levels 0, 1, and 2, which means it's what gets used most of the time by the transpile() function. So the performance of this pass is critical, especially because the routing algorithm gets increasingly slow to run as the size of the quantum circuit being compiled and the size of the target quantum computer increase. When looking at ways to improve this, the simplest way was to parallelize the algorithm. At its core, the algorithm runs a number of random trials trying to find the best swap mapping for each layer in the circuit. Previously these trials were run serially via a Cython module (which was added in Qiskit/qiskit-terra#1789). However, we could run the trials in parallel and the algorithm would work the same as before, but just execute more quickly. This was previously attempted in a number of different ways, but using Rust to accomplish the task was the best fit and offered the best performance. The migration of the StochasticSwap pass was done in Qiskit/qiskit-terra#7658.
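The general pattern of running independent randomized trials in parallel and keeping the best result can be sketched as follows. This is not the StochasticSwap implementation: the trial here is a stand-in that derives a made-up cost from a seed, and every name (next_random, run_trial, best_trial_serial, best_trial_parallel) is hypothetical. It only illustrates why the parallel version gives the same answer as the serial one when each trial depends on nothing but its own seed.

```rust
use std::thread;

/// One step of a SplitMix64-style generator, so each trial is reproducible
/// from its seed without any external crate.
fn next_random(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Stand-in for one randomized trial: returns a made-up cost in 0..1000.
/// A real routing trial would search for a swap mapping instead.
pub fn run_trial(seed: u64) -> u64 {
    let mut state = seed;
    next_random(&mut state) % 1000
}

/// Run all trials one after another; return the best (cost, seed) pair.
pub fn best_trial_serial(num_trials: u64) -> Option<(u64, u64)> {
    (0..num_trials).map(|seed| (run_trial(seed), seed)).min()
}

/// Run the same trials spread across `num_threads` scoped threads.
pub fn best_trial_parallel(num_trials: u64, num_threads: u64) -> Option<(u64, u64)> {
    let num_threads = num_threads.max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = (0..num_threads)
            .map(|worker| {
                scope.spawn(move || {
                    // Worker w handles seeds w, w + n, w + 2n, ...
                    (worker..num_trials)
                        .step_by(num_threads as usize)
                        .map(|seed| (run_trial(seed), seed))
                        .min()
                })
            })
            .collect();
        // Reduce the per-thread winners to a single overall winner.
        handles
            .into_iter()
            .filter_map(|handle| handle.join().expect("trial thread panicked"))
            .min()
    })
}
```

Because ties are broken by the lower seed in both versions, the serial and parallel functions return the same pair for the same number of trials, regardless of how the threads are scheduled.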
To benchmark how the Rust implementation improves performance I ran the following script with both the new parallel Rust version and the previous serial Cython version:
Then I graphed the ratio of the runtime performance between the two runs:
In the above plot we’re showing the ratio of Rust runtime to Cython runtime to run StochasticSwap on a Quantum Volume circuit of a given size targeting a device with a heavy hexagon coupling map and different numbers of qubits. A value of 1 there (which is shown as white) means there is no difference in performance. Values below 1 (which are indicated by bluer parts on the plot) show the new Rust version performs better while values above 1 (which are indicated by more red on the plot) show where the old Cython version was faster.
The best performance for the Rust implementation is ~7.5x faster than the serial Cython (for larger circuits on larger devices).
DenseLayout
In optimization levels 1 and 2 the DenseLayout transpiler pass is the default method used to find an initial layout (the mapping of virtual qubits in the circuit to physical qubits on the backend). When looking at the runtime performance of compiling >1000 qubit circuits after reimplementing StochasticSwap in Rust, the runtime of StochasticSwap and DenseLayout was roughly equivalent. This was problematic because the algorithm in DenseLayout is quite simple and it shouldn't be as slow as StochasticSwap. Previously, we didn't really notice the poor scaling in the DenseLayout pass because relative to the previous implementation of StochasticSwap it was still fast. The DenseLayout pass was originally written in Python using scipy sparse matrices and it was relatively simple to rewrite the core of the algorithm to be multithreaded in Rust. This was implemented in Qiskit/qiskit-terra#7740.
After rewriting the pass using Rust we saw up to a three-orders-of-magnitude speedup in the best case:
In this graph, the Y axis is the speedup factor of the Rust implementation, so if the number is 100.0 then the new Rust version is 100x faster than the previous implementation. The red line is set at a ratio of 1. If the blue line is below the red then the old version is faster, and if above the red line the new Rust version is faster.
This graph was generated by running the following script with both the new Rust version and the original version and then graphing the ratio of Old version time / Rust version time for each data point.
Quantum Info Expectation Value Calculation
The second place we’re leveraging Rust right now is internally as part of the Statevector.expectation_value() and DensityMatrix.expectation_value() methods. This was originally written in Cython to accelerate these functions. While this isn't quite as critical a routine to Qiskit's operation as StochasticSwap, we decided to rewrite it in Rust since it was the only use of Cython left after migrating StochasticSwap. This was done in Qiskit/qiskit-terra#7702. After the migration was complete, we did some quick benchmarking to show the general scaling characteristics of the new Rust implementation compared to the previous Cython implementation:
In this graph the Y axis is the speedup factor of Rust, so if the number is 5.0 the new Rust version is 5x faster than the old Cython version. The red line is at a ratio of 1.0, meaning that Rust is the same speed as Cython. If the blue line is below the red line the old Cython version is faster, and if it is above the red line the new Rust version is faster.
This graph was generated by running the following script with both the Rust version and the Cython version and then graphing the ratio of Cython time / Rust time for each data point.
It’s also worth noting that the multithreading for this function only starts at 19 qubits. Below 19 qubits the Rust implementation is single threaded just like the Cython version.
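A size-based switch between single-threaded and multithreaded execution can be sketched like this. The computation below (a sum of squared amplitudes) is a simplified stand-in rather than the actual expectation value routine, and the names PARALLEL_THRESHOLD_QUBITS, sum_of_squares, and norm_squared are hypothetical. The point is only the dispatch: small inputs avoid the overhead of starting threads, larger ones are split into chunks.

```rust
use std::thread;

/// Hypothetical cutoff, mirroring the 19-qubit threshold mentioned above.
pub const PARALLEL_THRESHOLD_QUBITS: u32 = 19;

/// Single-threaded kernel shared by both code paths.
fn sum_of_squares(values: &[f64]) -> f64 {
    values.iter().map(|v| v * v).sum()
}

/// Sum the squared amplitudes, using threads only for large enough inputs.
pub fn norm_squared(amplitudes: &[f64], num_qubits: u32, num_threads: usize) -> f64 {
    // Below the threshold, thread startup would likely cost more than it saves.
    if num_qubits < PARALLEL_THRESHOLD_QUBITS || num_threads <= 1 {
        return sum_of_squares(amplitudes);
    }
    // Round up so every element is covered; keep the chunk size non-zero.
    let chunk_len = ((amplitudes.len() + num_threads - 1) / num_threads).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = amplitudes
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || sum_of_squares(chunk)))
            .collect();
        // Combine the partial sums from every worker.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<f64>()
    })
}
```

Note that splitting a floating-point sum into chunks changes the order of additions, so the two paths can differ in the last few bits for general inputs.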
Note: All the benchmarks were run on an AMD Ryzen Threadripper 3970X 32-Core Processor with Python 3.10 on Linux. The performance will likely be different on your local system especially when multiple threads are being used.
How it works
While the current use of Rust in Qiskit is relatively minimal, the way we’ve constructed the integration is easy enough to extend over time. For functions that we need to accelerate with Rust, we create standalone Rust functions and data structures to implement the necessary functionality. Then we leverage the PyO3 library to build our interface to Python. PyO3 makes building a Python interface to Rust code quite simple. It provides Rust macros that let you write vanilla Rust code, which will automatically generate a C foreign function interface (FFI), providing a Python C API that enables the Python interpreter to call your Rust function. For example, if you wanted to write a Rust function that would take in an integer from Python and return 2 times that integer, you could do something like:
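A minimal sketch of such a function is shown here. The function name double and the "python" Cargo feature are assumptions made for this illustration: gating the PyO3 pieces behind a feature lets the same snippet compile as plain Rust when the pyo3 crate is not available. In a real extension the attribute would typically be written directly as #[pyfunction], and the function would also need to be registered in a module declared with PyO3's module macro, which is omitted here.

```rust
// The `pyo3` crate is only required when the hypothetical "python" Cargo
// feature is enabled; without it this is an ordinary Rust function.
#[cfg(feature = "python")]
use pyo3::prelude::*;

/// Return two times the integer passed in.
///
/// With the feature enabled, `pyfunction` generates the glue that converts
/// the Python integer to a `u64` and the result back to a Python integer.
#[cfg_attr(feature = "python", pyfunction)]
pub fn double(x: u64) -> u64 {
    2 * x
}
```

The body of the function is ordinary Rust with no Python-specific types, which is what keeps the boundary between the two languages thin.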
The #[pyfunction] macro there will automatically generate a C FFI for Python to use at compile time. This generated C FFI handles the conversion between Python and Rust types, and all the boilerplate code for interacting with Python. This lets us abstract away most of the details for interacting with Python and concentrate on writing Rust code for the functionality we need. There are similar macros available for creating Python classes and modules too.
This model lets us keep the boundary between Rust code and Python code fairly clear and also gives us the flexibility to leverage the strengths of both languages fairly easily. When we need to leverage Rust functionality we can implement it natively in Rust without having to manually deal with maintaining the separation between Rust and Python (unless we need to).
For the packaging and build system integration with Python, we leverage the setuptools-rust library to integrate calling cargo to build the Rust code and manage installing the compiled binary dynamic library file to the correct place in the Qiskit package tree for Python to be able to load it. This means that most people don't even need to think about the Rust code. As long as they have the Rust compiler installed, the normal Python packaging takes care of everything automatically. This is ideal because Qiskit is still primarily a Python library and most developers (and users) don't want to have to deal with the extra complexity of having an additional programming language.
Looking forward
While our use of Rust directly inside Qiskit is fairly minimal right now, with Rust integrated into Qiskit development we will likely start looking at using it to accelerate more functionality. The way we’ve integrated Rust into Qiskit makes it simple to expand its use, so in places where we have performance bottlenecks or scaling issues we can take advantage of Rust to try to address them.
Are you a Rust developer? We welcome contributions from the community to qiskit-terra and retworkx.