How Fast Is Rust?

Comparing a Sieve of Eratosthenes in Rust vs. Go.

Alistair Israel
4 min read · May 28, 2020

After seeing 8F3E’s article, How Fast Is Golang? which compares the performance of Python vs. Go using a Sieve of Eratosthenes as a microbenchmark, I thought it might be fun to see how Rust stacks up.

So, as an exercise to flex my Rust muscles before taking on a much harder challenge, I decided to fork 8F3E’s repo into my own: https://github.com/aisrael/sieve-of-eratosthenes

I then ported the Go code to Rust, almost verbatim:

let mut primes: Vec<i32> = Vec::new();
for i in 2..=n {
    primes.push(i);
}
{
    let mut i = 0;
    while i < primes.len() {
        let factor = primes[i];
        sieve(&mut primes, factor);
        i += 1;
    }
}
for i in 0..primes.len() {
    println!("{}", primes[i])
}

And

fn sieve(primes: &mut Vec<i32>, factor: i32) {
    let mut i = 0;
    while i < primes.len() {
        let value = primes[i];
        if value != factor {
            if value % factor == 0 {
                primes.remove(i);
            }
        }
        i += 1;
    }
}
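
These snippets assume an n supplied on the command line (the benchmark invocation later is bin/sieve-rs 10000). A minimal wrapper, sketched here and not necessarily identical to the repo's actual main(), might look like:

use std::env;

fn main() {
    // A sketch of a wrapper for the snippets above; the repo's code may differ.
    // Read the upper limit n from the first command-line argument,
    // falling back to 10000 if none is given (an arbitrary default).
    let n: i32 = env::args()
        .nth(1)
        .and_then(|arg| arg.parse().ok())
        .unwrap_or(10_000);

    // ...the ported loops above would go here, using this n...
    println!("sieving up to {}", n);
}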

In most cases I had to use a while loop rather than a for loop, because in Rust a for loop would have to take one of two forms. The first:

for i in 0..primes.len() {
    if primes[i] != 0 {
        let factor = primes[i];
        sieve(&mut primes, factor);
    }
}

That won’t work if you mutate the vector and change its length (since primes.len() is evaluated only once, at the start of the loop). It will compile, but panic at run time with an index-out-of-bounds error.
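
To see why, here’s a minimal sketch (not from the repo, just an illustration) of what happens when you remove elements inside a range-based for loop:

fn main() {
    let mut v = vec![1, 2, 3, 4, 5];
    // The range 0..v.len() is computed once, as 0..5, before the loop starts.
    for i in 0..v.len() {
        // Once the vector has shrunk, v[i] eventually indexes past the
        // end and the program panics with "index out of bounds".
        if v[i] % 2 == 0 {
            v.remove(i);
        }
    }
}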

We might’ve also tried:

for factor in primes {
    sieve(&mut primes, factor);
}

But this won’t even compile!

The reason goes to the heart of why Rust is considered a language that helps you write ‘safer’, more robust code. In this particular case, Rust sees that we’re mutating the primes vector while also trying to iterate over it, which can lead to hard-to-debug errors later on.

For instance, if you try to do this:

sieve(&mut primes, primes[i]);

Rust complains with:

28 |                 sieve(&mut primes, primes[i]);
   |                 ----- -----------  ^^^^^^^^^ immutable borrow occurs here
   |                 |     |
   |                 |     mutable borrow occurs here
   |                 mutable borrow later used by call

In this case, we have to introduce another, immutable variable just to ‘hold’ the value of primes[i], which we can then pass to the sieve() function.

So, while Rust may feel like it’s getting in the way by being strict, the workaround is just to be more explicit, telling Rust that, “Yes, we know what we’re doing.”

EDIT: I rewrote the Rust implementation to more closely match the Go implementation.

So, now instead of having to use while, the Rust code looks like:

for i in 0..primes.len() {
    let factor = primes[i];
    if factor != 0 {
        sieve(&mut primes, factor);
    }
}
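
The updated sieve() isn’t shown here; under this approach it marks multiples as 0 instead of removing them, so the vector’s length never changes. A sketch of what that could look like (the exact code is in the commit linked below):

fn sieve(primes: &mut Vec<i32>, factor: i32) {
    for i in 0..primes.len() {
        let value = primes[i];
        // Zero out multiples of factor, keeping factor itself;
        // entries that are already zero are left alone.
        if value != 0 && value != factor && value % factor == 0 {
            primes[i] = 0;
        }
    }
}

(The final printing loop then also needs to skip the zeroed entries, as the for loop above does.)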

See the changes in this commit: 567b05a.

I’ve also updated the results section after re-running the benchmarks.

Building

A quick note before running the microbenchmarks: if we simply invoke rustc (or even cargo) with no extra arguments, the Rust compiler will produce a binary without optimisations and with debugging information.

This executable will be slower than the executable built with Go, because the Go compiler performs optimisations by default.

Hence, to be fair, we need to pass flags to rustc to build the executable as if we had used Cargo and run cargo build --release. I also took care to name the Rust binary differently from the Go binary:

rustc -C debuginfo=0 -C opt-level=3 sieve.rs -o bin/sieve-rs

Running

I ran and timed the Rust executable using zsh’s built-in time function:

time bin/sieve-rs 10000

Which, on my 2019 16" MacBook Pro, gives:

bin/sieve-rs 10000  0.01s user 0.00s system 14% cpu 0.075 total

I then wrote up a Python script that could run both the Go and Rust executables, giving them different and progressively increasing parameters, then capturing the user time output for each:

python bench.py

We can even specify minimum and maximum limits, and the benchmark script will try to compute the ‘steps’ in between by using powers of 10. For example, if given:

MIN=100 MAX=10000 python bench.py

Then the benchmark is run for both Go and Rust with n = 100, 200, …, 800, 900, 1000, 2000, …, 8000, 9000, and finally 10000.
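
For reference, the stepping logic looks roughly like this (sketched in Rust here for consistency; the actual bench.py is Python and may differ in its details):

// A sketch of the step progression between MIN and MAX:
// 100, 200, ..., 900, 1000, 2000, ..., 9000, 10000.
fn steps(min: u64, max: u64) -> Vec<u64> {
    let mut sizes = Vec::new();
    let mut step = min;
    let mut n = min;
    while n <= max {
        sizes.push(n);
        if n == step * 10 {
            step = n; // reached the next power of ten, so widen the step
        }
        n += step;
    }
    sizes
}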

The bench.py script also overwrites the timing_data.csv in the testing folder. I then modified 8F3E’s original Jupyter notebook script to just use the CSV column headers for the graph labels.

Like 8F3E, I didn’t bother running each n multiple times to take an average, since the trend was clear from a single pass.

Results

I’ll let the graphs speak for themselves (lower is better):

[Graph: logarithmic plot of the benchmark timings]

Final Words

Sure, microbenchmarks are fun and, when judiciously applied and carefully analysed, can provide some useful insight into a very specific choice between, say, one implementation or another.

In the real world, though, with large applications and complex systems, language- or compiler-level differences are overshadowed by a host of other factors: algorithms, program architecture, I/O, infrastructure, system architecture, and so on.

So please don’t take these results too seriously.

On the other hand, the reason we use Rust is precisely what I mentioned earlier: while there’s added effort and cognitive load on the developer, since we have to worry about ownership and mutability, in the long run this helps us write more performant code (there’s no garbage-collection overhead) and, more importantly, more reliable code.
