Rust zero-cost abstraction in action

Ibrahim Dursun
Ingeniously Simple
Published in
2 min readFeb 7, 2020
Photo by Volkan Olmez on Unsplash

One of my colleagues was experimenting with Rust. He started by writing a sudoku solver which he has already written in C before. Once he was completed writing it in Rust, he was very disappointed because Rust version was twice as fast than the C version which was hand-optimised by pulling off all the tricks he knew to make it perform well. He eventually managed to make the C version as fast as the Rust version by removing the intrinsics.

This piqued my interest in looking into what kind of assembly code is being generated in the final binary.

Sum numbers up to N

The following code is summing up numbers from 1 to n.

Generated assembly for this is like this:

I don’t know anything about reading x86 opcodes other the basics but I can confidently tell that I don’t see any loops here.

After reading a little bit about the opcodes and what various registers do, I was able to make sense of the generated code.

RDI register is where our argument n is stored. RAX register is where the return result is stored. So if we replace RAX with RETURN and RDI with N then the generated assembly roughly looks like this:

which essentially calculates the formula: (N-2)*(N-3)/2 + 2*N - 3 which can be simplified to N*(N-1)/2. That's the formula used for summing numbers between 1 to N!

That’s surprising that the compiler was smart enough to recognise this and replaced it with the actual formula.

Zero cost abstractions

One of the promises of Rust is zero-cost abstractions. To bluntly put it, your flavour of code doesn’t affect the performance. It doesn’t matter if you use loops or closures, they all compile down to the same assembly.

So if I rewrite the same code by using fold, then the generated assembly should be roughly the same.

generated the following:

It’s the same assembly!

How about using sum instead?

Same result.

I am really impressed with what I have seen so far, but the following just blew my mind off.

compiled down to the following:

Doesn’t even call the functions. It’s 30.

Conclusion

I knew about const folding that compilers perform to generate fast code but I didn’t know it could go this far.

Rust is known to be very performant but the best thing about writing Rust code that you never really care about the performance while writing your code. It just happens to be fast!.

--

--