Rust zero-cost abstraction in action

Ibrahim Dursun
Feb 7 · 2 min read
Photo by Volkan Olmez on Unsplash

One of my colleagues was experimenting with Rust. He started by writing a sudoku solver which he has already written in C before. Once he was completed writing it in Rust, he was very disappointed because Rust version was twice as fast than the C version which was hand-optimised by pulling off all the tricks he knew to make it perform well. He eventually managed to make the C version as fast as the Rust version by removing the intrinsics.

This piqued my interest in looking into what kind of assembly code is being generated in the final binary.

Sum numbers up to N

The following code is summing up numbers from 1 to n.

Generated assembly for this is like this:

I don’t know anything about reading x86 opcodes other the basics but I can confidently tell that I don’t see any loops here.

After reading a little bit about the opcodes and what various registers do, I was able to make sense of the generated code.

RDI register is where our argument n is stored. RAX register is where the return result is stored. So if we replace RAX with RETURN and RDI with N then the generated assembly roughly looks like this:

which essentially calculates the formula: (N-2)*(N-3)/2 + 2*N - 3 which can be simplified to N*(N-1)/2. That's the formula used for summing numbers between 1 to N!

That’s surprising that the compiler was smart enough to recognise this and replaced it with the actual formula.

Zero cost abstractions

One of the promises of Rust is zero-cost abstractions. To bluntly put it, your flavour of code doesn’t affect the performance. It doesn’t matter if you use loops or closures, they all compile down to the same assembly.

So if I rewrite the same code by using fold, then the generated assembly should be roughly the same.

generated the following:

It’s the same assembly!

How about using sum instead?

Same result.

I am really impressed with what I have seen so far, but the following just blew my mind off.

compiled down to the following:

Doesn’t even call the functions. It’s 30.

Conclusion

I knew about const folding that compilers perform to generate fast code but I didn’t know it could go this far.

Rust is known to be very performant but the best thing about writing Rust code that you never really care about the performance while writing your code. It just happens to be fast!.


Ingeniously Simple

How Redgate build ingeniously simple products, from…

Ibrahim Dursun

Written by

Lead Software Engineer at Redgate

Ingeniously Simple

How Redgate build ingeniously simple products, from inception to delivery.

More From Medium

More from Ingeniously Simple

More from Ingeniously Simple

More from Ingeniously Simple

Entities, components and systems

More from Ingeniously Simple

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade