🦀Assembly code generated from Rust for parameter passing
Understand the assembly code generated for passing a parameter by value, reference, Box, Rc, and Arc
Here we will explore the performance implications of passing the self parameter by value, by reference, and via smart pointers (Box, Rc, and Arc). The generated assembly code will help us understand what happens under the hood.
We will be working with the Complex struct defined below. The code shows how a struct and its associated methods are declared in Rust. Note that, as in Python, the self parameter that refers to the associated object is declared explicitly in the method signature.
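The struct definition itself is not reproduced in this text. Below is a minimal sketch consistent with the discussion. It assumes f64 fields named real and imaginary, which the use of the xmm registers suggests, and a derived Copy and Clone, as the next section states. The five method bodies are the ones listed in the sections that follow.

```rust
use std::rc::Rc;
use std::sync::Arc;

// Reconstructed sketch: the f64 field types are an assumption based on the
// xmm register usage; Copy and Clone are derived as the article describes.
#[derive(Debug, Copy, Clone)]
pub struct Complex {
    pub real: f64,
    pub imaginary: f64,
}

impl Complex {
    // self passed by value (copied, because Complex is Copy)
    pub fn magnitude_self_copy(self) -> f64 {
        (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
    }

    // self immutably borrowed
    pub fn magnitude_self_reference(&self) -> f64 {
        (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
    }

    // owning Box: the heap allocation is freed when the method returns
    pub fn magnitude_self_box(self: Box<Self>) -> f64 {
        (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
    }

    // shared single-threaded ownership: strong count decremented on return
    pub fn magnitude_self_rc(self: Rc<Self>) -> f64 {
        (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
    }

    // shared thread-safe ownership: strong count atomically decremented on return
    pub fn magnitude_self_arc(self: Arc<Self>) -> f64 {
        (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
    }
}
```

Each method is called on the matching receiver. For example, `Box::new(c).magnitude_self_box()` or `Rc::new(c).magnitude_self_rc()`.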
Now let’s examine the assembly code generated for each method shown above.
Self is passed by value to the method
By default, Rust moves a parameter that is passed by value: ownership of the parameter passes to the called function. In this example, however, the Complex type implements the Copy and Clone traits. This means that the called method gets a copy of the passed parameter, and the caller's value remains usable after the call.
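Here is a small sketch of what Copy buys the caller. The copy_demo helper is hypothetical. It calls the by-value method twice on the same value. Without the Copy derive, the second call would be rejected at compile time as a use of a moved value (error E0382).

```rust
#[derive(Debug, Copy, Clone)]
pub struct Complex {
    pub real: f64,
    pub imaginary: f64,
}

impl Complex {
    pub fn magnitude_self_copy(self) -> f64 {
        (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
    }
}

/// Hypothetical helper: calls the by-value method twice on the same value.
pub fn copy_demo() -> (f64, f64) {
    let c = Complex { real: 3.0, imaginary: 4.0 };
    // `c` is copied into the method, not moved.
    let first = c.magnitude_self_copy();
    // Still valid because Complex is Copy; without the derive, this second
    // call would fail to compile with E0382 (use of moved value).
    let second = c.magnitude_self_copy();
    (first, second)
}
```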
pub fn magnitude_self_copy(self) -> f64 {
    (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
}
The assembly code generated for the above function is shown below. Note how efficiently the compiler passes the Complex object: the real and imaginary fields are passed directly in the xmm0 and xmm1 registers, respectively. The method computes the result and returns it in the xmm0 register.
The code generated for calculating the magnitude is annotated in the assembly code below.
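Conceptually, the by-value call behaves as if the two fields were separate scalar arguments. As a hedged illustration, the hypothetical extern "C" function magnitude_parts below takes the fields as two f64 values. Under the x86-64 System V calling convention, these arrive in xmm0 and xmm1, and the result is returned in xmm0. The function also replaces powf(2.0) with a plain multiplication. Optimizers commonly perform this rewrite for an exponent of 2.0, but it is worth confirming in Compiler Explorer for your own build.

```rust
#[derive(Debug, Copy, Clone)]
pub struct Complex {
    pub real: f64,
    pub imaginary: f64,
}

impl Complex {
    pub fn magnitude_self_copy(self) -> f64 {
        (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
    }
}

// Hypothetical scalar equivalent: with the C calling convention on x86-64
// System V, `real` arrives in xmm0, `imaginary` in xmm1, and the result
// is returned in xmm0.
pub extern "C" fn magnitude_parts(real: f64, imaginary: f64) -> f64 {
    // x.powf(2.0) is x squared, so a multiplication computes the same quantity.
    (real * real + imaginary * imaginary).sqrt()
}
```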
A reference to self is passed to the method
The method immutably borrows the object, so it cannot modify it.
pub fn magnitude_self_reference(&self) -> f64 {
    (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
}
A reference to self (&self) is passed to the above function.
The generated code looks like the self case covered earlier. The main difference is that the compiler now passes a pointer to the Complex object in the rdi register. As a result, the first two lines of assembly load the real and imaginary fields from the struct into the xmm0 and xmm1 registers. The rest of the assembly code is identical to the self case.
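The only thing that crosses the call boundary here is an address. The hypothetical reference_demo helper below compares the size of the struct with the size of a reference to it. The struct occupies 16 bytes (two f64 fields), while the reference is pointer-sized and fits in a single register such as rdi.

```rust
use std::mem::size_of;

#[derive(Debug, Copy, Clone)]
pub struct Complex {
    pub real: f64,
    pub imaginary: f64,
}

impl Complex {
    pub fn magnitude_self_reference(&self) -> f64 {
        (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
    }
}

/// Hypothetical helper: returns (size of Complex, size of &Complex, magnitude).
pub fn reference_demo() -> (usize, usize, f64) {
    let c = Complex { real: 3.0, imaginary: 4.0 };
    // Only the address of `c` is handed to the method (rdi on x86-64).
    let m = c.magnitude_self_reference();
    (size_of::<Complex>(), size_of::<&Complex>(), m)
}
```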
Self points to the object on the heap via Box
Here the object is allocated on the heap. The method takes complete ownership of the object, so the object ceases to exist when the method returns and its memory is released back to the heap.
pub fn magnitude_self_box(self: Box<Self>) -> f64 {
    (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
}
A Box smart pointer to self is passed here. The Box contains a pointer to the Complex object stored on the heap.
The generated assembly code resembles the &self case: the xmm0 and xmm1 registers are loaded from the heap. The major difference is that the heap memory is freed at the end of the method call. This happens because the method owns the Box that points to the Complex on the heap. Once the method exits, the self Box goes out of scope, and the Box smart pointer frees the associated memory (a Box in Rust is like a unique_ptr in C++).
The assembly code below has been annotated to show the magnitude computation and release of the heap memory.
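A Box<Complex> is just one pointer, so the method receives the same kind of argument as in the &self case; only the ownership differs. The sketch below uses a hypothetical box_demo helper to check the pointer size and to show the move into the method.

```rust
use std::mem::size_of;

#[derive(Debug, Copy, Clone)]
pub struct Complex {
    pub real: f64,
    pub imaginary: f64,
}

impl Complex {
    pub fn magnitude_self_box(self: Box<Self>) -> f64 {
        (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
    }
}

/// Hypothetical helper: returns (size of Box<Complex>, magnitude).
pub fn box_demo() -> (usize, f64) {
    // Allocates the Complex on the heap; `boxed` holds just the pointer.
    let boxed = Box::new(Complex { real: 3.0, imaginary: 4.0 });
    // Ownership of the Box moves into the method, which frees the heap
    // memory when it returns. Using `boxed` after this line would not compile.
    let m = boxed.magnitude_self_box();
    (size_of::<Box<Complex>>(), m)
}
```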
A reference-counted smart pointer Rc to self
Here a shared smart pointer is passed to the method. Multiple Rc pointers to this object may be active in the same thread, and the method shares ownership of self with them. When the method returns, it decrements the strong reference count stored along with the Complex object. If this was the only reference to the object, the object is destroyed and its memory is released to the heap. If the count does not reach zero, the object lives on after the method returns.
pub fn magnitude_self_rc(self: Rc<Self>) -> f64 {
    (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
}
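The above method takes ownership of an Rc, a reference-counting smart pointer. The Rc points to the following data on the heap (offsets for a 64-bit target):
Offset 0: strong reference count (usize)
Offset 8: weak reference count (usize)
Offset 16: real (f64)
Offset 24: imaginary (f64)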
When an Rc is created, its strong reference count starts at 1. Cloning an Rc does not copy the pointed-to data; it just increments the reference count. This way, multiple shared references can point to the same heap memory. When an Rc is dropped, the reference count is decremented. If the count falls to 0, the Complex value is dropped and the memory block on the heap is deallocated.
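The counting behavior described above can be observed with Rc::strong_count. In the sketch below, the rc_demo helper is hypothetical. It records the count after creation, after a clone, and after the clone has been consumed by magnitude_self_rc.

```rust
use std::rc::Rc;

#[derive(Debug, Copy, Clone)]
pub struct Complex {
    pub real: f64,
    pub imaginary: f64,
}

impl Complex {
    pub fn magnitude_self_rc(self: Rc<Self>) -> f64 {
        (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
    }
}

/// Hypothetical helper: returns the strong count after creation, after a
/// clone, and after the clone was consumed by the method, plus the magnitude.
pub fn rc_demo() -> (usize, usize, usize, f64) {
    let original = Rc::new(Complex { real: 3.0, imaginary: 4.0 });
    let after_new = Rc::strong_count(&original); // 1
    // Cloning copies only the pointer and bumps the count; the Complex is shared.
    let shared = Rc::clone(&original);
    let after_clone = Rc::strong_count(&original); // 2
    // `shared` moves into the method and is dropped when the method returns.
    let m = shared.magnitude_self_rc();
    let after_call = Rc::strong_count(&original); // 1
    (after_new, after_clone, after_call, m)
}
```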
The generated code starts by loading the real and imaginary parts from the struct into the xmm0 and xmm1 registers. Notice that the offsets for these accesses are 16 and 24, respectively. This is because two 64-bit reference counts (strong and weak) precede the Complex object. Once the values have been loaded, the strong reference count is decremented, because the Rc goes out of scope when the method returns. If the count hits zero, the object pointed to by the Rc is deleted. If the count is nonzero, the memory block containing the reference counts and the Complex object stays alive, because other Rc smart pointers still point to it.
Note: We have ignored the weak reference count in this discussion.
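The offsets 16 and 24 follow from the layout of the heap block. The sketch below mirrors that block with a hypothetical repr(C) struct named RcBlockSketch and measures where the Complex fields land. The standard library's own type is internal, and its layout is an implementation detail, not a stable guarantee. Complex is marked repr(C) here only so that the field order in this demo is guaranteed.

```rust
use std::cell::Cell;

// repr(C) here only so the field order of this demo is guaranteed.
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct Complex {
    pub real: f64,
    pub imaginary: f64,
}

// Hypothetical mirror of the heap block an Rc<Complex> points to:
// two counters followed by the value.
#[allow(dead_code)]
#[repr(C)]
struct RcBlockSketch {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: Complex,
}

/// Returns the byte offsets of `real` and `imaginary` from the start of the block.
pub fn complex_field_offsets() -> (usize, usize) {
    let block = RcBlockSketch {
        strong: Cell::new(1),
        weak: Cell::new(1),
        value: Complex { real: 3.0, imaginary: 4.0 },
    };
    let base = &block as *const RcBlockSketch as usize;
    let real = &block.value.real as *const f64 as usize - base;
    let imaginary = &block.value.imaginary as *const f64 as usize - base;
    (real, imaginary)
}
```

On a 64-bit target, this yields offsets of 16 and 24, matching the accesses in the generated code.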
An atomic reference-counted Arc to self
Arc is a reference-counted smart pointer that can be shared across threads. This requires reference count increments and decrements to be atomic, so atomic read-modify-write operations are used to manage the reference counts.
pub fn magnitude_self_arc(self: Arc<Self>) -> f64 {
    (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
}
Here a thread-safe Arc smart pointer is passed to the method, and the method now owns it. When the method returns, the Arc goes out of scope and the strong reference count stored along with the Complex is decremented using an atomic read-modify-write operation. If the count reaches zero, the object on the heap is deleted.
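Here is a sketch of the multi-threaded case, using a hypothetical arc_demo helper. Each spawned thread receives its own clone of the Arc, which is an atomic increment. Each thread then consumes that clone by calling magnitude_self_arc, which triggers an atomic decrement when the method returns. Once all threads have been joined, only the original reference remains.

```rust
use std::sync::Arc;
use std::thread;

#[derive(Debug, Copy, Clone)]
pub struct Complex {
    pub real: f64,
    pub imaginary: f64,
}

impl Complex {
    pub fn magnitude_self_arc(self: Arc<Self>) -> f64 {
        (self.real.powf(2.0) + self.imaginary.powf(2.0)).sqrt()
    }
}

/// Hypothetical helper: each thread consumes its own clone of the Arc.
/// Returns the magnitudes computed by the threads and the final strong count.
pub fn arc_demo(threads: usize) -> (Vec<f64>, usize) {
    let shared = Arc::new(Complex { real: 3.0, imaginary: 4.0 });
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // Atomic increment of the strong count.
            let mine = Arc::clone(&shared);
            // `mine` is moved into the method; atomic decrement when it returns.
            thread::spawn(move || mine.magnitude_self_arc())
        })
        .collect();
    let results: Vec<f64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    (results, Arc::strong_count(&shared))
}
```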
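The Arc smart pointer points to a heap allocation that contains AtomicUsize strong and weak reference counts. The Complex is stored after the two counts, giving the following memory representation on a 64-bit target:
Offset 0: strong reference count (AtomicUsize)
Offset 8: weak reference count (AtomicUsize)
Offset 16: real (f64)
Offset 24: imaginary (f64)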
The code generated for Arc is similar to the code generated for Rc. The significant differences from the Rc assembly code are:
- lock sub qword ptr [rdi], 1 is generated to atomically decrement the strong reference count.
- The remaining cleanup is handled in the alloc::sync::Arc<T>::drop_slow function, which is called only when the strong count reaches zero. It drops the Complex value, decrements the weak reference count, and frees the memory once that count also reaches zero.
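The atomic decrement and the hand-off to the slow path can be sketched with std::sync::atomic. This is a simplified model, not the standard library's code: the StrongCount type and its methods are hypothetical. The Release decrement followed by an Acquire fence mirrors the ordering pattern used in Arc's Drop implementation. On x86-64, an atomic fetch_sub compiles to a lock-prefixed read-modify-write instruction, such as the lock sub seen in the generated code.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Hypothetical stand-in for the strong count in Arc's heap block.
pub struct StrongCount {
    count: AtomicUsize,
}

impl StrongCount {
    pub fn new() -> Self {
        StrongCount { count: AtomicUsize::new(1) }
    }

    // What cloning does: an atomic increment.
    pub fn increment(&self) {
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    // What dropping does: an atomic decrement. Returns true only for the
    // caller that removed the last reference; that caller would then take
    // the slow path (drop the value, release the weak count, free memory).
    pub fn decrement(&self) -> bool {
        if self.count.fetch_sub(1, Ordering::Release) != 1 {
            return false;
        }
        // Ensure other threads' earlier uses of the data happen-before cleanup.
        fence(Ordering::Acquire);
        true
    }
}
```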
Learn more
Visit the Assembly code generated from Rust for parameter passing article on EventHelix.com to learn more. Examine the Rust to assembly mapping presented here in the Compiler Explorer.