Understanding Memory Management in Rust

Bijesh O S
Published in Geek Culture
16 min read · Mar 7, 2023

A quick overview of how Rust manages memory and ensures memory safety at runtime.

Image by Gerd Altmann from Pixabay

Ever since Graydon Hoare started Rust as a personal project in 2006 while working at Mozilla, Rust has come a long way to become one of the most loved programming languages. Before Rust, most programming languages either delivered good performance by compromising on memory safety or enforced memory safety by taking a hit on performance. Rust rose in popularity thanks to its ability to combine performance and memory safety without compromising on either.

In this article, let’s try to understand how Rust handles memory and thereby provides memory safety.

(Friendly warning: This is a lengthy article with many diagrams and code samples sprinkled across it like cheese on top of a pizza. So, brace yourself for a longer reading session. I hope it’s worth your valuable time. To help yourself, feel free to get a cup of coffee; no tea please :-). Okay, if you insist, tea is also fine :-))

Before we go into the details on how Rust handles memory, let’s do a quick refresh on how programs store values in memory.

Stack vs Heap : A refresher

When a program is executed, the runtime data can be stored mainly in two places in the runtime memory : stack and/or heap. Many of the memory related decisions programming languages take depend on the location where the runtime data is stored.

Stack

A stack is a linear data structure that follows the ‘last in, first out’ principle. Data stored on the stack must have a fixed size that is known at compile time. There are two operations we can perform on a stack: push, which adds data to the top, and pop, which removes and returns the data at the top.

Following diagram is a simplified representation of how data is pushed to stack and how values reside inside the stack. As you can see, push operation always puts data at the top position.

Simplified representation of values stored in stack and push operation

Following diagram is a simplified representation of how data is retrieved from a stack. As shown, the last inserted data comes out first when pop operation is performed.

Simplified representation of values stored in a stack and pop operation

Heap

Now, let’s look into the heap. The heap is less organized than the stack. When data needs to be added to the heap, the memory allocator finds an empty area that is big enough to store the data. It then marks that area as being in use and returns a pointer (the address of that location) to the code that requested the allocation. This activity is usually referred to as ‘allocating on the heap’ or simply ‘allocating’. (We’ll be using this terminology throughout this article. Note that storing data on the stack is not considered allocating.)

If needed, the pointer can be stored on the stack for future reference. To retrieve the data, the pointer needs to be followed to the location in the heap where the data is actually stored.

Following diagram is a simplified representation of how data is stored in heap and how the respective pointers are stored in stack.

Simplified representation of values in heap
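As a rough illustration in Rust, the sketch below keeps one value on the stack and places another on the heap. It uses `Box`, the standard library’s simplest heap-allocating type, which this article does not otherwise cover; the function name `heap_roundtrip` is made up for this example.

```rust
// Hypothetical helper: stores a value on the heap and reads it back.
fn heap_roundtrip(value: i32) -> i32 {
    // `on_stack` is a fixed-size local, so it lives on the stack.
    let on_stack = value;

    // `Box::new` asks the allocator for heap space, puts the value there,
    // and returns a pointer; the pointer itself is stored on the stack.
    let on_heap: Box<i32> = Box::new(on_stack);

    // Following the pointer (dereferencing) retrieves the value from the heap.
    *on_heap
} // `on_heap` goes out of scope here and its heap memory is released

fn main() {
    println!("{}", heap_roundtrip(42));
}
```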

Performance

Now, let’s look at the performance aspects of each of these data structures.

Pushing data onto the stack is faster than allocating on the heap, since data is always added at the top: no search for free space is needed, only the top-of-stack position has to be updated. In contrast, allocating on the heap is slower, since the memory allocator needs to find an unused memory area big enough to store the data and then do the bookkeeping required to prepare for the next allocation.

As you can guess, retrieving data from the stack is also faster, since we always take data from the top. Accessing data on the heap is slower, since the pointer needs to be followed to find the location where the data is stored.

The stack is also used for function calls. When a function is called, the values passed to the function are stored on the stack along with the function’s local variables. When the function completes execution, those values are popped off the stack.

How does a programming language decide where to store a given value? The location of storage heavily depends on the type of data that is being stored. So, let’s take a quick look at different data types in Rust.

Primitive data types in Rust

The built-in data types in Rust can primarily be categorised into two: scalar and compound types.

Scalar Types

A scalar type represents a single value. There are four primary scalar types available in Rust:

Integer:

  • Integers come in two variants, signed and unsigned, in sizes of 8, 16, 32, 64 and 128 bits, plus a pointer-sized variant whose size depends on the architecture of the computer the program runs on.
  • Signed integer types are denoted as: i8, i16, i32, i64, i128 and isize
  • Unsigned integer types are denoted as: u8, u16, u32, u64, u128 and usize

Floating-point:

  • Floating-point numbers are available in two sizes: 32 and 64 bits. They are denoted as f32 and f64.

Boolean:

  • Similar to other programming languages, this type can store true or false values. It is denoted as bool.

Character:

  • A character is 4 bytes in size and can store a Unicode scalar value. It is denoted as char.

Compound Types

Compound types can be used to hold multiple values together. There are two compound types available:

  • Tuple, which is a grouping of a number of values. The individual values in the group can be of different types, but the group as a whole is considered one type.
  • Array, which is a collection of multiple values of the same type.

Both tuple and array have fixed length. Once declared, the size cannot be changed.
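To make the sizes above concrete, here is a small sketch that asks the compiler for the size of a few of these types using `std::mem::size_of`. The variable names `pair` and `triple` are chosen only for illustration.

```rust
use std::mem::size_of;

fn main() {
    // Scalar types: the size is fixed and known at compile time.
    println!("i8:    {} byte(s)", size_of::<i8>());
    println!("u64:   {} byte(s)", size_of::<u64>());
    println!("f32:   {} byte(s)", size_of::<f32>());
    println!("bool:  {} byte(s)", size_of::<bool>());
    println!("char:  {} byte(s)", size_of::<char>());
    // isize/usize follow the pointer width of the target architecture.
    println!("usize: {} byte(s)", size_of::<usize>());

    // Compound types: a tuple may mix types, an array holds a single type.
    let pair: (i32, char) = (7, 'R');
    let triple: [u8; 3] = [1, 2, 3];
    println!("{:?} {:?}", pair, triple);
    println!("[u8; 3]: {} byte(s)", size_of::<[u8; 3]>());
}
```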

Types from Standard library

In addition to the built-in primitive data types above, Rust also offers types such as String, Vec (vector) and HashMap as part of the standard library.

Ownership

Now, let’s take our first baby step in understanding how memory is managed in Rust.

Prior to Rust, there were two main memory management approaches widely used in the programming languages world.

Some languages, such as Java and Go (and many others), use a garbage collector that keeps looking for unused memory and frees it at certain intervals. Other languages, such as C and C++, expect the programmer to allocate and free memory explicitly as required. While the first approach has an impact on performance, the second approach puts memory safety at risk.

Rust uses a different approach. It uses a mechanism called “ownership” in alignment with a set of rules that are checked at compilation time. If these ownership rules are violated, the program won’t compile.

This “ownership” approach enables Rust to make memory safety guarantees without needing a garbage collector. The ownership rules don’t have a run-time impact on performance either. Thus, Rust provides memory safety without sacrificing performance. (Later in the article, we’ll cover how ownership works and how the rules are enforced.)

What problems is the ownership mechanism trying to solve? Its main purpose is to manage data on the heap. It keeps track of which part of the code uses which data on the heap, minimizes the amount of duplicate data on the heap, and cleans up unused data so that the program does not run out of space.

This is done by adhering to the following three main ownership rules:

  • Rule 1: Each value in Rust has an owner.
  • Rule 2: There can be only one owner at a time.
  • Rule 3: When the owner goes out of scope, the value will be dropped.

Let’s analyse various ownership scenarios by using String, which is a complex type.

(You may ask, why don’t we start with a simple use case by using one of the primitive data types such as integer? Hmm… let me think. Well, what’s the fun in doing that ? :-). Just kidding. We’ll cover that scenario once we are done with the String type)

Declaration & Scope

When primitive types are used, their size is known at compile time. So, as per the guidelines we discussed at the beginning, they are stored on the stack. When a type such as String (or Vec, HashMap etc.) is used, its size may not be known at compile time, so its contents need to be allocated on the heap.

With respect to memory management, this involves two steps:

  • (a) allocation on the heap when the value needs to be used, and
  • (b) cleaning up from the heap once the value is no longer needed.

The first step, allocation of memory, is initiated when the variable declaration is done.

Let’s look at following example, in which a string variable s with the value “Rust” is being declared.

Example 1
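A minimal sketch of what such a declaration could look like (an illustration consistent with the description, not necessarily the exact listing):

```rust
fn main() {
    // Declaring `s` allocates heap memory for the text "Rust".
    let s = String::from("Rust");
    println!("{}", s);
} // `s` goes out of scope here and its memory is returned automatically
```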

When the variable s is declared, memory allocation happens behind the scenes. (We’ll cover the behind the scenes aspect a bit later in this article)

The second step, cleaning up the memory, is often very tricky. In languages with a garbage collector (GC), the GC takes responsibility for deallocation. In languages without a GC, the programmer is responsible for both allocation and deallocation, which is error-prone. As we discussed above, Rust takes a different approach: memory is automatically returned when the variable that owns it goes out of scope.

When does a variable go out of scope? In the above example, the variable goes out of scope after the execution of main() function. Is that the case for all variables? Not really. Let’s dig a bit deeper on scope aspects by considering another example.

Let’s take a look at the below example which is a slightly modified version of the previous example.

Example 2
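A sketch of such a modified version, assuming the only change is an extra pair of curly brackets around the declaration:

```rust
fn main() {
    // `s` does not exist yet at this point.
    {
        let s = String::from("Rust"); // `s` comes into scope
        println!("{}", s);
    } // the block ends: `s` goes out of scope and is dropped

    // Uncommenting the next line fails to compile, because `s` is gone:
    // println!("{}", s);
}
```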

In the above example, as you can see, we’ve added an additional pair of opening and closing curly brackets, which creates an inner block. In this case, variable s is declared inside the block, so it goes out of scope as soon as that block is over.

When a value goes out of scope, Rust implicitly calls a function called “drop”. When execution of “drop” completes, the value gets deallocated. Note that the “drop” method of a value cannot be called explicitly (for example, as s.drop()); to clean up a value before it goes out of scope, pass it to the standard library function std::mem::drop instead.

For programmer-defined custom types, the author can implement the Drop trait and put any necessary cleanup code in its “drop” method.
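The sketch below shows one way to observe this behaviour. The `Tracked` type and the `drop_order` function are hypothetical names invented for this example; each `Tracked` value writes its name to a shared log when Rust drops it. The trait method itself cannot be invoked directly, but the free function `std::mem::drop` (available as plain `drop`) can be used to end a value’s life early.

```rust
use std::cell::RefCell;

// Hypothetical custom type that records its name when it is dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl<'a> Drop for Tracked<'a> {
    // Rust calls this automatically when a `Tracked` value goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the names in the order in which the values were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Tracked { name: "outer", log: &log };
        {
            let _inner = Tracked { name: "inner", log: &log };
        } // `_inner` is dropped at the end of the inner block

        let early = Tracked { name: "early", log: &log };
        // `early.drop()` would not compile; `std::mem::drop` takes ownership
        // of the value, so it is dropped before the end of the scope.
        drop(early);
    } // `_outer` is dropped here
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order()); // prints ["inner", "early", "outer"]
}
```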

Assignment and Memory allocation

Now, let’s go a bit deeper on the behind the scenes of what happens when a variable is declared and how the memory allocation happens.

Let’s start analysing the following example, where a variable is declared and its value is assigned to another variable.

Example 3
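A minimal sketch of such a program, built around the two statements discussed below. The commented-out line is an assumption about how the invalid use of s1 might look.

```rust
fn main() {
    let s1 = String::from("Rust");
    let s2 = s1; // the value of s1 is assigned to s2

    // Uncommenting the next line fails to compile,
    // because s1 is no longer valid after the assignment:
    // println!("{}", s1);

    println!("{}", s2);
}
```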

Let’s look at following string initialization part:

let s1 = String::from("Rust");

Since String is a complex type whose size may grow as needed, its contents cannot be stored on the stack; instead, they need to be stored on the heap. How is the String value stored? It happens in two parts: (a) the actual value of the String is stored at a location on the heap, and (b) the details of that location are stored on the stack and are associated with the variable. These details have three parts: a pointer, a length and a capacity. The pointer points to the memory on the heap where the actual bytes are stored.

Following is a simplified representation of how this happens.

s1 holding value of a String

(Note that, in Rust, the contents of a String are stored as UTF-8 encoded bytes, not as characters as depicted above. The depiction above is simplified.)
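The three stack-side details can be inspected through methods that String provides in the standard library. A small sketch:

```rust
fn main() {
    let s1 = String::from("Rust");

    // The three details kept on the stack for s1:
    println!("pointer:  {:p}", s1.as_ptr()); // address of the bytes on the heap
    println!("length:   {}", s1.len()); // number of bytes currently in use
    println!("capacity: {}", s1.capacity()); // number of bytes reserved on the heap

    // The heap itself holds UTF-8 bytes rather than characters.
    println!("bytes:    {:?}", s1.as_bytes());
}
```

The capacity is always at least as large as the length; the exact value is up to the allocator and the standard library.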

Now, let’s look at what happens when the below line gets executed.

let s2 = s1;

When s1 is assigned to s2, the pointer (along with the length and capacity) is copied to s2. The actual value stored on the heap does not change or move. We may expect this operation to result in something like the diagram below, with both variables pointing to the same value (as in the case of other programming languages). Well, sorry to disappoint you, this does not happen in Rust! Interesting, isn’t it?

No, this does not happen in Rust!

Why doesn’t the above scenario happen? If you recollect the ownership rules, they state that there can be only one owner for a value. So, in this case, Rust makes s2 the owner of the value and s1 is considered no longer valid after the assignment.

What would have happened if Rust had considered both s1 and s2 as owners of the value? As we discussed earlier, when a variable goes out of scope, Rust calls “drop” to deallocate memory. If both s1 and s2 were owners of the value, “drop” would be called once for each of them as they go out of scope, that is, twice. The second “drop” call would try to deallocate memory that was already freed by the first one. This memory bug is known as a double free error. Since Rust allows only one owner at any moment, this situation does not arise.

Instead, when s1 is assigned to s2, Rust copies s1’s pointer details to s2 and invalidates s1. This action is called a “move”. A move is an inexpensive operation and is the default behaviour when values of complex types are assigned to another variable.

The whole step can be represented as the following diagram:

Ownership move
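One way to check that a move leaves the heap data where it is: compare the heap address before and after the assignment. The helper name `heap_address_survives_move` is made up for this sketch.

```rust
// Hypothetical helper: reports whether the heap address is unchanged by a move.
fn heap_address_survives_move() -> bool {
    let s1 = String::from("Rust");
    let before = s1.as_ptr(); // address of the bytes on the heap

    let s2 = s1; // move: only the pointer, length and capacity are copied

    let after = s2.as_ptr();
    before == after
}

fn main() {
    println!("{}", heap_address_survives_move()); // prints true
}
```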

Because of this behaviour, any line after the assignment that uses s1 won’t compile, since s1 is no longer valid at that point. How do we handle the case where we need both s1 and s2 to hold the value? Well, that’s where cloning comes to the rescue. (I’m glad that you asked. Otherwise, how would I introduce the next section? :-))

Cloning

In situations where we prefer to create a copy of the string value, we need to use a method called “clone”.

Following example shows how this can be done.

Example 4
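A minimal sketch of cloning, reusing the variable names s1 and s2 from the previous example:

```rust
fn main() {
    let s1 = String::from("Rust");
    let s2 = s1.clone(); // copies the stack details and the bytes on the heap

    // Both variables are valid, and each one owns its own heap memory.
    println!("s1 = {}, s2 = {}", s1, s2);
    println!("separate heap memory: {}", s1.as_ptr() != s2.as_ptr());
}
```

Unlike a move, a clone costs a new heap allocation plus a copy of the bytes, which is why Rust asks for it explicitly.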

As you can see, instead of assigning s1 to s2, a method named clone is called on s1. When this is done, copies of both the pointer details on the stack and the values on the heap are created, and they are associated with s2, as shown below.

Clone

Using Scalar/Compound Data Types

(As you may recall, when we started analysing the memory allocation aspects, we began with a complex case and I had promised that we’ll look into the simple case later. Well, time has come to honour that promise. :-))

When we use scalar and compound data types, whose size is known at compile time, the values are stored on the stack. These data types can be considered stack-only data types. Which data types are those? Integers, floats, booleans, chars, function pointers, and arrays and tuples made up only of such types. In Rust terms, these types implement the Copy trait. In their case, the “move” operation does not happen, since there is nothing on the heap to move; the value is simply copied.

Let’s look at the following example.

Example 5
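A minimal sketch with an integer. The name n1 comes from the discussion below; n2 is an assumed name for the second variable.

```rust
fn main() {
    let n1 = 10;
    let n2 = n1; // the value is copied on the stack; nothing is moved

    // n1 is still valid after the assignment.
    println!("n1 = {}, n2 = {}", n1, n2);
}
```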

When an assignment similar to the previous String example happens, the stack data, which holds all the relevant values, is copied. So, n1 remains valid even after the assignment. With these types, when the variable goes out of scope, its value is simply popped off the stack. There is no need to call “drop”.

The following is a simplified representation of how the data in stack looks like after the assignment happens.

Scalar Types : Stack only

The same behaviour applies when we use arrays of such types.

Example 6
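A minimal sketch with an array of integers; the names a1 and a2 are assumptions made for this illustration.

```rust
fn main() {
    let a1 = [1, 2, 3];
    let mut a2 = a1; // the whole array is copied on the stack

    a2[0] = 99; // changing the copy does not affect the original

    // a1 is still valid and unchanged.
    println!("a1 = {:?}, a2 = {:?}", a1, a2);
}
```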

So far, we discussed how ownership is handled when various operations are performed within a function. What happens when multiple functions are involved and variables are being passed around? How does Rust handle those situations? Let’s find out in the next section.

(If your coffee cup is empty, now is a good time to refill it :-))

Functions and Ownership

Similar to assignment, passing a variable to a function will either move or copy its value.

Let’s look at the following code sample.

Example 7
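A sketch of such a program. The function and variable names follow the discussion below; the function bodies and the parameter name `m` are assumptions.

```rust
// Takes ownership of the String passed in.
fn do_something(x: String) {
    println!("{}", x);
} // x goes out of scope here, so the heap memory is cleaned up

// Receives a copy of the integer passed in (the parameter name is assumed).
fn do_something_again(m: i32) {
    println!("{}", m);
}

fn main() {
    let s = String::from("Rust");
    let n = 10;

    do_something(s); // ownership of s moves into the function
    // println!("{}", s); // would not compile: s is no longer valid

    do_something_again(n); // n is copied
    println!("{}", n); // n is still valid here
}
```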

In the above example, when function do_something() (which was named after a lot of thought :-)) is called with variable s, ownership of s moves into do_something(). So, after that call, s is no longer valid in the main() function. But when variable n is passed to function do_something_again(), since it is a stack-only value, n is still valid after that call.

When execution of function do_something() starts, it takes ownership of the value that is being passed. When the function’s execution completes, x goes out of scope, thereby triggering a “drop” call. Once “drop” is called, it takes care of cleaning up the heap memory.

Just as passing a value to a function transfers ownership, returning a value from a function also transfers ownership.

Let’s try to understand this more by referring to the below code.

Example 8
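A sketch of such code. The names do_something(), do_something_else(), x, y and s1 follow the discussion below; s2 and s3 are assumed names.

```rust
// Creates a String and hands its ownership to the caller.
fn do_something() -> String {
    let x = String::from("Rust");
    x // returning x moves ownership out of the function
}

// Takes ownership of a String and gives it straight back.
fn do_something_else(y: String) -> String {
    y // ownership is returned to the caller
}

fn main() {
    let s1 = do_something(); // s1 now owns the value created inside the function

    let s2 = String::from("Ownership");
    let s3 = do_something_else(s2); // s2 moves in; the returned value is owned by s3

    println!("{} {}", s1, s3);
}
```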

In the above code, though x was declared inside the function do_something(), since it returns the value, s1 in the main() function gets its ownership. In case of function do_something_else(), though it takes ownership of the value via y at the beginning of execution, when the value is returned, the ownership is also returned.

In summary, when a value is assigned to a variable or passed over to another function, the ownership also moves along with it. When a variable that has data on heap goes out of scope, “drop” cleans up the value in heap, unless ownership of the value has been moved to another variable.

This approach of passing around ownership looks verbose and tedious and could get out of hand at times, doesn’t it? You are not alone in thinking so. Don’t worry, there is another saviour we have not spoken about yet. That saviour’s name is reference. Let’s understand it a bit more.

References and Borrowing

A reference, similar to a pointer, is an address that can be followed to access the data stored at that address. The difference between a reference and a pointer is that a reference is guaranteed to point to a valid value of a particular type for the life of that reference. The data a reference points to may be owned by another variable. So, when references are used, there won’t be any change in ownership. The action of creating a reference is called borrowing. One point to note is that references are immutable by default; that means, by default, we cannot update the value using a reference.

The following example shows how to use a reference and pass related data to another function while retaining the ownership (to create a reference, we need to prefix & to a variable name). But, since we attempt to modify the data using an immutable reference, this program won’t compile.

Example 9
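A sketch of borrowing with an immutable reference. The function name `measure` and its body are assumptions; the modifying line is left commented out so that the rest of the sketch compiles.

```rust
// Hypothetical function that borrows a String instead of taking ownership.
fn measure(x: &String) -> usize {
    // Uncommenting the next line fails to compile,
    // because x is an immutable reference:
    // x.push_str(" language");
    x.len()
}

fn main() {
    let s = String::from("Rust");
    let len = measure(&s); // &s creates a reference; ownership stays with s

    // s is still valid here because it was only borrowed.
    println!("{} has {} bytes", s, len);
}
```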

Following diagram is a representation of how reference can be used to follow the related value.

Simplified representation of reference

If we need to update the values using a reference, we need to use another variant of reference called mutable reference.

The following example shows how mutable reference can be used.

Example 10
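A sketch of the same idea with a mutable reference. The function name `extend` and the appended text are assumptions.

```rust
// Hypothetical function that updates the borrowed String in place.
fn extend(x: &mut String) {
    x.push_str(" language");
}

fn main() {
    let mut s = String::from("Rust"); // the owner must be declared mutable
    extend(&mut s); // &mut creates a mutable reference

    println!("{}", s); // prints "Rust language"
}
```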

As you can see, the variable that holds the data needs to be declared as mutable and instead of &, we need to use &mut while creating the reference.

When using references, one point to note is that there can be only one valid mutable reference at any point of time. If we try to create another mutable reference while the earlier one is still valid, the compiler reports an error. This restriction lets Rust allow updates in a controlled way and prevent data races at compile time. Similarly, we cannot create a mutable reference while an immutable reference is valid. But multiple immutable references are allowed.

Confusing? Let me summarize: at any point of time, there can be either one valid mutable reference or one or more valid immutable references.

Following example shows these valid scenarios.

Example 11
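A sketch of valid combinations, assuming a String that is borrowed several times in turn. The function name `borrow_in_turns` and the reference names r1 to r4 are made up for this illustration.

```rust
// Hypothetical helper showing borrows whose scopes do not overlap.
fn borrow_in_turns() -> String {
    let mut s = String::from("Rust");

    {
        let r1 = &mut s; // the only reference to s inside this block
        r1.push_str(" is");
    } // r1 goes out of scope, so new references can be created

    let r2 = &s; // several immutable references may coexist
    let r3 = &s;
    println!("{} / {}", r2, r3);
    // r2 and r3 are not used after this point, so their borrows end here

    let r4 = &mut s; // allowed: no other reference is still in use
    r4.push_str(" safe");

    s
}

fn main() {
    println!("{}", borrow_in_turns()); // prints "Rust is safe"
}
```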

Since the scopes do not overlap, this code works fine. By enforcing its rules on references at compile time, Rust also guarantees that there are no dangling references, that is, references that point to memory which is no longer valid. Since these kinds of issues are caught at compile time, they cannot show up in production.

So far we have discussed references to a single value. How do we handle a reference to multiple elements in a collection (e.g. an array or a string)? In such cases, slices can be used to refer to a contiguous sequence of elements without taking ownership.
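A short sketch of slices over a string and an array. The helper `first_word` is a hypothetical function written for this illustration.

```rust
// Hypothetical helper: returns the first word of the text as a string slice.
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        Some(index) => &s[..index], // slice up to (not including) the space
        None => s,                  // no space: the whole text is one word
    }
}

fn main() {
    let s = String::from("Rust language");
    let word = first_word(&s); // borrows a part of s without taking ownership

    let numbers = [1, 2, 3, 4, 5];
    let middle = &numbers[1..4]; // refers to the elements 2, 3 and 4

    println!("{} {:?}", word, middle);
}
```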

Finally, it is time to conclude.

Conclusion

In this article, we analysed how Rust handles memory and provides memory safety at run time.

Thank you so much for reading this far. I hope it was worth your time. Till we meet next time, happy coding!

References

This article draws heavily on the official “Rust Book” and other official documentation. I highly encourage you to refer to them to know more about memory management in Rust.
