Memory Management & Rust
I have always been fascinated by programming languages. I like to try out different languages and learn their internals by cherry-picking one feature at a time. The story behind each design choice is always exciting and worth pursuing.
Over the years I have learned a lot of programming languages; however, I am not an expert in any of them, and I never intended to be. What motivates me is the reasoning behind their design: why are languages designed the way they are? What makes each one unique? What are the tradeoffs, and what are the benefits of those design choices?
A couple of weeks ago, I tried out Rust. One of its concepts, called ownership, got me excited. It is the concept the Rust compiler uses under the hood to guarantee memory safety, and it helps eliminate memory errors such as double frees and dangling pointers that most other low-level programming languages face.
Rust is a programming language designed with safety in mind. The designers of the language wanted Rust to be safe, low level, efficient, and fast. The problem was that you could usually get only one: either low-level control or safety. For many years, programmers had to be careful when dealing with low-level code. Languages like C and C++ allowed low-level control, but developers had to write a lot of code to ensure memory safety and avoid memory leaks. On the other hand, languages like Haskell, Python, and Ruby were safe but did not allow low-level control. So it was mostly a tradeoff between low-level control and safety.
Then Rust came into existence. It offers low-level control comparable to C++, with a newer syntax and a compiler that guides the programmer toward safety. How does it do it? Concepts like ownership and borrowing (I will write a separate blog post on borrowing) help it achieve the memory safety that other low-level languages could not incorporate.
Malloc and Garbage Collector
Before talking about ownership, I want to mention concepts like malloc and the garbage collector. Languages like C and C++ use malloc (or new in C++) to allocate memory on the heap. This gives the programmer a lot of low-level control over memory, but if the developer is not careful enough it leads to bugs like double frees, dangling pointers, and memory leaks. Let me explain these problems a little. A double free happens when two sections of the code point to the same block of memory and both of them try to free it. One section successfully frees the memory, but the other one then tries to free a block that has already been released. The same example explains dangling pointers: when one section frees the memory, the other section is left holding a pointer to a block that is no longer valid. Likewise, a memory leak occurs when an allocated block is never deallocated even though we no longer plan to use it.
To avoid the problems above, languages like Haskell, Python, and Ruby have something called a garbage collector. The garbage collector is a part of the language runtime that automatically identifies memory that is no longer in use and frees it. This is a nice concept, but the garbage collector needs time at run time to analyze and clean up memory, which can make our program take a little longer, since the program may need to be paused to let the garbage collector do its job. And because the language itself is in charge of allocating and deallocating memory, low-level memory management is out of the window.
So if Rust is safe and still gives a lot of low-level memory control, what is ownership? When a Rust program starts to execute, it uses a stack that keeps track of everything happening in the program. First, the main function is executed and added to the stack; in more technical terms, this stack is called the call stack. Setting aside the fact that as many variables as possible are kept in registers, let us suppose that every function we call is added to the stack along with the variables and values defined in that function. This creates a scope system: whatever is defined in a stack frame can only be accessed while that frame is on the stack. The stack can't hold large sets of data. Values stored on it need a size known in advance (statically sized types), and the stack itself is limited in size. So we tend to store larger or growable sets of data (dynamically sized types) on the heap, which is also limited but much larger than the stack, and keep a pointer to the heap location on the stack.
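The split between stack and heap described above can be sketched in a few lines of Rust. This is only an illustration under my own assumptions: the helper name vec_handle_size and the variable names are made up for this post. It shows that the part of a vector living on the stack is a small fixed-size handle (pointer, capacity, length), no matter how many elements sit on the heap.

```rust
use std::mem::size_of;

// Hypothetical helper for this sketch: size in bytes of the part of a
// Vec<i32> that lives on the stack (pointer, capacity, length).
fn vec_handle_size() -> usize {
    size_of::<Vec<i32>>()
}

fn main() {
    // A plain integer has a fixed size, so it can live in main's stack frame.
    let count: i32 = 42;

    // The thousand elements are stored on the heap; only the small handle
    // that points to them is kept in main's stack frame.
    let numbers: Vec<i32> = (0..1000).collect();

    println!("count = {}", count);
    println!(
        "handle on the stack: {} bytes, elements on the heap: {}",
        vec_handle_size(),
        numbers.len()
    );
}
```

The handle stays the same size whether the vector holds ten elements or ten million, which is why it fits comfortably on the stack.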
When a function finishes executing, the variables and values stored in its stack frame are thrown away, which cleans up the memory they held. This is one of the reasons Rust does not need a garbage collector. So far everything looks great, right? Let us take an example. We have a large collection of data in a vector (a data structure) that is stored on the heap, with a reference to it on the stack. Suppose a function now wants to copy that collection to a new variable. Is it efficient to copy the whole data set to yet another memory location? Hell no, copying a large collection of data is slow and takes up a lot of memory. So we don't copy the actual data; we copy the memory reference (pointer) instead, which makes the code much faster and saves a lot of memory. But now we have two variables pointing to the same memory location on the heap, which leads to the problems I mentioned above: double frees and dangling pointers.
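To make the difference between the two kinds of copy concrete, here is a minimal sketch in Rust. The helper same_buffer and the variable names are my own inventions for illustration. Calling clone duplicates every element into a new heap buffer, while a plain assignment copies only the small handle, so the heap address stays the same. What Rust does with the old variable name after such an assignment is the ownership story that follows.

```rust
// Hypothetical helper for this sketch: true when both slices start at the
// same heap address, meaning they share one buffer.
fn same_buffer(a: &[i32], b: &[i32]) -> bool {
    a.as_ptr() == b.as_ptr()
}

fn main() {
    let original: Vec<i32> = (0..1_000_000).collect();

    // Deep copy: a second heap buffer is allocated and every element is copied.
    let deep_copy = original.clone();
    println!("clone shares the buffer: {}", same_buffer(&original, &deep_copy));

    // Remember where the data lives before handing it to a new name.
    let address_before = original.as_ptr();

    // Plain assignment: only the pointer, capacity, and length are copied.
    let handle_only = original;
    println!(
        "assignment kept the same buffer: {}",
        address_before == handle_only.as_ptr()
    );
}
```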
In this situation, Rust assigns ownership to each data structure: at any instant, only one variable owns a given data structure. Let's try to understand this with a real-world example. Suppose you have bought a book. At this moment you are the owner of the book; you own it and no one else does. If someone else wants to read it, they can't simply have a copy of yours; if they get one, it's a different book. Once you finish reading, you can destroy the book. The same happens to a data structure in Rust. At any given time only one stack frame can own the data structure, which has a nice implication: we can safely free the data structure from the heap once the stack frame that owns it is popped off the stack. Cool, right? No more dangling pointers.
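Here is a small sketch of that idea, with the caveat that the Book type, the read_book function, and the FREED counter are all hypothetical names I made up for this post. The Drop implementation is only there to make the cleanup visible: it runs at the moment the owning variable goes out of scope, without any garbage collector.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter used only to observe how many books were freed.
static FREED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type standing in for the book in the analogy.
struct Book {
    title: String,
}

impl Drop for Book {
    // Rust calls this automatically when the owner goes out of scope.
    fn drop(&mut self) {
        FREED.fetch_add(1, Ordering::SeqCst);
        println!("freeing '{}'", self.title);
    }
}

fn read_book() {
    // `book` is owned by this function's stack frame.
    let book = Book {
        title: String::from("Rust Basics"),
    };
    println!("reading '{}'", book.title);
} // the frame is popped here, so `book` is dropped and its memory is freed

fn main() {
    read_book();
    println!("books freed so far: {}", FREED.load(Ordering::SeqCst));
}
```

In real code you rarely need to write Drop yourself; types like String and Vec already free their heap memory this way.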
But there is a problem in the above scenario: we are not reusing the book, or the resource. If you and one of your friends want the same book, then instead of destroying it after reading, you can give it to your friend.
The same case can happen here as well: what if two variables point to the same data structure? For this, Rust has a concept called move. In the real-world example, when a friend wants to read your book, they ask you for it and you transfer the ownership to them. You no longer have access to the book because you have already transferred the ownership; the book now belongs to your friend. The same thing happens in Rust when a second variable is assigned the same data structure. The ownership is moved to the new variable, and the old one can no longer be used. Only when the new variable goes out of scope is the memory freed, which fixes the issue of double free.
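The book hand-over can be sketched roughly like this. The function count_pages and the variable names are hypothetical, chosen to match the analogy. The commented-out line shows the kind of use the compiler rejects after a move.

```rust
// Hypothetical helper for this sketch: takes ownership of the book and
// returns how many pages it has.
fn count_pages(pages: Vec<String>) -> usize {
    pages.len()
} // `pages` goes out of scope here, so the heap memory is freed exactly once

fn main() {
    let my_book = vec![String::from("page 1"), String::from("page 2")];

    // Ownership moves to `friends_book`; `my_book` can no longer be used.
    let friends_book = my_book;

    // Uncommenting the next line fails to compile with error E0382
    // (borrow of moved value), because `my_book` gave up its ownership.
    // println!("{}", my_book.len());

    println!("my friend now has {} pages", friends_book.len());

    // Passing the vector to a function moves ownership again.
    let total = count_pages(friends_book);
    println!("counted {} pages", total);
}
```

Because there is only ever one owner, there is only ever one place where the memory gets freed.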
To conclude, this is one of the reasons Rust does not need a garbage collector. It uses the stack to keep track of the program, and ownership lets it free memory safely. Rust frees the memory when the variable that owns the data goes out of scope, for example when the function that defined it finishes executing.
In the next blog post, I will talk about borrowing, the concept Rust uses to manage pointers (references) returned from a function. Stay tuned.
Thank you Achyut Pokhrel for reviewing the content before publication.