Rust from Scratch: Smart Pointers

Mohsen Zainalpour
11 min read · Apr 28, 2023

Rust is a modern systems programming language designed for speed, safety, and concurrency. One of the key features of Rust is its support for smart pointers, which enable flexible and efficient memory management in Rust programs. Smart pointers are data structures that combine the flexibility of pointers with the safety and convenience of higher-level abstractions like references and values.

This post series, “Rust from Scratch: Smart Pointers”, will cover the key smart pointer types in Rust, along with related concepts and best practices for using them effectively. We’ll start with an overview of smart pointers and why they’re important in Rust, and then dive into each of the major smart pointer types in turn, covering topics such as allocation, ownership, thread safety, and more.

By the end of this post series, you’ll have a solid understanding of how smart pointers work in Rust, and be able to use them effectively in your own Rust programs. Whether you’re a seasoned Rust developer or just getting started, this post series has something for everyone who wants to learn more about Rust’s unique approach to memory management. So let’s get started!

Smart Pointers

Smart pointers are data structures in Rust that combine the flexibility of pointers with the safety and convenience of higher-level abstractions like references and values. They make memory management both efficient and safe, while also enabling features like sharing and mutability.

In Rust, there are several types of smart pointers, each with its own unique features and use cases. In this post series, we will cover the following smart pointer types:

  1. UnsafeCell: A low-level smart pointer that enables unsafe mutation of shared data.
  2. Cell: A non-thread-safe smart pointer that provides interior mutability of a single value by moving values in and out.
  3. RefCell: A non-thread-safe smart pointer that provides interior mutability of a single value, with the borrowing rules checked at runtime.
  4. Rc: A reference-counted smart pointer that allows multiple owners of the same data value.
  5. Arc: An atomic reference-counted smart pointer that provides thread-safe shared ownership of data.
  6. Box: A smart pointer that provides ownership and efficient memory allocation on the heap.
  7. Mutex: A thread-safe smart pointer that provides mutual exclusion and synchronization between threads.
  8. Cow: A smart pointer that provides a flexible way to work with borrowed and owned data.

Each of these smart pointer types is designed to address specific needs and use cases in Rust, and understanding how they work is critical for building efficient and safe Rust programs.

In the next sections, we will dive into each of these smart pointer types in turn, covering their features, implementation details, and best practices for using them effectively.

Common concepts related to smart pointers

While each smart pointer type in Rust has its own unique features and use cases, there are several common concepts that are important to understand when working with smart pointers. In this section, we will cover the following concepts:

  1. Deref and DerefMut: The Deref and DerefMut traits provide a way to treat a smart pointer as if it were a reference to the value it owns, allowing you to access the value’s methods and fields.
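  2. Sized: The Sized marker trait indicates that the size of a type is known at compile time. Generic type parameters are Sized by default; a ?Sized bound relaxes this so that a smart pointer can also hold dynamically sized types such as str or trait objects.
  3. Sync and Send: The Sync and Send marker traits are used to enforce thread safety in Rust programs. A Send type can be moved to another thread, and a Sync type can be shared between threads by reference. Both matter when working with shared data and concurrent execution.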
  4. Drop trait: The Drop trait provides a way to run custom cleanup code when a smart pointer goes out of scope, allowing you to manage resources like memory or file handles.

Understanding these concepts is critical for working effectively with smart pointers in Rust, and can help you avoid common pitfalls and performance issues.
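To make two of these traits concrete, here is a minimal sketch of a wrapper that implements Deref, DerefMut, and Drop. The Tracked type and the deref_and_drop_demo function are hypothetical names invented for this illustration; they are not part of the standard library or of the code later in this post.

```rust
use std::cell::Cell;
use std::ops::{Deref, DerefMut};

/// Hypothetical wrapper used only for illustration.
struct Tracked<'a, T> {
    value: T,
    drops: &'a Cell<u32>, // counts how many wrappers have been dropped
}

impl<T> Deref for Tracked<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

impl<T> DerefMut for Tracked<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.value
    }
}

impl<T> Drop for Tracked<'_, T> {
    fn drop(&mut self) {
        // Custom cleanup: here we only record that the drop happened.
        self.drops.set(self.drops.get() + 1);
    }
}

/// Returns (string length seen through Deref, number of drops observed).
fn deref_and_drop_demo() -> (usize, u32) {
    let drops = Cell::new(0);
    let len = {
        let mut s = Tracked { value: String::from("hi"), drops: &drops };
        s.push('!'); // DerefMut: String::push is called through the wrapper
        let len = s.len(); // Deref: String::len is called through the wrapper
        len
    }; // `s` goes out of scope here, so Drop::drop runs
    (len, drops.get())
}
```

Calling deref_and_drop_demo() should yield a length of 3 and a drop count of 1: the wrapper behaves like the String it owns, and its cleanup code runs exactly once when it leaves scope.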

Interior Mutability

Rust’s interior mutability pattern allows us to mutate data even when we only hold immutable references to it. This is done through unsafe code, for which we must manually verify that the borrowing rules are upheld. The unsafe code is then wrapped in a safe API, and the outer type still appears immutable. The Cell type in the standard library is the prime example of this pattern.

Cell:

Rust is well-known for its strong guarantees on memory safety, achieved through its ownership and borrowing system. However, sometimes we need interior mutability, which allows for mutation of data even when we have an immutable reference. One way Rust provides this is through the Cell smart pointer. In this post, we'll walk through a custom implementation of Cell, explaining each step to help you understand its fundamentals.

Rust memory safety is based on the rule that, given an object T, only one of the following can be true: having multiple immutable references (&T) to the object (also known as aliasing), or having one mutable reference (&mut T) to the object (also known as mutability). This rule is enforced by the Rust compiler, but there are certain situations where it is not flexible enough, and multiple references to an object are necessary while also mutating it.
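The rule can be seen in a few lines of ordinary Rust. This is a small sketch; the function name aliasing_xor_mutability is made up for the example.

```rust
/// Demonstrates "many shared references OR one exclusive reference".
fn aliasing_xor_mutability() -> i32 {
    let mut x = 1;
    {
        // Aliasing: any number of shared references may coexist.
        let a = &x;
        let b = &x;
        assert_eq!(*a + *b, 2);
    }
    {
        // Mutability: exactly one exclusive reference at a time.
        let m = &mut x;
        *m += 10;
    }
    // Mixing the two is rejected at compile time. For example, taking
    // `let a = &x;` followed by `let m = &mut x;` and then using `a`
    // again would not compile.
    x
}
```

The function returns 11. The commented-out combination at the end is exactly the situation that interior mutability types such as Cell are designed to handle.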

The idea of Cell is that you never access its contents in place: you either set or get the entire value at once. Because a Cell can only be used from a single thread, only one such operation can happen at a time, so a read can never overlap with a write. This provides a safe API for interior mutability.

There are two types of cells: Cell<T> and RefCell<T>. Cell<T> lets you change the value inside it by moving values in and out rather than by handing out references; reading the value with get additionally requires that the type T implements Copy. If you want to borrow the contents by reference instead, you have to use RefCell<T>.

Shareable mutable containers exist to provide a controlled way of allowing mutability in the presence of aliasing. Both Cell<T> and RefCell<T> enable this kind of behavior in a single-threaded setting. If mutability and aliasing need to be shared among multiple threads, Mutex<T>, RwLock<T>, or atomic types can be used — all of which will be covered in the upcoming posts.
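As a quick illustration with the standard library's Cell, the sketch below mutates a value through two shared references, and also shows that a non-Copy type can still be moved in and out with replace and take. The function names bump, shared_counter, and swap_name are hypothetical helpers for this example.

```rust
use std::cell::Cell;

/// Increments the counter through a shared (immutable) reference.
fn bump(counter: &Cell<u32>) {
    counter.set(counter.get() + 1);
}

/// Two aliases to the same Cell can both mutate it.
fn shared_counter() -> u32 {
    let counter = Cell::new(0);
    let (a, b) = (&counter, &counter);
    bump(a);
    bump(b);
    counter.get()
}

/// String is not Copy, so `get` is unavailable, but values can be moved.
fn swap_name() -> (String, String) {
    let name = Cell::new(String::from("old"));
    // `replace` moves the old value out and the new one in.
    let previous = name.replace(String::from("new"));
    // `take` moves the value out and leaves String::default() behind.
    let current = name.take();
    (previous, current)
}
```

Here shared_counter() returns 2, and swap_name() returns the pair ("old", "new"). At no point does either function obtain a reference to the value inside the Cell.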

The principle behind Cell is “never hand out references”: since no reference to the wrapped value ever escapes, mutating it cannot invalidate anything a caller is holding.

It’s important to note that Cell does not implement the Sync trait, which marks types that can be safely shared between threads. This is because the set method of Cell could cause data races if two threads modified the same cell at the same time. In other words, you can’t share a reference to a Cell with a different thread.
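One way to see this distinction is with compile-time checks. In the sketch below, assert_send and assert_sync are hypothetical helper functions (not standard library APIs) whose only job is to make the compiler verify a trait bound.

```rust
use std::cell::Cell;
use std::sync::Mutex;
use std::thread;

// Hypothetical helpers: they compile only if T satisfies the bound.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn check_marker_traits() -> bool {
    assert_send::<Cell<i32>>(); // a Cell may be moved to another thread
    assert_sync::<Mutex<i32>>(); // a Mutex may be shared between threads
    // assert_sync::<Cell<i32>>(); // would not compile: Cell<i32> is not Sync
    true
}

/// Moving (not sharing) a Cell into a thread is allowed.
fn move_cell_to_thread() -> i32 {
    let cell = Cell::new(1);
    thread::spawn(move || {
        cell.set(cell.get() + 1);
        cell.get()
    })
    .join()
    .unwrap()
}
```

The spawned thread owns the Cell outright, so there is no aliasing across threads and move_cell_to_thread() returns 2. Passing &cell to the thread instead would be rejected by the compiler.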

Let’s start by examining the MyCell struct, based on the standard Cell type, and its associated methods:

use std::cell::UnsafeCell;

struct MyCell<T> {
    value: UnsafeCell<T>,
}

impl<T> MyCell<T> {
    pub fn new(value: T) -> MyCell<T> { /* ... */ }

    pub fn set(&self, value: T) { /* ... */ }

    pub fn get(&self) -> T
    where
        T: Copy,
    { /* ... */ }
}

The MyCell struct contains a single field, value, which stores the contained value using UnsafeCell. This allows for interior mutability.

UnsafeCell is the core primitive for interior mutability in Rust: its get method takes &self and returns a raw pointer *mut T to its contents. It is up to you as the abstraction designer to use that raw pointer correctly.
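Used directly, UnsafeCell looks like the sketch below. The function name unsafe_cell_demo is made up for this example; the calls to new, get, and into_inner are standard library methods.

```rust
use std::cell::UnsafeCell;

fn unsafe_cell_demo() -> i32 {
    // Note that `cell` is not declared `mut`.
    let cell = UnsafeCell::new(1);

    // `get` takes &self and returns a raw pointer to the contents.
    let ptr: *mut i32 = cell.get();

    // SAFETY: we are on a single thread and no references to the
    // contents exist while we write through the pointer.
    unsafe {
        *ptr += 1;
    }

    // `into_inner` consumes the cell and returns the wrapped value.
    cell.into_inner()
}
```

The function returns 2. The compiler does not check the safety comment for us, which is why types like Cell wrap this pattern behind a safe API.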

Now, let’s look at the implementation of the MyCell methods:

new(): Creates a new MyCell with an initial value.

pub fn new(value: T) -> MyCell<T> {
    MyCell {
        value: UnsafeCell::new(value),
    }
}

set(): Updates the contained value of the MyCell. This method can be called on an immutable reference to the MyCell, enabling interior mutability.

pub fn set(&self, value: T) {
    unsafe {
        *self.value.get() = value;
    }
}

Wrapping the raw-pointer dereference in an unsafe block lets us write through a shared reference, but by itself nothing stops a shared reference to the MyCell from being handed to several threads, each of which could set the inner value at the same time. The only way to prevent this is for MyCell NOT to implement the Sync trait:

impl<T> !Sync for MyCell<T>{}

However, there is no need to write this explicitly. Negative impls such as !Sync are an unstable feature that only compiles on nightly Rust, and the compiler already infers that MyCell is not Sync because it contains an UnsafeCell, which is not Sync.

get(): Returns a copy of the contained value. This method requires that the contained type T implements the Copy trait, ensuring that we don't violate Rust's ownership rules when returning the value.

pub fn get(&self) -> T
where
    T: Copy,
{
    unsafe { *self.value.get() }
}

Here is a final implementation of Cell: (GitHub repository)

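use std::cell::UnsafeCell;

struct MyCell<T> {
    value: UnsafeCell<T>,
}

impl<T> MyCell<T> {
    pub fn new(value: T) -> MyCell<T> {
        MyCell {
            value: UnsafeCell::new(value),
        }
    }

    pub fn set(&self, value: T) {
        // SAFETY: MyCell is !Sync, so no other thread can be writing,
        // and we never hand out references to the inner value.
        unsafe {
            *self.value.get() = value;
        }
    }

    pub fn get(&self) -> T
    where
        T: Copy,
    {
        // SAFETY: only this thread can access the value, and we copy
        // it out instead of returning a reference.
        unsafe { *self.value.get() }
    }
}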
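To see it in action, here is a short usage sketch. MyCell is repeated in condensed form only so that the snippet compiles on its own; the my_cell_demo function is a hypothetical example and not part of the repository code.

```rust
use std::cell::UnsafeCell;

// Condensed copy of the MyCell above, so this snippet is self-contained.
struct MyCell<T> {
    value: UnsafeCell<T>,
}

impl<T> MyCell<T> {
    fn new(value: T) -> Self {
        MyCell { value: UnsafeCell::new(value) }
    }

    fn set(&self, value: T) {
        // SAFETY: single-threaded (MyCell is !Sync) and no references
        // to the inner value are ever handed out.
        unsafe { *self.value.get() = value; }
    }

    fn get(&self) -> T
    where
        T: Copy,
    {
        // SAFETY: same reasoning as in `set`; the value is copied out.
        unsafe { *self.value.get() }
    }
}

/// Reads and writes the same cell through two shared references.
fn my_cell_demo() -> (i32, i32) {
    let cell = MyCell::new(5);
    let reader = &cell;
    let writer = &cell;

    let before = reader.get();
    writer.set(before * 2); // mutation through an immutable reference
    (before, reader.get())
}
```

The function returns (5, 10): the write made through one shared reference is visible through the other, without any &mut borrow of the cell.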

Using UnsafeCell and unsafe code blocks can be a powerful way to achieve interior mutability in Rust, but it's important to be careful and use these tools only when necessary, and with a thorough understanding of the potential risks and how to mitigate them.

RefCell:

The RefCell<T> type represents single ownership over the data it holds and supports the interior mutability pattern. With RefCell<T>, the borrowing rules are enforced at runtime rather than at compile time, and if you break them, your program will panic.

Similar to Cell<T>, RefCell<T> is only for use in single-threaded scenarios: it does not implement Sync, so you will get a compile-time error if you try to share it between threads.

When creating immutable and mutable references, we use the & and &mut syntax, respectively. With RefCell<T>, we access the smart pointer types Ref<T> and RefMut<T> through the safe API provided by RefCell<T>, using the borrow and borrow_mut methods. These smart pointers implement the Deref trait, allowing us to treat them as regular references. RefCell<T> keeps track of how many immutable and mutable borrows are in use, and enforces the same borrowing rules as the compiler, allowing multiple immutable borrows or one mutable borrow at any given time.
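With the standard library's RefCell, these rules look like the following sketch. The function name refcell_rules is made up for this example; try_borrow_mut is used instead of borrow_mut for the conflicting borrow so that the conflict is reported as an error value rather than a panic.

```rust
use std::cell::RefCell;

/// Returns (sum of lengths seen by two readers,
///          whether a writer was refused while readers were alive,
///          the final contents).
fn refcell_rules() -> (usize, bool, Vec<i32>) {
    let data = RefCell::new(vec![1, 2]);
    let shared_count;
    let conflict;
    {
        // Multiple shared borrows may coexist.
        let r1 = data.borrow();
        let r2 = data.borrow();
        shared_count = r1.len() + r2.len();

        // A mutable borrow is refused while shared borrows are alive.
        conflict = data.try_borrow_mut().is_err();
    } // r1 and r2 are dropped here, releasing the shared borrows

    // Now an exclusive borrow succeeds.
    data.borrow_mut().push(3);

    let snapshot = data.borrow().clone();
    (shared_count, conflict, snapshot)
}
```

The function returns (4, true, vec![1, 2, 3]). Replacing try_borrow_mut with borrow_mut inside the inner block would panic at runtime instead.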

To begin, let’s examine our own simplified RefCell struct and its associated RefState enum:

#[derive(Copy, Clone)]
enum RefState {
    Unshared,
    Exclusive,
    Shared(usize),
}

struct RefCell<T> {
    value: UnsafeCell<T>,
    state: MyCell<RefState>,
}

Here, RefState is an enumeration with three variants:

  1. Unshared: Indicates that no references to the contained value exist.
  2. Exclusive: Indicates that a single mutable reference exists.
  3. Shared(usize): Indicates that multiple shared references exist. The usize value represents the number of shared references.

The RefCell struct contains two fields:

  1. value: Stores the contained value using UnsafeCell, which allows for interior mutability.
  2. state: Represents the current borrow state of the RefCell using a MyCell wrapper.

Now, let’s move on to the implementation of the RefCell methods:

impl<T> RefCell<T> {
    pub fn new(value: T) -> Self {
        RefCell {
            value: UnsafeCell::new(value),
            state: MyCell::new(RefState::Unshared),
        }
    }

    pub fn borrow(&self) -> Option<Ref<'_, T>> { /* ... */ }

    pub fn borrow_mut(&self) -> Option<RefMut<'_, T>> { /* ... */ }
}

new(): Creates a new RefCell with an initial value and an Unshared state.

borrow(): Attempts to borrow a shared reference to the contained value. If successful, it returns Some(Ref); otherwise, it returns None. (The standard library’s borrow panics on a conflict instead, and its try_borrow returns a Result.) The method checks the current state and increments the shared reference counter as needed.

borrow_mut(): Attempts to borrow a mutable reference to the contained value. If successful, it returns Some(RefMut); otherwise, it returns None. The method checks the current state and sets it to Exclusive only if the state is Unshared.

Next, we have the Ref and RefMut structs, which represent shared and mutable references to a RefCell's value, respectively:

struct Ref<'a, T> { /* ... */ }
struct RefMut<'a, T> { /* ... */ }

Ref and RefMut both store a reference to the RefCell they are associated with. Additionally, they both implement the Drop trait to update the RefCell's state when the reference is dropped.

For Ref, we have the following implementation of the Drop trait:

impl<T> Drop for Ref<'_, T> {
    fn drop(&mut self) {
        match self.refcell.state.get() {
            RefState::Shared(1) => self.refcell.state.set(RefState::Unshared),
            RefState::Shared(n) => self.refcell.state.set(RefState::Shared(n - 1)),
            _ => unreachable!(),
        }
    }
}

drop(): Decrements the shared reference counter when a shared reference is dropped, and updates the state accordingly.

For RefMut, we have the following implementation of the Drop trait:

impl<T> Drop for RefMut<'_, T> {
    fn drop(&mut self) {
        self.refcell.state.set(RefState::Unshared);
    }
}

drop(): Sets the state back to Unshared when a mutable reference is dropped.

Lastly, both Ref and RefMut implement the Deref trait, allowing them to be used like regular references. Additionally, RefMut implements the DerefMut trait, enabling it to be used as a mutable reference:

impl<T> Deref for Ref<'_, T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        unsafe { &*self.refcell.value.get() }
    }
}

impl<T> Deref for RefMut<'_, T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        unsafe { &*self.refcell.value.get() }
    }
}

impl<T> DerefMut for RefMut<'_, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        unsafe { &mut *self.refcell.value.get() }
    }
}

deref(): Returns a reference to the contained value in the RefCell for both Ref and RefMut.

deref_mut(): Implemented only for RefMut, it returns a mutable reference to the contained value in the RefCell.

Here is a final implementation of RefCell: (GitHub repository)


use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};

// MyCell is the type defined in the Cell section above.

#[derive(Copy, Clone)]
enum RefState {
    Unshared,
    Exclusive,
    Shared(usize),
}

struct RefCell<T> {
    value: UnsafeCell<T>,
    state: MyCell<RefState>,
}

impl<T> RefCell<T> {
    pub fn new(value: T) -> Self {
        RefCell {
            value: UnsafeCell::new(value),
            state: MyCell::new(RefState::Unshared),
        }
    }

    pub fn borrow(&self) -> Option<Ref<'_, T>> {
        match self.state.get() {
            RefState::Unshared => {
                self.state.set(RefState::Shared(1));
                Some(Ref::new(self))
            }
            RefState::Shared(n) => {
                self.state.set(RefState::Shared(n + 1));
                Some(Ref::new(self))
            }
            RefState::Exclusive => None,
        }
    }

    pub fn borrow_mut(&self) -> Option<RefMut<'_, T>> {
        match self.state.get() {
            RefState::Unshared => {
                self.state.set(RefState::Exclusive);
                Some(RefMut::new(self))
            }
            _ => None,
        }
    }
}

struct Ref<'a, T> {
    refcell: &'a RefCell<T>,
}

impl<'a, T> Ref<'a, T> {
    fn new(refcell: &'a RefCell<T>) -> Self {
        Ref { refcell }
    }
}

impl<T> Drop for Ref<'_, T> {
    fn drop(&mut self) {
        match self.refcell.state.get() {
            RefState::Shared(1) => self.refcell.state.set(RefState::Unshared),
            RefState::Shared(n) => self.refcell.state.set(RefState::Shared(n - 1)),
            _ => unreachable!(),
        }
    }
}

impl<T> Deref for Ref<'_, T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        // SAFETY: a Ref only exists while the state is Shared,
        // so no RefMut can exist at the same time.
        unsafe { &*self.refcell.value.get() }
    }
}

struct RefMut<'a, T> {
    refcell: &'a RefCell<T>,
}

impl<'a, T> RefMut<'a, T> {
    fn new(refcell: &'a RefCell<T>) -> Self {
        RefMut { refcell }
    }
}

impl<T> Drop for RefMut<'_, T> {
    fn drop(&mut self) {
        self.refcell.state.set(RefState::Unshared);
    }
}

impl<T> Deref for RefMut<'_, T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        // SAFETY: a RefMut only exists while the state is Exclusive,
        // so this is the only live borrow of the value.
        unsafe { &*self.refcell.value.get() }
    }
}

impl<T> DerefMut for RefMut<'_, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        // SAFETY: see Deref for RefMut above.
        unsafe { &mut *self.refcell.value.get() }
    }
}
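To watch the state machine at work, here is a condensed, self-contained sketch of the same design together with a small usage function. Two things differ from the listing above and are assumptions of this sketch: the standard library's Cell stands in for MyCell so the snippet compiles on its own, and the Ref::new and RefMut::new constructors are inlined. The borrow_state_demo function is a hypothetical example.

```rust
use std::cell::{Cell, UnsafeCell};
use std::ops::{Deref, DerefMut};

#[derive(Copy, Clone)]
enum RefState { Unshared, Exclusive, Shared(usize) }

struct RefCell<T> {
    value: UnsafeCell<T>,
    state: Cell<RefState>, // std Cell stands in for MyCell here
}

struct Ref<'a, T> { refcell: &'a RefCell<T> }
struct RefMut<'a, T> { refcell: &'a RefCell<T> }

impl<T> RefCell<T> {
    fn new(value: T) -> Self {
        RefCell { value: UnsafeCell::new(value), state: Cell::new(RefState::Unshared) }
    }

    fn borrow(&self) -> Option<Ref<'_, T>> {
        match self.state.get() {
            RefState::Unshared => self.state.set(RefState::Shared(1)),
            RefState::Shared(n) => self.state.set(RefState::Shared(n + 1)),
            RefState::Exclusive => return None,
        }
        Some(Ref { refcell: self })
    }

    fn borrow_mut(&self) -> Option<RefMut<'_, T>> {
        match self.state.get() {
            RefState::Unshared => {
                self.state.set(RefState::Exclusive);
                Some(RefMut { refcell: self })
            }
            _ => None,
        }
    }
}

impl<T> Drop for Ref<'_, T> {
    fn drop(&mut self) {
        match self.refcell.state.get() {
            RefState::Shared(1) => self.refcell.state.set(RefState::Unshared),
            RefState::Shared(n) => self.refcell.state.set(RefState::Shared(n - 1)),
            _ => unreachable!(),
        }
    }
}

impl<T> Drop for RefMut<'_, T> {
    fn drop(&mut self) { self.refcell.state.set(RefState::Unshared); }
}

impl<T> Deref for Ref<'_, T> {
    type Target = T;
    // SAFETY: a Ref exists only in the Shared state, so no RefMut is alive.
    fn deref(&self) -> &T { unsafe { &*self.refcell.value.get() } }
}

impl<T> Deref for RefMut<'_, T> {
    type Target = T;
    // SAFETY: a RefMut exists only in the Exclusive state.
    fn deref(&self) -> &T { unsafe { &*self.refcell.value.get() } }
}

impl<T> DerefMut for RefMut<'_, T> {
    // SAFETY: a RefMut exists only in the Exclusive state.
    fn deref_mut(&mut self) -> &mut T { unsafe { &mut *self.refcell.value.get() } }
}

/// Returns (was a writer refused while a reader was alive,
///          can we read again after the writer is dropped,
///          the final value).
fn borrow_state_demo() -> (bool, bool, i32) {
    let cell = RefCell::new(1);

    let r1 = cell.borrow().expect("no exclusive borrow yet");
    let blocked = cell.borrow_mut().is_none(); // state is Shared(1)
    drop(r1); // state returns to Unshared

    {
        let mut w = cell.borrow_mut().expect("no borrows left");
        *w += 41; // DerefMut gives access to the inner value
    } // dropping `w` resets the state to Unshared

    let readable = cell.borrow().is_some();
    let value = *cell.borrow().expect("no exclusive borrow");
    (blocked, readable, value)
}
```

The function returns (true, true, 42): the mutable borrow is refused while a Ref is alive, and dropping each guard restores the state so that later borrows succeed.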

Conclusion:

In this post, we covered the Cell type which provides interior mutability, allowing data to be mutated even through an immutable reference, and some fundamental concepts related to interior mutability and runtime borrow checking. We also explored the implementation of RefCell in Rust, a smart pointer that enforces dynamic borrow checking and allows for interior mutability. We broke down the RefCell struct, RefState enum, and associated methods, as well as the Ref and RefMut structs and their traits.

You can find the full code for these implementations in the following GitHub repository: https://github.com/EtaCassiopeia/rust-from-scatch

Feel free to explore the repository for further insights and examples related to Rust’s smart pointers and other features.

In the next post of this series, we will dive deeper into the implementation of other smart pointers in Rust, including Box, Rc, Arc, and Mutex. Stay tuned to learn more about these powerful abstractions and how they can help you write safer, more efficient code in Rust.
