Rust Notes: PhantomData

0xor0ne
5 min read · Jun 15, 2023

This blog post will first introduce the “theoretical” concepts of the Rust PhantomData<T> type and then explore a few real-world examples showcasing its practical applications.

What is PhantomData<T>

As stated in the official documentation, PhantomData<T> is a Zero Sized Type (ZST) that consumes no space and simulates the presence of a field of the given type T. It is a marker type used to give the compiler information that is useful for the purpose of static analysis and necessary to have correct variance and drop checking.

As a quick example, it is possible to define a structure like this:

use std::marker::PhantomData;

struct PdStruct<T> {
    data: i32,
    pd: PhantomData<T>,
}

In this case, the field pd, whose type is PhantomData<T>, does not increase the size of PdStruct<T>, but it tells the compiler to treat PdStruct<T> as if it owned a T, even though no value of type T is actually stored in the structure. So, for example, the compiler knows that when a value of type PdStruct<T> is dropped, a T could potentially be dropped as well.
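The size claim can be checked directly with size_of. The snippet below is a minimal sketch; the helper name pd_struct_size is introduced here only for illustration.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

#[allow(dead_code)]
struct PdStruct<T> {
    data: i32,
    pd: PhantomData<T>,
}

/// Hypothetical helper: size in bytes of `PdStruct<T>` for a given `T`.
fn pd_struct_size<T>() -> usize {
    size_of::<PdStruct<T>>()
}
```

Whatever T is chosen, pd_struct_size::<T>() should equal size_of::<i32>(), because the PhantomData<T> field contributes no bytes.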

PhantomData<T> is commonly used with raw pointers, unused lifetime parameters and unused type parameters. Examples for each of the three cases are provided below.

Raw Pointers and PhantomData<T>

Let’s consider the following code snippet:

use std::marker::PhantomData;

struct MyRawPtrStruct<T> {
    ptr: *mut T,
    _marker: PhantomData<T>,
}

impl<T> MyRawPtrStruct<T> {
    fn new(t: T) -> MyRawPtrStruct<T> {
        let t = Box::new(t);
        MyRawPtrStruct {
            ptr: Box::into_raw(t),
            _marker: PhantomData,
        }
    }
}

...

In the example, MyRawPtrStruct is a simple smart pointer that owns a heap-allocated T. The Rust compiler cannot automatically infer lifetime or ownership details for the raw pointer ptr. The example uses PhantomData<T> to express the fact that MyRawPtrStruct owns a T, even though no T appears directly in the struct (it is behind a raw pointer). This helps the compiler reason correctly about dropping and other ownership-related properties.
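To make the ownership story concrete, here is one way such a smart pointer could be completed. This is only a sketch: the type name OwningPtr and the Deref and Drop implementations are assumptions made for illustration, not the code elided above.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Hypothetical smart pointer with the same fields as `MyRawPtrStruct`.
struct OwningPtr<T> {
    ptr: *mut T,
    _marker: PhantomData<T>,
}

impl<T> OwningPtr<T> {
    fn new(value: T) -> Self {
        OwningPtr {
            // Move the value to the heap and keep only the raw pointer.
            ptr: Box::into_raw(Box::new(value)),
            _marker: PhantomData,
        }
    }
}

impl<T> Deref for OwningPtr<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // SAFETY: `ptr` came from `Box::into_raw` in `new` and is freed only in `drop`.
        unsafe { &*self.ptr }
    }
}

impl<T> Drop for OwningPtr<T> {
    fn drop(&mut self) {
        // SAFETY: the Box is rebuilt exactly once, so the owned `T` is dropped exactly once.
        unsafe { drop(Box::from_raw(self.ptr)) }
    }
}
```

Dropping an OwningPtr<T> runs the destructor of the T it points to, which is exactly the relationship the PhantomData<T> field is meant to advertise.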

Unused Lifetime Parameters and PhantomData<T>

For the unused lifetime parameters, let’s consider the following Window structure definition:

use std::marker::PhantomData;

struct Window<'a, T: 'a> {
    start: *const T,
    end: *const T,
    phantom: PhantomData<&'a T>,
}

Fields start and end are raw pointers. They point to the start and end of a window of T values, but they don't carry any lifetime information. This means that Rust's borrow checker can't use them to enforce the lifetime 'a.

Field phantom is a PhantomData marker that carries the lifetime 'a. This tells Rust's borrow checker that the Window struct is logically tied to data of lifetime 'a, even though it doesn't actually store any references of type &'a T.

This ensures that the data pointed to by the window won’t be dropped while the window is still in use. Without PhantomData, Rust wouldn't know about the lifetime relationship and couldn't, for example, protect against use-after-free bugs. In other words, PhantomData<&'a T> is used to express that Window behaves as if it held a reference to a T with lifetime 'a, which helps Rust enforce the correct ownership and borrowing rules.

As an additional note, Window is covariant over both 'a and T here.
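The sketch below shows both effects on the Window type. The constructor new, the accessor first, and the function shorten are hypothetical additions for illustration, and the code assumes T is not a zero-sized type.

```rust
use std::marker::PhantomData;

struct Window<'a, T: 'a> {
    start: *const T,
    end: *const T,
    phantom: PhantomData<&'a T>,
}

impl<'a, T> Window<'a, T> {
    /// Hypothetical constructor: the window borrows `slice` for `'a`.
    fn new(slice: &'a [T]) -> Self {
        let range = slice.as_ptr_range();
        Window { start: range.start, end: range.end, phantom: PhantomData }
    }

    /// Hypothetical accessor: first element, if the window is not empty.
    fn first(&self) -> Option<&'a T> {
        if self.start == self.end {
            None
        } else {
            // SAFETY: the window is non-empty and the slice is borrowed for `'a`.
            Some(unsafe { &*self.start })
        }
    }
}

/// Type-checks only because `Window` is covariant over `'a`:
/// a window over a longer borrow can stand in for one over a shorter borrow.
fn shorten<'long: 'short, 'short, T>(w: Window<'long, T>) -> Window<'short, T> {
    w
}

// The borrow checker rejects a window that outlives its data:
//
// let w;
// {
//     let data = vec![1, 2, 3];
//     w = Window::new(&data);
// }
// w.first(); // error[E0597]: `data` does not live long enough
```

If the phantom field carried no lifetime, nothing would connect a Window to the slice it was built from, and the commented-out misuse would compile.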

Unused Type Parameters and PhantomData<T>

In this case PhantomData<T> is used to indicate what type of data a struct is "tied" to:

use std::marker::PhantomData;

struct ExternalResource<R> {
    resource_handle: *mut (),
    resource_type: PhantomData<R>,
}

This case arises frequently when implementing Foreign Function Interfaces (FFIs). Refer to the standard library documentation example for more information.
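As a rough illustration of how the type parameter keeps handles apart, consider the sketch below. The marker types Texture and Sound, the methods from_raw and raw, and the function texture_handle_addr are all hypothetical names invented for this example.

```rust
use std::marker::PhantomData;

// Hypothetical marker types; they are never instantiated.
#[allow(dead_code)]
struct Texture;
#[allow(dead_code)]
struct Sound;

struct ExternalResource<R> {
    resource_handle: *mut (),
    resource_type: PhantomData<R>,
}

impl<R> ExternalResource<R> {
    /// Wraps an opaque handle, tagging it with the resource kind `R`.
    fn from_raw(handle: *mut ()) -> Self {
        ExternalResource { resource_handle: handle, resource_type: PhantomData }
    }

    /// Returns the untyped handle again.
    fn raw(&self) -> *mut () {
        self.resource_handle
    }
}

/// Accepts texture handles only; passing an `ExternalResource<Sound>`
/// is a compile-time type error.
fn texture_handle_addr(res: &ExternalResource<Texture>) -> usize {
    res.raw() as usize
}
```

Both ExternalResource<Texture> and ExternalResource<Sound> are a single pointer wide, so the distinction exists only at compile time.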

Real World Examples of PhantomData<T>

This section illustrates a few real-world usage examples of PhantomData<T> taken directly from the Rust standard library (the code snippets presented refer to Rust v1.70.0).

BorrowedFd

BorrowedFd is the borrowed version of an OwnedFd (owned file descriptor) and in the standard library it is defined in std/src/os/fd/owned.rs as:

pub struct BorrowedFd<'fd> {
    fd: RawFd,
    _phantom: PhantomData<&'fd OwnedFd>,
}

Here the PhantomData field (_phantom) is used to tell the Rust compiler that BorrowedFd is tied to the lifetime of the OwnedFd it was borrowed from (even though BorrowedFd doesn't actually hold a reference to an OwnedFd). This is important for ensuring that the OwnedFd isn't dropped while the BorrowedFd is still in use.
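The following sketch exercises this relationship through the public std::os::fd API. It assumes a Unix-like system where /dev/null can be opened, and the function name borrowed_matches_owned is made up for the example.

```rust
use std::fs::File;
use std::io;
use std::os::fd::{AsFd, AsRawFd, BorrowedFd, OwnedFd};

/// Hypothetical helper: borrows an owned descriptor and checks that both
/// views refer to the same raw file descriptor.
fn borrowed_matches_owned() -> io::Result<bool> {
    // Assumption: /dev/null exists and is readable (Unix-like systems).
    let owned: OwnedFd = File::open("/dev/null")?.into();
    let owned_raw = owned.as_raw_fd();

    // The lifetime of `borrowed` is tied to `owned` through the phantom field.
    let borrowed: BorrowedFd<'_> = owned.as_fd();

    // drop(owned); // error[E0505]: cannot move out of `owned` because it is borrowed

    Ok(borrowed.as_raw_fd() == owned_raw)
}
```

Uncommenting the drop call is rejected by the borrow checker, even though BorrowedFd stores nothing but the raw descriptor.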

Iter<T>

The iterator over a slice [T] is defined in the standard library in core/src/slice/iter.rs as:

pub struct Iter<'a, T: 'a> {
    ptr: NonNull<T>,
    end: *const T,
    _marker: PhantomData<&'a T>,
}

In this case, PhantomData<&'a T> is used to indicate that the structure Iter is tied to the lifetime 'a. This is important because it tells the Rust compiler that Iter can't outlive the data it refers to (the T values are reachable only through the two raw pointers ptr and end, which carry no lifetime information). This is crucial for Rust's guarantee of memory safety.
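A heavily simplified iterator with the same three fields might look like the sketch below. RawIter is a hypothetical name; this is not the standard library implementation, and unlike the real Iter it does not support zero-sized element types.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Hypothetical, simplified slice iterator (not the std implementation).
pub struct RawIter<'a, T: 'a> {
    ptr: NonNull<T>,
    end: *const T,
    _marker: PhantomData<&'a T>,
}

impl<'a, T> RawIter<'a, T> {
    pub fn new(slice: &'a [T]) -> Self {
        // Assumption of this sketch: zero-sized types are not supported.
        assert!(std::mem::size_of::<T>() != 0, "ZSTs are not handled");
        RawIter {
            // A slice pointer is never null, even for an empty slice.
            ptr: NonNull::new(slice.as_ptr() as *mut T).expect("non-null slice pointer"),
            end: slice.as_ptr_range().end,
            _marker: PhantomData,
        }
    }
}

impl<'a, T> Iterator for RawIter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        let current: *const T = self.ptr.as_ptr();
        if current == self.end {
            return None;
        }
        // SAFETY: `current` is strictly before `end`, so it points to a live
        // element of the slice borrowed for `'a`; advancing by one stays
        // within the slice or lands one past its end.
        unsafe {
            self.ptr = NonNull::new_unchecked(self.ptr.as_ptr().add(1));
            Some(&*current)
        }
    }
}
```

The item type &'a T takes its lifetime from the phantom field, so every reference handed out is bounded by the original borrow of the slice.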

Rc<T>

As a last example, Rc<T>, defined by the standard library in alloc/src/rc.rs, also contains a PhantomData field:

pub struct Rc<T: ?Sized> {
    ptr: NonNull<RcBox<T>>,
    phantom: PhantomData<RcBox<T>>,
}

PhantomData is used here to tell the drop checker that dropping Rc<T> may cause a value of type T to be dropped.
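The word "may" matters: only dropping the last Rc<T> drops the T. The sketch below observes this with a drop counter; the Noisy type and the function drops_after_each_release are hypothetical names used only for this demonstration.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical type that counts how many times it has been dropped.
struct Noisy<'c> {
    drops: &'c Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Returns the drop count observed after releasing each of two `Rc` handles.
fn drops_after_each_release() -> (u32, u32) {
    let drops = Cell::new(0);
    let first = Rc::new(Noisy { drops: &drops });
    let second = Rc::clone(&first);

    drop(first); // another handle is still alive, so the value is kept
    let after_first = drops.get();

    drop(second); // last handle: the inner value is dropped now
    let after_second = drops.get();

    (after_first, after_second)
}
```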

A more in-depth explanation of why PhantomData is actually required in Rc can be found in this Stack Overflow answer and in this section of the Rustonomicon.

Additional Information

This section provides a few links to external resources that you may find useful to deepen your knowledge about PhantomData.

In the standard library, PhantomData is defined in core/src/marker.rs as:

pub struct PhantomData<T: ?Sized>;

Being a Zero Sized Type (ZST), PhantomData<T> occupies no space and has an alignment of one byte, i.e.:

  • size_of::<PhantomData<T>>() == 0
  • align_of::<PhantomData<T>>() == 1
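These two properties can be verified for any choice of T, including unsized types such as str. A minimal sketch follows, with phantom_layout as a hypothetical helper name.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

/// Hypothetical helper: (size, alignment) of `PhantomData<T>` in bytes.
fn phantom_layout<T: ?Sized>() -> (usize, usize) {
    (size_of::<PhantomData<T>>(), align_of::<PhantomData<T>>())
}
```

For example, phantom_layout::<u64>() and phantom_layout::<str>() both evaluate to (0, 1).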

The PhantomData type is also strongly related to the Drop-Check (dropck) rule and the unstable #[may_dangle] attribute. To learn more about this, see RFC 769, where dropck was introduced, and RFC 1238 and RFC 1327, where it was further refined.

Note also that RFC 1238 introduced rules that changed the circumstances in which PhantomData<T> and #[may_dangle] are required. For example, they are used in the standard library to implement a sound Vec<T> type that does not have to comply with the overly restrictive drop check rule (for further information, see the dedicated section in the Rustonomicon).

For an intuitive explanation and a deep dive into how PhantomData<T> works in the definition of Vec<T>, see this Stack Overflow answer.

Finally, another interesting use of the PhantomData type is to implement the typestate pattern, in particular the state type parameter variant.
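As a rough sketch of the state type parameter variant, the example below encodes whether a connection is open in its type. All names here (Connection, Closed, Open, and the methods) are hypothetical and chosen only to illustrate the idea.

```rust
use std::marker::PhantomData;

// Hypothetical state markers; they exist only at the type level.
struct Closed;
struct Open;

struct Connection<State> {
    address: String,
    _state: PhantomData<State>,
}

impl Connection<Closed> {
    fn new(address: &str) -> Self {
        Connection { address: address.to_string(), _state: PhantomData }
    }

    /// Consumes the closed connection and returns an open one.
    fn open(self) -> Connection<Open> {
        Connection { address: self.address, _state: PhantomData }
    }
}

impl Connection<Open> {
    /// Only available in the `Open` state.
    fn send(&self, msg: &str) -> String {
        format!("{} <- {}", self.address, msg)
    }

    /// Consumes the open connection and returns a closed one.
    fn close(self) -> Connection<Closed> {
        Connection { address: self.address, _state: PhantomData }
    }
}
```

Calling send on a Connection<Closed> does not compile, so an invalid sequence of operations is caught by the type checker at no runtime cost.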
