Rust vs C++. A Performance Comparison. Part 2

Dmytro Gordon
Published in Rustaceans · 8 min read · Feb 25, 2024

In the previous part, we compared how Rust and C++ deal with aliasing, move semantics, and dynamic dispatch. Now let's take a look at memory layout.

Struct memory layout and padding

It is well-known that the size of a structure in C++ may vary based on the order of its fields. For example, on a 64-bit architecture, the following two structs will likely have sizes of 24 bytes and 16 bytes, respectively:

#include <cstdint>

struct Struct_1 {
    uint8_t field_1;
    uint64_t field_2;
    uint16_t field_3;
};

struct Struct_2 {
    uint64_t field_2;
    uint16_t field_3;
    uint8_t field_1;
};

static_assert(sizeof(Struct_1) == 24);
static_assert(sizeof(Struct_2) == 16);

This is a consequence of two primary factors:

  • Alignment rules. Each type in C++ has a statically defined alignment, ensuring that every instance of the type is placed in memory at an address that is a multiple of the type’s alignment. In most cases (though not always), the alignment for a primitive type is equal to its size. To maintain proper alignment, the compiler may introduce padding in the structure layout.
  • A C++ standard requirement: members must appear in memory in the same order as they are declared in the code (for members with the same access control; since C++23 this holds regardless of access specifiers).
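To make the first rule concrete, here is a quick check (shown in Rust for brevity; C++'s alignof reports the same values on the same target) that primitive alignments match their sizes on a typical x86-64 target. These values are platform-dependent, so treat this as a sketch:

```rust
// Alignment of primitive integer types on a typical 64-bit target.
// On x86-64 the alignment of each integer type equals its size.
fn main() {
    assert_eq!(std::mem::align_of::<u8>(), 1);
    assert_eq!(std::mem::align_of::<u16>(), 2);
    assert_eq!(std::mem::align_of::<u32>(), 4);
    assert_eq!(std::mem::align_of::<u64>(), 8);
}
```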

In our case, when targeting x86-64, Struct_1 is laid out as field_1 at offset 0, seven padding bytes, field_2 at offset 8, and field_3 at offset 16 followed by six bytes of tail padding (24 bytes total). Struct_2 is laid out as field_2 at offset 0, field_3 at offset 8, and field_1 at offset 10 followed by five bytes of tail padding (16 bytes total).

The rationale for aligning primitive types is to minimize the number of instructions required to load or store an element and to prevent splitting it across cache lines. If, for some specific reason, you prioritize a compact or particular layout over access performance, you can use '#pragma pack(push, 1)': all padding is removed, every field is placed immediately after the previous one, and Struct_1 shrinks to 11 bytes.

This approach can be helpful when dealing with large arrays of data that are relatively rarely accessed or when directly memory-mapping data from I/O. However, in most cases, the default alignment is used. Therefore, it is crucial to carefully manage the order of members to keep your structures compact.

While handling such straightforward cases is not overly complex, matters become more intricate when the size of members relies on the target architecture or template arguments. Additionally, some members may be non-primitive types with a size that changes during the project’s evolution.

A crazy template- or macro-based solution could be implemented to store all members automatically in descending order of alignment, but I can't even imagine the average length of the compilation error messages it would produce.

In Rust, the situation is different. By default, you have no guarantees about the memory layout of a structure. The compiler simply asserts, 'Hey, relax, it's my job to place all this stuff in memory.' Consequently, both structures equivalent to the C++ ones above have the same size, 16 bytes:

struct Struct1 {
    field_1: u8,
    field_2: u64,
    field_3: u16,
}

struct Struct2 {
    field_2: u64,
    field_3: u16,
    field_1: u8,
}

const _: () = assert!(std::mem::size_of::<Struct1>() == 16);
const _: () = assert!(std::mem::size_of::<Struct2>() == 16);

And this seems to be a better approach because it's almost always what you want! Also, unlike C++, where every object must have a unique address (so even an empty struct has size 1), Rust makes zero-sized structures completely legal:

struct Struct1;

const _: () = assert!(std::mem::size_of::<Struct1>() == 0);
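As a practical consequence, aggregates of zero-sized types are themselves zero-sized. A quick sketch:

```rust
// A zero-sized marker type: a whole array of them still takes no memory.
struct Marker;

const _: () = assert!(std::mem::size_of::<Marker>() == 0);
const _: () = assert!(std::mem::size_of::<[Marker; 1000]>() == 0);

fn main() {
    // A Vec of zero-sized values tracks only a length and never
    // allocates on the heap, no matter how many elements you push.
    let v: Vec<Marker> = (0..1000).map(|_| Marker).collect();
    assert_eq!(v.len(), 1000);
}
```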

In some rare cases, you may want full control over the memory layout, such as:

  • When using a type as an argument for FFI.
  • When mapping a type against raw memory.
  • To group members that are frequently used together to the same cache line.
  • Or vice versa, to force members to not be on the same cache line to avoid false sharing during intense multi-threaded access.

For those cases you have #[repr(…)] attributes:

// Force preserving the declared field order
#[repr(C)]
struct Struct1 {
    field_1: u8,
    field_2: u64,
    field_3: u16,
}

const _: () = assert!(std::mem::size_of::<Struct1>() == 24);

// Remove all padding
#[repr(packed)]
struct Struct2 {
    field_1: u8,
    field_2: u64,
    field_3: u16,
}

const _: () = assert!(std::mem::size_of::<Struct2>() == 11);

// Force the structure's alignment to be 64
#[repr(align(64))]
struct Struct3 {
    field_1: u8,
    field_2: u64,
    field_3: u16,
}

const _: () = assert!(std::mem::size_of::<Struct3>() == 64);

The last aspect to mention about a structure’s layout is bit fields. In C++, you have the ability to declare fields with specified offsets and sizes in bits, rather than bytes:

#include <cstdint>

struct Struct {
    uint8_t field_1 : 3;
    uint8_t field_2 : 2;
    uint16_t field_3 : 11;
};

static_assert(sizeof(Struct) == 2);

However, this is just syntactic sugar for bit masks. Every time you access such a member, the compiler generates code that reads the storage unit from memory and applies the corresponding mask and shift. Rust lacks this language feature, but there are crates that provide equivalent functionality through macros.
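To illustrate what that sugar expands to, here is a hand-rolled sketch of the same three fields packed into a single u16. The names and bit positions mirror the C++ struct above; crates such as `bitfield` or `modular-bitfield` generate comparable code via macros:

```rust
// Three logical fields packed into one u16, mirroring the C++ bit-field:
// field_1 in bits 0..3, field_2 in bits 3..5, field_3 in bits 5..16.
struct Packed(u16);

impl Packed {
    fn field_1(&self) -> u8 {
        (self.0 & 0b111) as u8
    }
    fn field_2(&self) -> u8 {
        ((self.0 >> 3) & 0b11) as u8
    }
    fn field_3(&self) -> u16 {
        (self.0 >> 5) & 0x7FF
    }
    fn set_field_2(&mut self, v: u8) {
        // Clear bits 3..5, then merge in the two low bits of `v`.
        self.0 = (self.0 & !(0b11 << 3)) | ((u16::from(v) & 0b11) << 3);
    }
}

const _: () = assert!(std::mem::size_of::<Packed>() == 2);
```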

In general, I’d say Rust’s approach to memory layout is much better. For a deeper understanding, consider reading the corresponding chapter of the Unsafe Code Guidelines Reference.

Tagged unions and niche filling

Rust has built-in support for tagged unions called enumerations or enums. This is an ideal option to use when you want to represent the scenario where an object can be one of a finite set of types:

struct Data1 {
    val1: u32,
    val2: bool,
}

struct Data2 {
    val1: u32,
}

enum Enumeration {
    Option1(Data1),
    Option2(Data2),
    Option3,
}

In C++, there is no corresponding language feature, but you can achieve a similar goal with std::variant (or boost::variant for pre-C++17 cases):

#include <cstdint>
#include <variant>

struct Data1 {
    uint32_t val1;
    bool val2;
};

struct Data2 {
    uint32_t val1;
};

struct Data3 {};

using Enumeration = std::variant<Data1, Data2, Data3>;

In C++, you won't have a handy pattern-matching statement, but in this article I'd like to highlight another point: the size of the Enumeration type differs, 8 bytes in Rust versus 12 bytes in C++ (when targeting x86-64). The reason is similar to the previous chapter. The Rust compiler knows that val2's type, bool, has bit patterns that never represent a valid value, and that no other variant stores data at that offset, so it can use those invalid patterns to encode which variant is active. The compiler is smart enough to exploit invalid values to shrink an enum whenever possible, and that's pretty cool! It can't use padding space, though: if I change val2's type to u8, all the magic goes away and the size of the Enumeration type becomes 12 bytes too, because in the general case there is no guarantee that padding bytes are zeroed. There is a proposal on the Rust issue tracker to add a representation where padding is always zeroed, which would make the optimization possible in this case as well.
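The claim is easy to check. A sketch follows; note that the exact sizes are what current rustc produces on x86-64 and are not guaranteed by the language:

```rust
struct Data1 {
    val1: u32,
    val2: bool, // only 0x00 and 0x01 are valid: 254 spare bit patterns
}
struct Data2 {
    val1: u32,
}

// The discriminant hides in val2's invalid values: no extra tag is needed.
enum Enumeration {
    Option1(Data1),
    Option2(Data2),
    Option3,
}
const _: () = assert!(std::mem::size_of::<Enumeration>() == 8);

// With u8 every bit pattern is a valid value, so the niche disappears and
// an explicit tag (plus alignment padding) is required.
struct Data1U8 {
    val1: u32,
    val2: u8,
}
enum EnumerationU8 {
    Option1(Data1U8),
    Option2(Data2),
    Option3,
}
const _: () = assert!(std::mem::size_of::<EnumerationU8>() == 12);
```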

Some Rust types have a special byte representation that the compiler recognizes as not representing a valid value:

  • a null pointer for Box<T> or &T
  • zero for std::num::NonZeroXXX types

This makes Option<T> (which is essentially an enumeration with two variants: Some(T) and None) the same size as T for these types. Unfortunately, there is currently no way to inform the compiler that your own type has invalid values suitable for niche filling. Although there is an ongoing discussion about this, it appears that we may not see such a feature in the near future.
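These guarantees are easy to observe directly:

```rust
use std::num::NonZeroU64;

// For these types the all-zeroes bit pattern is invalid, so `None` can be
// encoded as zero and `Option<T>` costs no extra space. This is guaranteed
// for Box, references, and the NonZero* integer types.
const _: () = assert!(
    std::mem::size_of::<Option<Box<u32>>>() == std::mem::size_of::<Box<u32>>()
);
const _: () = assert!(
    std::mem::size_of::<Option<&u32>>() == std::mem::size_of::<&u32>()
);
const _: () = assert!(std::mem::size_of::<Option<NonZeroU64>>() == 8);
```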

In C++, you can achieve the same effect for std::optional by creating partial specializations of it:

  • std::optional doesn’t support references, possibly because std::optional<T&> would be equivalent to T*. However, boost::optional does support references, and since Boost 1.61 there is a specialization that makes boost::optional<T&> the same size as T*.
  • There are no types that represent a non-null owning pointer (similar to Box<T>) as well as non-null integer types (similar to NonZeroXXX). As a result, you’ll need to write them yourself, along with partial specializations of std::optional for these types. This requires a considerable amount of effort and, well, C++ has never been too friendly.

While writing the last sentence, I realized it’s indeed possible to create a single, universally optimized partial specialization that will automatically be applied based on a certain C++20 concept. Here’s a sketch of it:

#include <concepts>
#include <cstdint>
#include <cstring>
#include <new>
#include <optional>
#include <utility>

// A type that has a special invalid bit pattern which can be used to save space
template <typename T>
concept with_invalid_value = requires {
    { T::is_valid(std::declval<const std::aligned_storage_t<sizeof(T), alignof(T)>&>()) }
        -> std::same_as<bool>;
    { T::make_invalid() }
        -> std::same_as<std::aligned_storage_t<sizeof(T), alignof(T)>>;
};

namespace std {

template <with_invalid_value T>
class optional<T> {
public:
    optional() : data(T::make_invalid()) {}
    optional(T v) {
        new (&data) T(std::move(v));
    }

    operator bool() const {
        return T::is_valid(data);
    }

    const T& value() const {
        if (*this) {
            return reinterpret_cast<const T&>(data);
        } else {
            throw std::bad_optional_access{};
        }
    }

    // All the other stuff from `std::optional`
private:
    std::aligned_storage_t<sizeof(T), alignof(T)> data;
};

} // namespace std

// Non-zero wrapper around an integral type.
// Similar ones can be created for `std::unique_ptr`, `std::shared_ptr`, etc.
template <std::integral T>
struct non_zero_t {
    using raw_value_t = std::aligned_storage_t<sizeof(T), alignof(T)>;

    non_zero_t(T val) : val(val) {}

    static bool is_valid(const raw_value_t& raw) {
        // Valid if and only if the stored value is non-zero.
        return reinterpret_cast<const T&>(raw) != T(0);
    }

    static raw_value_t make_invalid() {
        raw_value_t result;
        T v(0);
        std::memcpy(&result, &v, sizeof(T));
        return result;
    }

    // All other conversion and arithmetic operators to be added here.

    T val;
};

static_assert(sizeof(std::optional<uint64_t>) == 16);
static_assert(sizeof(std::optional<non_zero_t<uint64_t>>) == 8);

Of course, when using something like this in a real project, you need to carefully ensure that the header with the partial specialization is included everywhere. And it’s a shame that there is nothing like this in the standard library.

Let’s summarize what we have for tagged unions and optional values:

  • In Rust, there is built-in support for tagged unions, which lets the compiler optimize their layout in certain cases. The C++ compiler, in contrast, is limited in optimizing the layout of std::variant/boost::variant, and achieving such layout optimization via partial specialization seems virtually impossible in the general case.
  • With niche filling, the Rust compiler automatically optimizes the size of Option<T> for certain types from the standard library. In C++, you can achieve a similar effect with a partial-specialization hack; this effort comes with the reward of being able to use the optimization for your own types. In Rust, there is no such option for custom types whose niches aren’t deduced automatically.

Afterword

Initially, I intended to include a chapter comparing the efficiency of abstractions in both languages. However, I now realize that such a comparison deserves a dedicated article. Stay tuned, and I hope it won’t take too long.
