Truly Leverage C# Structs (Part 1)

A lot has been said about value-type structs in C#, because a lot should be said about them. Here’s the scoop on choosing, and best using, structs.

Norm Bryar
10 min read · Jan 25, 2023
The beauty of arrays of well-packed structures
Photo by Sumit Mangela on Unsplash

You’ve no doubt seen the struct keyword, used in much the same context as the class keyword. You likely also know that tuple syntax, e.g. public (int Code, string Msg) DoStuff();, lowers to ValueTuple<int, string>, which is a struct. These are pass-by-value types, basically stored in-line within their enclosing scope. Bear with me as I give a quick overview of why struct gives devs an approach-avoidance conflict.

Structs: The Good, Bad, and Ugly

Pros

  • Removes indirection.
    A collection of reference-objects (i.e. class instances) always requires traversing an address to get to the instance’s fields: 1 hop into the collection, 1 more hop into the element. And if a class’ field is also a class, a third hop to get to those elements.
    But structs inline all this, eliminating extra hops.
  • More compact.
    Objects carry object-header and method-table-pointer overhead in their representation. On x64, 16 bytes of every object serve as overhead. In collections of non-polymorphic class-objects, and if per-object locks aren’t used, that overhead is pure waste for you.
    Structs eliminate this overhead.
  • More cache-line friendly.
    No indirection means array elements sit contiguous in memory. No object-overhead makes that memory compact. Structs thus lend themselves well to the memory speed-ups in-built into the CPU.
  • Lessens garbage-collection overhead.
    With objects, even a local variable is created on the heap then must be garbage-collected. Struct local variables are created on the stack. Struct fields are inlined in the enclosing entity (even if it is an object on the heap, thus not separately tracked by GC).
    Stackalloc and Span<T> of structs mean containing arrays aren’t GC’d either.
  • Help avoid mutation.
    Object references can allow callees to (unexpectedly) make mutating changes, a source of bugs. Structs use copy-by-value, or the in param, to keep the original pristine.
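
To make the stackalloc / Span<T> point concrete, here is a minimal sketch. Point and PointMath are names I made up for illustration, and the top-level statements assume C# 9 or later.

```csharp
using System;

// Four Points sit contiguously on the stack: no heap allocation, nothing for the GC to track.
Span<Point> points = stackalloc Point[4];
for (int i = 0; i < points.Length; i++)
{
    points[i].X = i;
    points[i].Y = i * 2;
}

Console.WriteLine(PointMath.SumX(points)); // 0 + 1 + 2 + 3 = 6

// Hypothetical value type, used only for this illustration.
public struct Point
{
    public int X;
    public int Y;
}

public static class PointMath
{
    // ReadOnlySpan<T> lets the callee walk the elements in place, without copying the buffer.
    public static int SumX(ReadOnlySpan<Point> points)
    {
        int sum = 0;
        foreach (ref readonly Point p in points)
        {
            sum += p.X;
        }
        return sum;
    }
}
```

Keep such buffers small; the stack is a limited resource, and the span must not outlive the method that allocated it.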

Cons

  • Less forgiving.
    Nuances here mean innocent mistakes can torpedo your gains.
    * Boxing / unboxing brings the heap back into the picture, and forces copying the struct’s data-fields at each such heap/stack transition.
    * Defensive copies arise when the compiler doesn’t know a struct-method doesn’t mutate state.
    * Evolution, either in adding fields or use in a new context, can re-expose you to said torpedoes.
  • Costly default comparison.
    * Forgetting to override Equals and GetHashCode may saddle you with very slow reflection.
    * A struct’s default GetHashCode, in particular, is criminally negligent!
  • [StructLayout(LayoutKind)] Effects.
    * Aggressive packing, e.g. Pack=1, could split reads across non-native-type-size boundaries, e.g. to read an un-aligned long (8-bytes) would require 2 reads, first (_,_,_,_,_,_,X,X) then (X,X,X,X,X,X,_,_), shifting each, and a bit-OR op to re-assemble. All waste.
    * Naïve field-ordering, e.g. a bool then a double, could add 7 bytes of padding after the bool to keep the larger double native-size-aligned (true even if you didn’t do naughty things with Pack=).
  • Structs lack inheritance
    Polymorphism of different ‘base’ elements breaks array packing.
  • Misc other dev-unfamiliar idioms.
    * Few folks have extensive experience with keywords ref, ref return, in, and readonly applied to structs, increasing the chance for bugs.
    * Ref/in/out params are banned from async-methods due to scope-lifetime concerns. Your struct work may collide with async work.
    * Generic constraints (Foo<T> where T:struct or where T:class) and nullable-reference syntax (DoStuff( T? blah){…}) don’t always easily mix. The compiler won’t use generic-constraint declarations to aid overload resolution. Your devs may have to learn tricks like nullability-attributes, e.g. Go([MaybeNull] T arg) to now accommodate structs.
    * And don’t get me started about ref struct.

General Advice

So, how do you avoid the cons?

Avoiding Boxing

The ref keyword is a major way to avoid boxing, but it only works when the lifetime scope plainly keeps the memory-location live to allow a reference to it (if the stack has unwound, the memory might be gone or re-purposed). Of course, avoid declaring methods that take only an object parameter, which will surely box structs. I know this is at odds with a popular extension technique, Dictionary<string, object>.

A common case is that your struct implements some interface(s). Beware: just calling library methods that take that interface as a param will force the struct to be boxed. Hopefully, you’re the author of the (extension-)methods and can make a generic, struct-centric overload:

public void AvoidsBoxing<T>( T valStruct )
    where T : IInterface      // compiler knows Method() will exist...
{
    valStruct.Method(…);      // ...still a callvirt, but non-boxed
}

// struct is boxed to get an IInterface reference,
// then has to do a v-table lookup to get to Method().
public void OopsBoxing( IInterface boxedStruct ) =>
    boxedStruct.Method(…);
LINQPad MSIL showing box op (at IL_0017)
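
If you want to see the box without reading IL, a small sketch along these lines makes it observable. ICounter, Counter, and BoxingDemo are hypothetical names, not from any library; the trick is that a mutation made through the interface lands on the boxed heap copy, while the generic ref version touches the original.

```csharp
using System;

var counter = new Counter();

// Converting to the interface boxes: the heap copy is bumped, the original is untouched.
ICounter boxed = counter;
boxed.Bump();
Console.WriteLine($"original={counter.Count}, boxed={boxed.Count}"); // original=0, boxed=1

// Generic + ref: no box and no copy, so the original itself is bumped.
BoxingDemo.BumpInPlace(ref counter);
Console.WriteLine($"original={counter.Count}"); // original=1

// Hypothetical interface and struct, named only for this sketch.
public interface ICounter
{
    int Count { get; }
    void Bump();
}

public struct Counter : ICounter
{
    public int Count { get; private set; }
    public void Bump() => Count++;
}

public static class BoxingDemo
{
    // The constraint tells the compiler Bump() exists on T, so the call binds
    // to the struct directly instead of going through a boxed ICounter.
    public static void BumpInPlace<T>(ref T value) where T : struct, ICounter
        => value.Bump();
}
```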

For a surprising example of boxing impact, look at Matt Warren’s Adventures in Benchmarking blog under the Dictionary vs IDictionary section.

Avoiding (Defensive) Copy Expense

The old advice that a struct should be small (~16 bytes) is a tad onerous when you can just use ref or in to avoid copy-constructing all those fields. As mentioned above, however, the compiler has to be certain the struct’s lifetime safely allows this.

Sadly, ref strips any immutability guarantee you might have wanted. This is where the in keyword comes in, acting kind’a like const Foo& in C++. With in, the compiler keeps the original pristine, … but it does so by inserting defensive-copy ops on any method call (and remember, a property-getter is just a method) unless the member is adorned with readonly. No kidding, the compiler’s stupid this way: even members short enough to inline, and which clearly don’t mutate anything, need the readonly keyword.

One can mark the whole struct readonly (and consider a record-style with-expression to support copy-on-write, but note it does a full copy before the with { … } inits). But readonly on individual members, such as property-getters, can be more surgical. Warn your team, lest someone forget their new method needs this keyword.
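
A tiny sketch of the defensive copy in action, under my own made-up names (Tally, CopyDemo). Calling a non-readonly member through an in parameter compiles fine, but the call runs against a hidden copy, so the mutation silently vanishes.

```csharp
using System;

var tally = new Tally { Hits = 5 };
Console.WriteLine(CopyDemo.BumpThroughIn(in tally)); // 5: the bump landed on a hidden copy
Console.WriteLine(CopyDemo.PeekThroughIn(in tally)); // 5: readonly member, no copy needed

// Hypothetical struct, used only for this illustration.
public struct Tally
{
    public int Hits;

    // Not marked readonly, so the compiler must assume it may mutate 'this'.
    public void Bump() => Hits++;

    // Marked readonly: the compiler can call it on the original 'in' argument directly.
    public readonly int Peek() => Hits;
}

public static class CopyDemo
{
    public static int BumpThroughIn(in Tally t)
    {
        t.Bump();      // compiles, but runs against a defensive copy of t
        return t.Hits; // the value the caller passed is unchanged
    }

    public static int PeekThroughIn(in Tally t) => t.Peek();
}
```

With a struct this small the copy is cheap; the cost (and the surprise) grows with every field you add.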

I wish the story with Analyzer support were better to warn you if a defensive-copy occurs. The ErrorProne.Net.Structs analyzer would have been my first choice but as of this writing (Jan. 2023, VS 17.4.0), it fails builds with AD0001 “…InvalidCast…” errors. On earlier VS, there were also some false-positives, e.g. on ReadOnlySpan<T> where defensive copy size is actually negligible. You might want to ‘fork’ the logic into your team’s analyzer.

LINQPad showing defensive copy

Avoiding Default Equality Woes

I’m tempted to just always hand-code an IEquatable implementation.

You MUST beware that struct's default GetHashCode has unforgivable behavior. Depending on the struct’s contents, it can end up looking at only the first (non-null) field in the struct! Seriously. If that first field holds low-cardinality, poorly discriminative values (.State=“California”), then your O(1) Dictionary<,> becomes an O(1_000_000) list. Maybe you’re not using this struct as a key in HashSet/Dictionary, but will the next dev try to?

Though it adds a maintenance burden, Equals might be good to hand-implement. I’m not sure how often record struct might emit an Equals with a fast memcmp of blittable fields (nor if ValueTuple ever does so), but I think its typical choice of EqualityComparer<T>.Default.Equals() is slower than what I’d have done by hand (esp. given dev knowledge on ordering compares by cardinality / early-out likelihood). Oh, and mark them readonly, just in case.
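
For what it’s worth, a hand-coded version might look like the sketch below. Address and its fields are hypothetical, and the compare order is just my guess at which field discriminates best; HashCode.Combine is the framework helper for mixing several fields.

```csharp
#nullable enable
using System;

var a = new Address("California", 94105, "Main St");
var b = new Address("California", 94105, "Main St");
var c = new Address("California", 10001, "Main St");

Console.WriteLine(a.Equals(b)); // True
Console.WriteLine(a == c);      // False

// Hypothetical key type; the field names are illustrative only.
public readonly struct Address : IEquatable<Address>
{
    public string State { get; }
    public int Zip { get; }
    public string Street { get; }

    public Address(string state, int zip, string street)
    {
        State = state;
        Zip = zip;
        Street = street;
    }

    // Compare the cheapest, most discriminating field first for an early out.
    public bool Equals(Address other)
        => Zip == other.Zip
        && string.Equals(Street, other.Street, StringComparison.Ordinal)
        && string.Equals(State, other.State, StringComparison.Ordinal);

    public override bool Equals(object? obj) => obj is Address other && Equals(other);

    // Mix every field that participates in Equals, not just the first one.
    public override int GetHashCode() => HashCode.Combine(State, Zip, Street);

    public static bool operator ==(Address left, Address right) => left.Equals(right);
    public static bool operator !=(Address left, Address right) => !left.Equals(right);
}
```

Because the struct is declared readonly, every member is implicitly readonly too, which also keeps defensive copies away.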

Struct Layout Hazards

Perhaps for the advanced, … but who doesn’t want to be more advanced?

For class-object types, the compiler defaults to LayoutKind.Auto, re-organizing the fields you’ve declared in size-descending order (sub-sorted by your declared ordering). This keeps every field safely aligned on a multiple of its own length, avoiding split-reads.

[StructLayout( LayoutKind.Auto )]
internal record struct SomeAuto( byte B, long L, int I, long Z, long G );

Using the Object Layout Inspector (NuGet), we see Auto save us.

Type layout for 'SomeAuto'
Size: 32 bytes. Paddings: 3 bytes (%9 of empty space)
|===========================================|
|   0-7: Int64 <L>k__BackingField (8 bytes) | 👈 new order L, Z, G, I, B
|-------------------------------------------|
|  8-15: Int64 <Z>k__BackingField (8 bytes) |
|-------------------------------------------|
| 16-23: Int64 <G>k__BackingField (8 bytes) |
|-------------------------------------------|
| 24-27: Int32 <I>k__BackingField (4 bytes) |
|-------------------------------------------|
|    28: Byte <B>k__BackingField (1 byte)   |
|-------------------------------------------|
| 29-31: padding (3 bytes)                  |
|===========================================|

But a struct may or may not be automatically re-arranged. By default, according to docs, structs that don’t hold any reference-types within them use LayoutKind.Sequential. Thus the order you declared is the order you get, no matter your level of sobriety or sophistication, incurring whatever extra padding is needed to keep fields at their native-size alignment.

internal record struct NaiveStruct(byte B, int I, byte B2, ushort S );
Type layout for 'NaiveStruct'
Size: 12 bytes. Paddings: 4 bytes (%33 of empty space) 👈 OUCH!
|============================================|
|     0: Byte <B>k__BackingField (1 byte)    |
|--------------------------------------------|
|   1-3: padding (3 bytes)                   |
|--------------------------------------------|
|   4-7: Int32 <I>k__BackingField (4 bytes)  |
|--------------------------------------------|
|     8: Byte <B2>k__BackingField (1 byte)   |
|--------------------------------------------|
|     9: padding (1 byte)                    |
|--------------------------------------------|
| 10-11: UInt16 <S>k__BackingField (2 bytes) |
|============================================|

LayoutKind.Auto would be 8 bytes, 0% empty-space!
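
You can get the same win while staying Sequential by simply declaring the widest fields first. A quick sketch to check it, using plain structs (NaiveOrder and WidestFirst are my own names) and Unsafe.SizeOf; the sizes in the comments assume the usual alignment rules on mainstream x64/Arm64 runtimes.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

Console.WriteLine(Unsafe.SizeOf<NaiveOrder>());  // 12: 3 + 1 bytes of padding
Console.WriteLine(Unsafe.SizeOf<WidestFirst>()); // 8: no padding at all

// Same fields as NaiveStruct above, in the same declared order.
[StructLayout(LayoutKind.Sequential)]
public struct NaiveOrder
{
    public byte B;
    public int I;
    public byte B2;
    public ushort S;
}

// Identical fields, re-declared widest-first so each one lands aligned.
[StructLayout(LayoutKind.Sequential)]
public struct WidestFirst
{
    public int I;
    public ushort S;
    public byte B;
    public byte B2;
}
```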

Personally, I suspect it’s a good practice to always include the [StructLayout(LayoutKind.Xxx)] attribute on structs to announce your intent and prompt your code-reviewers to look-out for bad padding (which you’ll help them to recognize by adding some of my Resource Links below to your team wiki, yeah?).

Omit setting the Pack field unless you’re having to do cross-platform interop (which might be best on a dedicated DTO type, not the runtime struct you’re hoping to optimize perf with).

A weird behavior I observe when using ObjectLayoutInspector is that if the struct holds any reference-type fields (e.g. a string or an array), any explicit LayoutKind you applied is silently overridden to be Auto. I’ve yet to see this described in any of the official docs. As to why, perhaps Microsoft feels interop/binary-persistence might be out-of-the-picture once managed-memory references enter the scene, so … Analyzer rule sorely needed here!

Do the Pros Outweigh the Cons?

Ok, so you’ve seen some traps and the mindfulness needed to avoid them. Possibly this explains why you’re not seeing structs everywhere. But is that warranted?

Many successful apps can live and grow blissfully free of structs. But some domains need to consider more bare-metal behaviors and constraints, even if just for perf-critical hot-paths. Clever use of structs may save the day.

If you’re going the struct route, I think you’d want to also consider related changes to really leverage these finicky, princess-and-the-pea data-types, tricks that may help tip the pros/cons scale decisively. Such tricks will appear in the Part 2 article to follow. Teaser: cache-lines make a cameo.

Esoterica with Layout

We’ll close with a few simple, but less advertised struct tricks.

Union Fields

Sometimes it helps to have both component- and composite-fields in a union-struct, e.g. have a uint consolidate the R, G, B, A byte color channels.

using System.Runtime.CompilerServices;   // Unsafe.SkipInit
using System.Runtime.InteropServices;    // StructLayout, FieldOffset

[StructLayout(LayoutKind.Explicit)]
public struct RGBA
{
    [FieldOffset(0)] public byte Blue;
    [FieldOffset(1)] public byte Green;
    [FieldOffset(2)] public byte Red;
    [FieldOffset(3)] public byte Alpha;
    [FieldOffset(0)] public uint Quad;   // DWORD overlaying all four channels (0xAARRGGBB on little-endian)

    public RGBA( uint q )
    {
        this.Quad = q;
        // The component fields are already set via Quad; SkipInit only
        // satisfies definite assignment (needed before C# 11).
        Unsafe.SkipInit( out this.Red );
        Unsafe.SkipInit( out this.Green );
        Unsafe.SkipInit( out this.Blue );
        Unsafe.SkipInit( out this.Alpha );
    }

    public RGBA( byte r, byte g, byte b, byte a = (byte) 0xFF )
    {
        this.Red = r;
        this.Green = g;
        this.Blue = b;
        this.Alpha = a;
        Unsafe.SkipInit( out this.Quad );   // Quad is already set via the four channels
    }
}

One could union fields of other numeric types, e.g. [Flags] enum Fruit : byte { Apple=1, Banana=2, Date=4, Kiwi=8 }. The point is that Equals and GetHashCode(), persistence, etc. could operate on just the uint Quad.
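
Here is roughly what “operate on just the Quad” could look like. Pixel is a stand-in name of mine with the same layout idea; I use this = default instead of SkipInit so the sketch compiles on older language versions, and the printed value assumes a little-endian machine.

```csharp
#nullable enable
using System;
using System.Runtime.InteropServices;

var opaqueRed = new Pixel(r: 0xFF, g: 0x00, b: 0x00);
var sameRed = new Pixel(0xFF, 0x00, 0x00, 0xFF);

Console.WriteLine(opaqueRed.Equals(sameRed)); // True
Console.WriteLine($"0x{opaqueRed.Quad:X8}");  // 0xFFFF0000 on little-endian

// Hypothetical union-struct: four channel bytes overlaid by one uint.
[StructLayout(LayoutKind.Explicit)]
public struct Pixel : IEquatable<Pixel>
{
    [FieldOffset(0)] public byte Blue;
    [FieldOffset(1)] public byte Green;
    [FieldOffset(2)] public byte Red;
    [FieldOffset(3)] public byte Alpha;
    [FieldOffset(0)] public uint Quad;

    public Pixel(byte r, byte g, byte b, byte a = 0xFF)
    {
        this = default; // zero all four bytes, which satisfies definite assignment
        Red = r;
        Green = g;
        Blue = b;
        Alpha = a;
    }

    // One 32-bit compare instead of four byte compares.
    public readonly bool Equals(Pixel other) => Quad == other.Quad;

    public override readonly bool Equals(object? obj) => obj is Pixel other && Equals(other);

    // The composite field already is a well-spread 32-bit value.
    public override readonly int GetHashCode() => unchecked((int)Quad);
}
```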

Ref Variables and Collections

Familiarize yourself with the ref keyword for parameters, locals, and even returns (of ref or array params). This allows you to point at (and mutate) a struct without the cost of copy-construction. For arrays, this is straight-forward.
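
A short sketch of the array case, with made-up names (Particle, RefDemo): a ref local aliases one slot, and a ref-returning helper hands back an alias instead of a copy.

```csharp
using System;

var particles = new Particle[3];
particles[0].Speed = 1;
particles[1].Speed = 5;
particles[2].Speed = 3;

// ref local: an alias to the array slot, so the write lands in the array itself.
ref Particle second = ref particles[1];
second.X = 10;

// ref return: the helper hands back an alias to the fastest element, not a copy.
ref Particle fastest = ref RefDemo.Fastest(particles);
fastest.Speed = 0;

Console.WriteLine($"{particles[1].X}, {particles[1].Speed}"); // 10, 0

// Hypothetical struct, used only for this illustration.
public struct Particle
{
    public int X;
    public int Speed;
}

public static class RefDemo
{
    // Assumes a non-empty array; returns a reference into it.
    public static ref Particle Fastest(Particle[] items)
    {
        int best = 0;
        for (int i = 1; i < items.Length; i++)
        {
            if (items[i].Speed > items[best].Speed) best = i;
        }
        return ref items[best];
    }
}
```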

For Dictionary collections, look at CollectionsMarshal utility methods (and perhaps the Dictionaries and Spans article ;- )
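
As a taste, here is a sketch using CollectionsMarshal.GetValueRefOrAddDefault (available from .NET 6). Stats and StatsDemo are hypothetical names; the point is a single lookup that updates the stored struct in place rather than read-copy-write.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

var stats = new Dictionary<string, Stats>();
StatsDemo.Record(stats, "GET /", 12);
StatsDemo.Record(stats, "GET /", 30);

Console.WriteLine($"{stats["GET /"].Count} calls, {stats["GET /"].TotalMs} ms"); // 2 calls, 42 ms

// Hypothetical value type stored directly inside the dictionary's entries.
public struct Stats
{
    public int Count;
    public long TotalMs;
}

public static class StatsDemo
{
    public static void Record(Dictionary<string, Stats> stats, string key, long elapsedMs)
    {
        // One lookup: returns a ref to the stored value, adding default(Stats) if the key is new.
        ref Stats entry = ref CollectionsMarshal.GetValueRefOrAddDefault(stats, key, out _);
        entry.Count++;
        entry.TotalMs += elapsedMs;
    }
}
```

Don’t add or remove dictionary entries while you still hold the ref; a resize would leave it pointing at stale storage.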

Inline Arrays

In ‘safe’ code, an array (even of primitive-elements) would be a reference. However, by enabling ‘unsafe’ code you should be able to inline a fixed-size buffer primitive array, e.g. public fixed byte sha1Hash[20];

[StructLayout( LayoutKind.Sequential )]
internal unsafe struct InlineArr
{
    public fixed byte sha1Hash[20];
    public ushort major;
    public ushort minor;
    public ushort rev;
}
...
TypeLayout.PrintLayout<InlineArr>();

yields

Type layout for 'InlineArr'
Size: 26 bytes. Paddings: 0 bytes (%0 of empty space)
|=====================================================|
|  0-19: <sha1Hash>e__FixedBuffer sha1Hash (20 bytes) |
| |========================================|          |
| |    0: Byte FixedElementField (1 byte)  |          |
| |----------------------------------------|          |
| | 1-19: padding (19 bytes)               |          |
| |========================================|          |
|-----------------------------------------------------|
| 20-21: UInt16 major (2 bytes)                       |
|-----------------------------------------------------|
| 22-23: UInt16 minor (2 bytes)                       |
|-----------------------------------------------------|
| 24-25: UInt16 rev (2 bytes)                         |
|=====================================================|

This might dovetail with, say, C#11’s utf-8 string-literals, etc. But should one add fixed-array fields to their struct? You can weigh its effect on CPU cache-line population after reading the Part 2 article to come.

Conclusion

I told you C# structs were a big and nuanced topic, didn’t I?

You can pretty quickly change a class to a (record-)struct and benchmark whether you get a perf or mem win. But, as is nearly always the case, the more you know, the better (and less risky) it gets. Try to pick better data-types, intelligently inline hot data and split cold data out into a reference object, and be mindful of access-patterns to truly leverage what that foray into structs has started for you.

These concepts are evergreen and portable. Insights gained here will serve you your whole career long, so I highly advise taking a tour through the links below.

We continue our journey in Part 2.

Truly Leverage C# Structs (Part 2) | by Norm Bryar | Jan, 2023 | Medium

~Norm Bryar

Resource Links for the Deeply Curious


Norm Bryar

A long-time, back-end web service developer enamored with .Net and C#, code performance, and techniques taming drudgery or increasing insight.