Dictionaries and Spans

Norm Bryar
5 min readNov 7, 2022

--

Photo by Edge2Edge Media on Unsplash

I really like .Net’s Span<T>; it wonderfully reduces needless alloc-and-pour operations, making this single innovation the cornerstone of a wealth of perf-improvements in each new version of .Net (née .Net Core). Span-aware APIs are available to all of us, too (e.g. String.Create and Span<char>), so we all look smarter in the aura of the genius .Net team.

In case you haven’t heard of Span<T>, think of it like an abstraction over many sources of memory (heap, stack, marshalled) that you can slide-around atop that memory: A view basically. You can often dodge a bunch of expensive string-splitting/substring operations, by just sliding a span where you want it to point.

A big block of (semi-)structured text like JSON or CSV should get you wondering “How can I parse this with Spans?” In my case I was wondering “How can I compose a Dictionary of extracted values … without extracting any values?”

Unfortunately, a Span<T> is a ref struct and as such doesn’t like to be owned by something GC’able, so can’t be a Dictionary key nor value. However a Memory<T> has no such limitation (though still has some), and its .Span property gives you access to the highly-nifty Span-centric APIs.

Memory as Dictionary-Value

Well, can you build a Dictionary<..., ReadOnlyMemory<char>>? Yes.

string text =  
"Lorem ipsum <see cref=\"linkA\"/> dolor yada yada";
// horrible code follows, but demos getting "linkA" indices
int start = text.IndexOf("<see cref=\"", 0);
start = start >= 0 ? start + "<see cref=\"".Length : -1;
int end = start >= 0 ? text.IndexOf("\"/>", start) : -1;

Dictionary<string, ReadOnlyMemory<char>> links = new();
links.Add("see", text.AsMemory()[start..end]);

Memory<char> holds-on to its source object, so won’t risk outliving it, nonetheless you’re typically going this route because you want to hold-on to the raw text for some reason (e.g. send an RPC , enqueue/persist, …).

Of course, at some point you’re likely to want the value back out again, like via links[“see”].ToString(), so you’d want to weigh whether anything’s really being gained by ‘scheduling’ that materialization, e.g. temporally-separate phases looking at only subsets of keys (for validation or routing). Conversion-heavy cases like int.Parse(links[“timeout”].Span) could also yield a win this way.

How much a win? First round of testing, I excluded key-names being extracted rather getting re-used from a parsing helper. This test showed no memory used to make dict values (minus the const 624B used in the test-scaffold logic itself)! As expected. Execution times were ~86% (after test-overhead times subtracted from numerator and denominator).

Memory as Dictionary Key

Those who feel “if some medicine is good, way more must be way good!” their next idea might be making dictionary-keys themselves late-bound.
Can you build a Dictionary<ReadOnlyMemory<char>, ReadOnlyMemory<char>>? Yes, just a tad more work.

To be used as a key, the type has to support GetHashCode(). At first I was depressed .Net6 Memory extension methods didn’t have that overload (q.v. issue 50322), but we can easily implement an IEqualityComparer the dictionary will use instead, simply deferring to String.GetHashCode(), etc.

MemStringEqualityComparer comp = new(ignoreCase: true);
Dictionary<ReadOnlyMemory<char>, ReadOnlyMemory<char>> dict =
new(expectedKeyCount, comp);
...public class MemStringEqualityComparer :
IEqualityComparer<ReadOnlyMemory<char>>
{
private StringComparison _comparison;
public MemStringEqualityComparer( bool ignoreCase = false ) =>
_comparison = ignoreCase
? StringComparison.OrdinalIgnoreCase
: StringComparison.Ordinal;
public int GetHashCode( ReadOnlyMemory<char> obj ) =>
string.GetHashCode( obj.Span, _comparison );
public bool Equals(
ReadOnlyMemory<char> x,
ReadOnlyMemory<char> y) =>
System.MemoryExtensions.Equals(
x.Span, y.Span, _comparison );
}

Note: this will use the hash-attack-safe randomized-string-hash-algorithm, also making ReadOnlyMemory<char> hashes comparable to a mem.ToString().GetHashCode(_comparison).
But you still can’t persist the hash-value, IPC/RPC it, etc.
I haven’t looked into whether the non-randomized flavors (e.g. this) are similarly accessible to us.

Performance using Memory in Key and Value? Happily, pretty good.

IsKeyConst true => known at startup, false => discovered from raw text

As hoped, allocs with Memory<char> are gone and perf is pretty close.

If the key-strings are extracted from the corpus (IsKeyConst = False), memory is twice as bad for Dictionary<T,T> when T is string vs keys known a priori (IsKeyConst = True), as expected. But Memory<char> still 0.
(Note: the tests’ raw text used a 20-char pattern of featur001:fval001–1;featur002:fval002–1;… with keys and vals same size.)

Test code here.

A Toy Use-Case

Our hashes over from Memory<char> don’t depend on the raw text instance, making merging dictionaries even in the late-bound (IsKeyConst=false) case safe. For instance, if the use-case is multiple sources of formatted strings with an inherent order-of-precedence and we want to avoid substring ops when there’s a good chance the next source will simply replace the value, it might look like this:

Dictionary<..., ...> dict = new(estCt, comparer);
for (int s = 4; s >= 1; --s)
{
string? source = s switch
{
4 => await ReadAllAsync( localSafeDefaultsFile ),
3 => await ReadAllAsync( environmentFile ),
2 => await ReadAllAsync( tenantIdFile ),
1 => GetQueryParameter( flightQp ),
};
MergeToDictionary( dict, source );
}
...public void MergeToDictionary(
Dictionary<..., ...> dict,
string? txt )
{
if (string.IsNullOrWhitespace(txt)
return;
ReadOnlyMemory<char> txtMem = txt.AsMemory();
int start = 0;
for (;;)
{
var (key, val, next) = GetNextPair( txtMem, start );
if (next < 0)
break;
AddToDictionary( dict, txt, key, val ); start = next;
}
}
// This same syntax works
// whether ... is string or ReadOnlyMemory<char>.
internal void AddToDictionary(
Dictionary<..., ...> dict,
... mem,
Range key,
Range val ) =>
dict[ mem[key] ] = mem[val];

I’ll leave to the reader’s imagination the text-walking GetNextPair(…) implementation, but it’s returning the Ranges for where a key and its associated value are found (plus some monotonic progress offset), all achievable with span.(Last)IndexOf(…) / .Trim().

Conclusion

Span<T> has a cousin, Memory<T> that overcomes the stack-only constraints of span, but readily gives you access to its extension-methods. The heap friendly Memory<T> can even be used in collections, such as Dictionary<…, …>. This might be a viable design-option for you to avoid avoid alloc+pour and GC taxes. Keys take a smidge of work (and you may prefer a simpler Taming Duplicate Strings interning), but might pay off if the keys are not known at compile-time.

Hopefully you’ve found this an interesting explore of Memory<T>. Thanks for reading.

Bonus Material — ReadOnlyMemory is, surprise, a readonly struct. But its value-slot in the dictionary is writable (just copying over the immutable value en toto). If you wanted to inspect/act-on the value before replacing it, you could use CollectionsMarshal.GetValueRefOrAddDefault, e.g. …

ref ReadOnlyMemory<char> gammaRef = 
ref CollectionsMarshal.GetValueRefOrAddDefault(
dict,
myKey,
out bool gotVal);
if (gotVal) ...perhaps test/act-on gammaRef's old val...
gammaRef = srcMem[valRange]; // Replace it w/o 2nd lookup cost!

--

--

Norm Bryar

A long-time, back-end web service developer enamored with .Net and C#, code performance, and techniques taming drudgery or increasing insight.