Efficiently clean a string with .NET

BEN ABT
Published in medialesson
5 min read · May 28, 2024


Strings are one of the most commonly used types in .NET applications, and very often the source of inefficient code. Cleaning up a string, such as removing invalid or non-visible characters, is one of the most common operations on user input. Unfortunately, the implementation usually chosen for it is the most convenient one rather than the most efficient one: LINQ.

🌳 See the full sample: Sustainable Code by BEN ABT on GitHub (https://github.com/BenjaminAbt/SustainableCode)

String Manipulation

The most inefficient option, as already mentioned, is direct string manipulation with LINQ. Unfortunately, this is by far the variant I see most often.

string.Concat(source.Where(c => char.IsControl(c) is false))

| Method                 | Runtime  | Mean       | Ratio | RatioSD | Gen0   | Gen1   | Allocated | Alloc Ratio |
|----------------------- |--------- |-----------:|------:|--------:|-------:|-------:|----------:|------------:|
| Linq | .NET 8.0 | 1,646.5 ns | 3.47 | 0.03 | 0.0782 | - | 1.3 KB | 1.07 |
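To make clear what "cleaning" means here, the one-liner above can be wrapped in a small self-contained method. This is only a sketch: the class name `LinqCleaner` and the method name `Clean` are made up for illustration and are not taken from the benchmark project.

```csharp
using System;
using System.Linq;

public static class LinqCleaner
{
    // Convenient, but every character passes through a LINQ iterator
    // and a delegate call before string.Concat builds the result.
    public static string Clean(string source)
        => string.Concat(source.Where(c => char.IsControl(c) is false));
}
```

With this sketch, `LinqCleaner.Clean("Hello\tWorld\u0000!")` returns `HelloWorld!`, because both the tab and the NUL character are control characters.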

String Builder

The StringBuilder is significantly faster: it appends the characters to its own internal, mutable character buffer and avoids the iterator and delegate overhead of the LINQ variant.

StringBuilder sb = new();

foreach (char c in source)
{
    if (char.IsControl(c) is false)
    {
        sb.Append(c);
    }
}

return sb.ToString();

| Method                 | Runtime  | Mean       | Ratio | RatioSD | Gen0   | Gen1   | Allocated | Alloc Ratio |
|----------------------- |--------- |-----------:|------:|--------:|-------:|-------:|----------:|------------:|
| StringBuilder_Instance | .NET 8.0 | 860.0 ns | 1.81 | 0.02 | 0.2270 | 0.0019 | 3.71 KB | 3.04 |

The disadvantage is that a new StringBuilder must be created for every call in addition to the actual operation, which is why this variant allocates about three times as much memory (3.71 KB instead of 1.22 KB) and also causes Gen1 collections. Pooling the StringBuilder removes this overhead: the pooled variant takes about 18% less time than the instance variant (706.2 ns vs. 860.0 ns) and about 57% less than the LINQ variant (1,646.5 ns).

| Method                 | Runtime  | Mean       | Ratio | RatioSD | Gen0   | Gen1   | Allocated | Alloc Ratio |
|----------------------- |--------- |-----------:|------:|--------:|-------:|-------:|----------:|------------:|
| StringBuilder_Pool | .NET 8.0 | 706.2 ns | 1.49 | 0.02 | 0.0744 | - | 1.22 KB | 1.00 |
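The pooling code itself is not shown above. The following is a minimal sketch of the idea, assuming a very simple "pool" that keeps one StringBuilder per thread in a `[ThreadStatic]` field. The benchmark project may well use a dedicated pooling library instead; the names `StringBuilderCleaner`, `CleanPooled` and `t_cached` are hypothetical.

```csharp
using System;
using System.Text;

public static class StringBuilderCleaner
{
    // Assumption: a one-slot, per-thread cache stands in for a real object pool.
    [ThreadStatic]
    private static StringBuilder? t_cached;

    public static string CleanPooled(string source)
    {
        // Reuse the cached instance if this thread has one, otherwise create it.
        StringBuilder sb = t_cached ?? new StringBuilder(source.Length);
        t_cached = null; // the instance is now "rented"

        foreach (char c in source)
        {
            if (char.IsControl(c) is false)
            {
                sb.Append(c);
            }
        }

        string result = sb.ToString();

        // Reset the builder and hand it back for the next call on this thread.
        sb.Clear();
        t_cached = sb;

        return result;
    }
}
```

The point of the sketch is only that the builder and its buffer survive between calls, so the per-call allocation is reduced to the result string. A production pool would typically also cap the capacity of the builders it keeps.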

Span

Span has been available in .NET for almost seven years, but it is still used far too rarely. Admittedly, the learning curve and the somewhat “low level” approach don’t make it easy, but the results speak for themselves.

All four span implementations are faster by a large margin, while allocating no more than the pooled StringBuilder (1.22 KB). The fastest is an unsafe implementation, which cannot be used in every scenario, for example when unsafe code blocks are not possible or not allowed.

| Method                 | Runtime  | Mean       | Ratio | RatioSD | Gen0   | Gen1   | Allocated | Alloc Ratio |
|----------------------- |--------- |-----------:|------:|--------:|-------:|-------:|----------:|------------:|
| Span | .NET 8.0 | 394.4 ns | 0.83 | 0.01 | 0.0744 | - | 1.22 KB | 1.00 |
| Span1 | .NET 8.0 | 474.7 ns | 1.00 | 0.00 | 0.0744 | - | 1.22 KB | 1.00 |
| Span2 | .NET 8.0 | 411.8 ns | 0.87 | 0.01 | 0.0744 | - | 1.22 KB | 1.00 |
| Span2Unsafe | .NET 8.0 | 375.1 ns | 0.79 | 0.01 | 0.0744 | - | 1.22 KB | 1.00 |

// requires: using System; using System.Buffers;
public static string UsingSpan(string source)
{
    int length = source.Length;
    char[]? rentedFromPool = null;

    // allocate: small inputs use the stack, larger ones rent from the shared pool
    Span<char> buffer = length > 512 ?
        (rentedFromPool = ArrayPool<char>.Shared.Rent(length))
        : (stackalloc char[512]);

    // filter: copy every non-control character into the buffer
    int index = 0;
    foreach (char c in source)
    {
        if (char.IsControl(c) is false)
        {
            buffer[index] = c;
            index++;
        }
    }

    // only return the data that was written
    string data = buffer.Slice(0, index).ToString();

    // cleanup: give the rented array back to the pool
    if (rentedFromPool is not null)
    {
        ArrayPool<char>.Shared.Return(rentedFromPool, clearArray: true);
    }

    return data;
}
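For comparison, here is a sketch of another span-based approach that needs no temporary buffer at all: count the surviving characters first, then let `string.Create` write them directly into the new string. This is not one of the four benchmarked variants and has not been measured here; the names `SpanCleaner` and `CleanWithStringCreate` are made up for illustration.

```csharp
using System;

public static class SpanCleaner
{
    public static string CleanWithStringCreate(string source)
    {
        // First pass: count the characters that survive the filter.
        int kept = 0;
        foreach (char c in source)
        {
            if (char.IsControl(c) is false)
            {
                kept++;
            }
        }

        // Nothing to remove: return the original string without allocating.
        if (kept == source.Length)
        {
            return source;
        }

        // Second pass: write straight into the memory of the new string.
        return string.Create(kept, source, static (destination, src) =>
        {
            int index = 0;
            foreach (char c in src)
            {
                if (char.IsControl(c) is false)
                {
                    destination[index++] = c;
                }
            }
        });
    }
}
```

The trade-off in this sketch is a second pass over the input in exchange for skipping the stack or pool buffer. Whether that pays off would have to be benchmarked against the variants above.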

🌳 Sustainable Code

Sustainable code is becoming increasingly important and should not be neglected, especially in migration projects: they are the best opportunity to implement sustainable code from the ground up.

You can find the complete example of how to clean up strings efficiently, along with more examples of sustainable implementations of everyday C# and .NET code, at https://github.com/BenjaminAbt/SustainableCode.

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4412/22H2/2022Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100-preview.3.24204.13
[Host] : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
.NET 8.0 : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
.NET 9.0 : .NET 9.0.0 (9.0.24.17209), X64 RyuJIT AVX2


| Method | Runtime | Mean | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|----------------------- |--------- |-----------:|------:|--------:|-------:|-------:|----------:|------------:|
| StringBuilder_Pool | .NET 8.0 | 706.2 ns | 1.49 | 0.02 | 0.0744 | - | 1.22 KB | 1.00 |
| StringBuilder_Instance | .NET 8.0 | 860.0 ns | 1.81 | 0.02 | 0.2270 | 0.0019 | 3.71 KB | 3.04 |
| Linq | .NET 8.0 | 1,646.5 ns | 3.47 | 0.03 | 0.0782 | - | 1.3 KB | 1.07 |
| Span | .NET 8.0 | 394.4 ns | 0.83 | 0.01 | 0.0744 | - | 1.22 KB | 1.00 |
| Span1 | .NET 8.0 | 474.7 ns | 1.00 | 0.00 | 0.0744 | - | 1.22 KB | 1.00 |
| Span2 | .NET 8.0 | 411.8 ns | 0.87 | 0.01 | 0.0744 | - | 1.22 KB | 1.00 |
| Span2Unsafe | .NET 8.0 | 375.1 ns | 0.79 | 0.01 | 0.0744 | - | 1.22 KB | 1.00 |
| | | | | | | | | |
| StringBuilder_Pool | .NET 9.0 | 616.3 ns | 1.19 | 0.01 | 0.0744 | - | 1.22 KB | 1.00 |
| StringBuilder_Instance | .NET 9.0 | 715.9 ns | 1.39 | 0.01 | 0.2270 | 0.0019 | 3.71 KB | 3.04 |
| Linq | .NET 9.0 | 1,663.5 ns | 3.22 | 0.02 | 0.0782 | - | 1.3 KB | 1.07 |
| Span | .NET 9.0 | 404.9 ns | 0.78 | 0.01 | 0.0744 | - | 1.22 KB | 1.00 |
| Span1 | .NET 9.0 | 516.1 ns | 1.00 | 0.00 | 0.0744 | - | 1.22 KB | 1.00 |
| Span2 | .NET 9.0 | 398.9 ns | 0.77 | 0.01 | 0.0744 | - | 1.22 KB | 1.00 |
| Span2Unsafe | .NET 9.0 | 389.4 ns | 0.75 | 0.01 | 0.0744 | - | 1.22 KB | 1.00 |

Author

Benjamin Abt

Ben is a passionate developer and software architect with a special focus on .NET, cloud and IoT. In his professional life he works on highly scalable platforms for IoT and Industry 4.0, focused on the next generation of connected industry based on Azure and .NET. He runs the largest German-speaking C# forum myCSharp.de, is the founder of the Azure UserGroup Stuttgart, a co-organizer of the AzureSaturday, runs his blog, participates in open source projects, speaks at various conferences and user groups, and also has a bit of free time. He has been a Microsoft MVP for .NET and Azure since 2015.
