Speed up your calculations with SIMD and vectors

Greg Chatziantoniou
tech.thesignalgroup

I’ve been working with C# and various .NET flavours for over 20 years now. The evolution of the language and the framework has been exciting to observe over these years. Ever since the launch of the multi-platform, open-source .NET Core versions, this evolution has sped up dramatically.

Looking for ways to make my code more performant, I came across a not-so-new feature of .NET: vectorisation. Vectors, along with a few other types, are SIMD-accelerated numeric types. SIMD stands for Single Instruction, Multiple Data and, as Microsoft explains it, provides hardware support for performing an operation on multiple pieces of data, in parallel, using a single instruction.

The SIMD-accelerated types live in the System.Numerics namespace and, to name some of them, they are: Vector, Vector<T>, Vector2, Vector3, Vector4, Matrix3x2, Matrix4x4, Plane and Quaternion.
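To make the "single instruction, multiple data" idea concrete, here is a minimal sketch that adds two float arrays with Vector<T>. The VectorDemo class and its Add method are hypothetical names chosen for this illustration; only Vector<float>, its constructor, Count and CopyTo come from System.Numerics.

```csharp
using System;
using System.Numerics;

float[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
float[] b = { 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 };
float[] sum = VectorDemo.Add(a, b);
Console.WriteLine(string.Join(", ", sum)); // 11, 22, 33, 44, 55, 66, 77, 88, 99, 110

// Hypothetical helper for illustration; not part of System.Numerics.
static class VectorDemo
{
    public static float[] Add(float[] left, float[] right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var result = new float[left.Length];
        int width = Vector<float>.Count; // number of floats per vector; depends on the CPU
        int i = 0;

        // Vectorised part: each iteration adds `width` pairs of elements at once.
        for (; i <= left.Length - width; i += width)
        {
            var va = new Vector<float>(left, i);
            var vb = new Vector<float>(right, i);
            (va + vb).CopyTo(result, i);
        }

        // Scalar tail: handle the elements that do not fill a whole vector.
        for (; i < left.Length; i++)
            result[i] = left[i] + right[i];

        return result;
    }
}
```

The scalar tail loop matters because the array length is rarely an exact multiple of Vector<float>.Count.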

I will not go into detail on these types; you can find the documentation at https://learn.microsoft.com/en-us/dotnet/standard/simd. From that page you can navigate and explore them. I am writing this article to present my very simple example of Vector usage, together with benchmarks, hoping to make these APIs more familiar to you.

My simple example creates a tuple with the square roots of a point’s coordinates. Let’s say we have the following struct:

readonly record struct Point(float X, float Y)
{
    // Returns the square roots of both coordinates as a tuple.
    public (double x, double y) Sqrt() => (Math.Sqrt(X), Math.Sqrt(Y));
}

In the above example the method Sqrt() returns a tuple populated with the square roots of the point’s coordinates: new Point(9, 16).Sqrt() returns (3, 4). Not life-saving or mind-blowing, but it will do for the example.

Vectorisation allows us to compute the two square roots (of X and of Y) in a single operation, which is hardware accelerated if the host machine supports it (you can check this with Vector.IsHardwareAccelerated).
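As a quick check before benchmarking anything, a small sketch like the following prints whether the runtime reports SIMD support on the current machine and how many floats fit into one Vector<float>. Both values vary with the hardware and runtime, so no particular output should be assumed.

```csharp
using System;
using System.Numerics;

// True when the JIT maps vector operations to SIMD instructions on this machine.
Console.WriteLine($"Hardware accelerated: {Vector.IsHardwareAccelerated}");

// Number of float lanes in one Vector<float>; depends on the CPU's register width.
Console.WriteLine($"Floats per Vector<float>: {Vector<float>.Count}");
```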

The code to do this using SIMD-accelerated types looks like:

var vector = new Vector2(X, Y);
var v = Vector2.SquareRoot(vector);
return (v.X, v.Y);

Here X and Y are the point’s coordinates, and Vector2 comes from System.Numerics.
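Putting the two snippets together, a complete version of the struct might look like the sketch below. The method name SqrtVector is an assumption made here for illustration; the post does not name the vectorised method.

```csharp
using System;
using System.Numerics;

var p = new Point(9, 16);
Console.WriteLine(p.Sqrt());       // (3, 4)
Console.WriteLine(p.SqrtVector()); // (3, 4)

readonly record struct Point(float X, float Y)
{
    // Scalar version: two separate calls, each computed in double precision.
    public (double x, double y) Sqrt() => (Math.Sqrt(X), Math.Sqrt(Y));

    // Vectorised version (hypothetical name): both roots in one Vector2 operation,
    // computed in single precision and then widened to double.
    public (double x, double y) SqrtVector()
    {
        var v = Vector2.SquareRoot(new Vector2(X, Y));
        return (v.X, v.Y);
    }
}
```

One design point worth noting: Math.Sqrt works on doubles while Vector2 works on floats, so for inputs that are not perfect squares the two methods can differ in the low-order digits.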

Now, this all seems very trivial, but the benchmarks on my MacBook Pro consistently returned results like this:

[Figure: benchmark results]

There is a 35–40% improvement for this very simple mathematical operation.
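The benchmark code itself is not shown in this post. If you want a rough comparison on your own machine, a crude Stopwatch loop such as the sketch below can give a first impression. The RoughTiming class and the iteration count are assumptions made for illustration, and a dedicated benchmarking tool will give far more trustworthy numbers than this.

```csharp
using System;
using System.Diagnostics;
using System.Numerics;

const int Iterations = 10_000_000; // arbitrary count chosen for illustration

// Brief warm-up so both methods have been JIT-compiled before timing.
RoughTiming.Scalar(1_000);
RoughTiming.Vectorised(1_000);

var sw = Stopwatch.StartNew();
double s1 = RoughTiming.Scalar(Iterations);
sw.Stop();
Console.WriteLine($"Scalar:     {sw.ElapsedMilliseconds} ms (checksum {s1:F0})");

sw.Restart();
double s2 = RoughTiming.Vectorised(Iterations);
sw.Stop();
Console.WriteLine($"Vectorised: {sw.ElapsedMilliseconds} ms (checksum {s2:F0})");

// Hypothetical helper; the sums are returned so the JIT cannot discard the work.
static class RoughTiming
{
    public static double Scalar(int n)
    {
        double sum = 0;
        for (int i = 0; i < n; i++)
        {
            float x = i, y = i + 1;
            sum += Math.Sqrt(x) + Math.Sqrt(y);
        }
        return sum;
    }

    public static double Vectorised(int n)
    {
        double sum = 0;
        for (int i = 0; i < n; i++)
        {
            var v = Vector2.SquareRoot(new Vector2(i, i + 1));
            sum += v.X + v.Y;
        }
        return sum;
    }
}
```

The two checksums will not match exactly, because the scalar path works in double precision and the vectorised path in single precision. Timings will vary with hardware, runtime version and build configuration, so run it in Release mode and treat the numbers as indicative only.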

I wanted to showcase the existence of SIMD-accelerated types in .NET, explain what they are and show a very simple example of how they can be used. When you have more complex mathematical operations, such as those found in machine learning algorithms, other types such as Matrix3x2, Matrix4x4 and Plane can offer far greater performance improvements than the one I demonstrated here.
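As a small taste of the matrix types, here is a minimal sketch that scales and then translates a point with Matrix4x4. It only illustrates the API; whether it outperforms hand-written scalar code depends on the hardware and the runtime version.

```csharp
using System;
using System.Numerics;

var point = new Vector3(1, 1, 1);

Matrix4x4 scale = Matrix4x4.CreateScale(2);            // double every coordinate
Matrix4x4 move = Matrix4x4.CreateTranslation(1, 2, 3); // then shift by (1, 2, 3)

// System.Numerics treats vectors as rows, so the left matrix is applied first.
Matrix4x4 transform = scale * move;

Vector3 result = Vector3.Transform(point, transform);
Console.WriteLine($"X={result.X}, Y={result.Y}, Z={result.Z}"); // X=3, Y=4, Z=5
```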

I will follow up with a more in-depth post, exploring the more advanced types and usages. Until then, you might want to have a look at Stephen Toub’s piece on vectorisation improvements in .NET 8.

Also, keep in mind this little footnote by Microsoft:

SIMD is more likely to remove one bottleneck and expose the next, 
for example memory throughput. In general the performance benefit
of using SIMD varies depending on the specific scenario,
and in some cases it can even perform worse than
simpler non-SIMD equivalent code.
