On the day .NET 5 was released, I shared a single screenshot showing how much faster .NET 5 is relative to .NET Core 3.1. I promised to share more data later, and here it is.
1. Fusion’s Caching Test — running on Ubuntu + Intel and AMD CPUs
Code: https://github.com/servicetitan/Stl.Fusion.Samples, see the “Caching Sample” section there.
Overall, these tests stress an API endpoint that fetches a small object from a SQL database. The only differences are how the endpoint is exposed (locally or via HTTP) and whether Fusion is used to provide transparent caching.
My initial test was performed on Windows with an AMD CPU, but almost every production service runs on Linux nowadays, so I decided to fix this and add an Intel-based system to the test.
You might notice that the web service speedup is actually lower than in my initial post. Most likely, that’s because the .NET Core 3.1 version of the code used this time was built with the newest (5.0.0) versions of the NuGet packages. In other words, this test compares just the runtimes and factors out the libraries.
2. YetAnotherStupidBenchmark — SIMD intrinsics, tight loops, IO
Originally written as an attempt to prove or disprove the thesis that C# can be nearly as efficient as C++ on “data crunching” problems, this simple benchmark compares C# and C++ efficiency on what is roughly a “scan, decode, and aggregate” task.
The results were produced as follows:
- Run each test 6 times — for each framework and CPU
- Take the best result among all 6 runs
- Compute the speedup as a geometric mean of (NET31_Time / NET5_Time) across all tested systems.
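To make the aggregation above concrete, here is a minimal Python sketch of the same procedure: take the best of the repeated runs for each framework on each system, then compute the geometric mean of the per-system NET31_Time / NET5_Time ratios. The function names, the data layout, and the timings in the example are my own placeholders for illustration; they are not taken from the benchmark code or its results.

```python
import math


def best_time(runs_ms):
    """Return the best (smallest) time among the repeated runs."""
    return min(runs_ms)


def geometric_mean(values):
    """Geometric mean: the n-th root of the product of n positive values."""
    values = list(values)
    return math.prod(values) ** (1.0 / len(values))


def speedup(results):
    """Geometric mean of best(NET31) / best(NET5) across all systems.

    `results` maps a system name to {"net31": [...], "net5": [...]},
    where each list holds the run times of one test on that system.
    """
    ratios = [
        best_time(r["net31"]) / best_time(r["net5"])
        for r in results.values()
    ]
    return geometric_mean(ratios)


# Made-up timings (ms), for illustration only; NOT measurements from this post.
example = {
    "system_a": {"net31": [110, 104, 108, 105, 107, 106],
                 "net5": [101, 100, 103, 102, 100, 104]},
    "system_b": {"net31": [95, 97, 96, 99, 95, 98],
                 "net5": [93, 92, 94, 92, 95, 93]},
}
print(f"speedup: {speedup(example):.3f}x")
```

A geometric mean fits here because the inputs are ratios: a 2x speedup on one system and a 2x slowdown on another average out to exactly 1x, which an arithmetic mean would not give.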
Surprisingly, .NET 5 manages to speed up even heavily optimized code by up to 7% in comparison to .NET Core 3.1.
You may also notice that:
- Results from ~1.5 years ago on the same Core i7-8700K were 95ms for C++ and 101ms for .NET, i.e. the results are now ~5% worse for both CPUs. An impact of Spectre and Meltdown mitigations?
- .NET 5 gets extremely close to C++ on Core i7: it produces the same 99ms on the SIMD version of the test, while relying on an async pipeline reader rather than a memory-mapped file.
- But there is a significant gap between C++ (74ms) and .NET (93ms) on the same test on Ryzen Threadripper. And since it’s almost identical SIMD code, I’m unsure what to blame here. If you know, please share it in the comments.
3. GCBurn
GCBurn was originally written to compare .NET and Go garbage collection and peak allocation performance. I’ll focus only on allocations here, because all other metrics are pretty similar (though you’re welcome to compare the raw output).
Notes on this test:
- Interestingly, .NET 5 is faster on burst allocations and smaller heaps, but slower on larger heaps — though not dramatically.
- I had to exclude the 75% RAM static set result from the chart: the 110M ops/s for .NET Core 3.1 there is way off from everything else, and I didn’t have enough time to find out why. My guess is that an extra full GC was somehow triggered during this test (it takes ~5s on a ~50GB heap).
It’s worth saying that GCBurn is definitely the least useful of these tests: it runs code you’ll hardly ever see in production. Its goal is to turn the memory allocator and GC into a bottleneck in order to measure their efficiency, but in reality this never happens. Moreover, even if you do hit e.g. the memory allocator’s performance limit, you have a number of workarounds to address this.
Besides that, GCBurn doesn’t use pinned objects, so it completely disregards one of the key improvements in the .NET 5 GC.
So is the .NET 5 GC better or worse? One more chart illustrates how big the difference between the stress test and reality can be:
This is how RAM usage changed for the only service (a tiny one) we have migrated to .NET 5 so far. As you can see, it dropped by more than 2x. This is certainly quite motivating, so I hope to write another post with more production data soon.
You can find the raw data produced by these tests in this Google Sheet; the raw output of GCBurn and YetAnotherStupidBenchmark can be found in their repositories, in the “/results” folder.
P.S. If you love reading about .NET and performance, check out Fusion — my bold attempt to revolutionize the way we implement caching and real-time UI updates.