Benchmarking 1 million C# tasks vs Go goroutines: Is there any difference?
Recently, I created a toy benchmark in C# / .NET Core (on Linux) that spawns one million async tasks, to test memory overhead and scalability. Shortly thereafter, a kind stranger sent a PR with a Go version. I thought his results were very interesting and decided to run them myself.
Each of the one million tasks runs an infinite loop that does nothing but increment a shared counter variable (atomic Int64) and sleep for one second. The test ends after ~60 million hits are observed, which should thus take ~60 seconds. (It’s approximate because a master thread is polling the score count every tenth of a second, and even that time is approximate if your CPUs are saturated. This test should not be run to fully saturate your CPUs.)
The test uses /usr/bin/time to record total CPU time, wall time, and maximum memory usage. In addition, the test programs periodically print /proc/self/status so we can compute average memory usage and see what the garbage collector is doing. I ran the tests on a dual-core Pentium (no hyper-threading), and did separate runs using 1, 2, and 4 worker threads.
| Name  | User CPU (s) | Sys CPU (s) | Avg RSS (KB) | Max RSS (KB) | Wall (m:ss) |
|-------|-------------:|------------:|-------------:|-------------:|------------:|
| c#_1t | 92.79        | 5.40        | 710,581      | 992,608      | 1:30        |
| c#_2t | 163.08       | 5.91        | 807,874      | 1,057,916    | 2:08        |
| c#_4t | 171.37       | 6.05        | 925,995      | 1,381,076    | 2:11        |
| go_1t | 53.34        | 0.88        | 2,639,375    | 2,639,740    | 1:04        |
| go_2t | 96.52        | 3.04        | 2,605,048    | 2,612,364    | 1:03        |
| go_4t | 88.39        | 3.93        | 2,607,513    | 2,613,348    | 1:02        |
- This is a micro-benchmark. Your application is not a micro-benchmark.
- Benchmarks can be gamed. Benchmarks are hard to do correctly.
- Be mindful of Goodhart’s law: “When a measure becomes a target, it ceases to be a good measure.”
- This code is not designed for machines with high core counts, because it uses a single (not sharded) atomic counter.
As you can see, Go uses less CPU but more memory. Still, I must admit I'm intrigued by how similar the two are on this micro-benchmark.
Let’s keep this in perspective: the workload only incremented an atomic counter to 60 million. A single thread can do that in a loop roughly 150X faster. Concurrency is not throughput. An application developer should be thinking creatively about how to solve problems 10x-100x faster by decomposing the problem and analyzing the problem-domain data (and probably using batching).
And to be clear, there are implementation differences. C# requires you to “know the color” of all your functions: async vs sync. In Go, every function is effectively “async,” so there is no confusion. However, C# can use this “color” knowledge for optimization: for example, the Span type, an optimized interior pointer that adds no GC pressure, cannot be persisted across an await in an asynchronous function. As we see so often in software, a compiler that knows more context can do more optimizations.
Also, C# can directly interoperate with C/C++ and with futures/callbacks from other languages. C# has exceptions and stack traces, though stack traces from async functions look “interesting”.
My Personal Optimism on C#
I started programming in Logo on an Apple II and QuickBasic on an IBM XT. For the last 5 years I’ve been disillusioned by Python (for large or long-term projects needing constant refactoring) and C++ (for secure network services), but haven’t found an adequate replacement. The last time I used anything from Microsoft was in 2001.
However, I’ve very much enjoyed my recent experiences with C# on Linux. I ported a Python project to it and it felt very natural. I am optimistic that programming can feel enjoyable again, something I haven’t felt for a while. I welcome more competition for open-source, cross-platform, safe, fast, and productive programming languages.
I recently ported my wife’s handmade cards site from Python to C# (1000 LOC) and am very happy with the result. Razor templates and LINQ are nice.
- Source Code on github. I used dotnet 2.2.103 and go 1.11.5, on Ubuntu 18.04.
- I hope I didn’t mess up as badly as this hilarious benchmark failure
- I tried Tiered Compilation but experienced a slight regression. In any case, this is not a good benchmark for measuring small changes like that. This is a “ballpark” test with some margin of error.
- You also might be interested in my “500K socket connections in C#” test on github.
- Another person’s higher-level test from 2017 with similar results
- A good post: What Color is your Function