Parallel Foreach async in C#

Alexandru Puiu
Jan 6, 2020

A plain foreach is useful and efficient for most operations. Sometimes, though, getting the data to iterate over has high latency, or the work inside the loop depends on an operation that is slow or has high latency. Fetching paged data from a database is a typical example. The goal is to pull data from the database one chunk at a time, since fetching one record at a time introduces its own overhead. As data becomes available we start processing it, while in the background we fetch more data and feed it into the processor. The processing itself also runs in parallel, so the next item can start before the current one has finished.

ForEachAsync

My favorite way to do this is with an extension method Stephen Toub wrote many years ago. It accepts a data generator, breaks the data source into partitions according to a specified degree of parallelism, and accepts a lambda to execute for each item.

The history of it and previous versions are available here: https://devblogs.microsoft.com/pfxteam/implementing-a-simple-foreachasync/
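For reference, here is a minimal sketch of that kind of extension, along the lines of the version discussed in the linked post. The class name `ForEachAsyncExtensions` is my own placeholder, and details may differ from the original.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ForEachAsyncExtensions
{
    // Splits the source into `dop` partitions and drains each partition on its own task.
    // At most `dop` invocations of `body` are in flight at any time.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async () =>
            {
                using (partition)
                {
                    // Each partition pulls items from the shared source on demand.
                    while (partition.MoveNext())
                    {
                        await body(partition.Current);
                    }
                }
            }));
    }
}
```

Because the partitions pull from the source on demand, the source does not have to be fully materialized before processing starts.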

The cool part is when we use the generator pattern (an iterator method built on yield return) as the IEnumerable source.

When we combine the two, we’re fetching 20 pages from the database in parallel and then iterating over the results from each page. Execution of the generator pauses on each result, and that item is yielded to the async lambda running on another thread.
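A rough sketch of the combination, under some loud assumptions: `FetchPage` is a hypothetical stand-in for the real paged database query, orders are represented by plain `int` ids, and the ForEachAsync extension described above is repeated in compact form so the snippet compiles on its own. In this particular sketch the generator loads the next page lazily when a worker asks for an item, and up to 20 items are processed concurrently.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PagedForEachDemo
{
    // Hypothetical stand-in for a paged database query: returns one page of order ids.
    static List<int> FetchPage(int page, int pageSize, int total)
    {
        Thread.Sleep(10); // simulated query latency
        return Enumerable.Range(page * pageSize, pageSize).Where(id => id < total).ToList();
    }

    // Generator: loads a page only once the consumers have used up the previous one,
    // then hands out its items one at a time.
    public static IEnumerable<int> GetOrders(int pageSize, int total)
    {
        for (int page = 0; ; page++)
        {
            var orders = FetchPage(page, pageSize, total);
            if (orders.Count == 0) yield break;
            foreach (var order in orders) yield return order;
        }
    }

    // Same shape as the ForEachAsync extension described above.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body) =>
        Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async () =>
            {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));

    // The one-liner style call site: generator in, async lambda per item.
    public static async Task<int> RunAsync()
    {
        var processed = new ConcurrentBag<int>();
        await GetOrders(pageSize: 50, total: 230).ForEachAsync(20, async order =>
        {
            await Task.Delay(5); // simulated slow processing
            processed.Add(order);
        });
        return processed.Count;
    }
}
```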

Modernizing Async Foreach

Thanks to houseofcat, we can actually improve on the above a bit by using some newer language features.

source: https://houseofcat.io/tutorials/csharp/async/parallelforeachasync
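As an illustration of what "newer language features" can mean here, the sketch below swaps the anonymous delegate for a local function and uses a default parameter value. This is my own approximation and may well differ from the code in the linked tutorial; the names `ParallelForEachAsync`, `DrainPartition`, and `maxDoP` are assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelForEachAsyncExtensions
{
    public static Task ParallelForEachAsync<T>(
        this IEnumerable<T> source, Func<T, Task> body, int maxDoP = 4)
    {
        // Local function (C# 7.0+) that drains a single partition.
        async Task DrainPartition(IEnumerator<T> partition)
        {
            using (partition)
            {
                while (partition.MoveNext())
                {
                    await body(partition.Current);
                }
            }
        }

        // One task per partition; the returned task completes when all partitions are drained.
        return Task.WhenAll(
            Partitioner.Create(source)
                .GetPartitions(maxDoP)
                .Select(partition => Task.Run(() => DrainPartition(partition))));
    }
}
```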

C# 8.0 and Async Streams

Having used this extensively whenever I need to work on really large data sets, I can vouch for its performance, but I’ve been looking for ways to push it even further. Once C# 8.0 announced async foreach, my interest was piqued. And it turns out that we can do better. Yielding on each item causes a lot of context switches, so we want to yield one page at a time, but then we have to deal with nested foreach statements, and it’s just not as cool as the one-liner above.

The first new thing in C# 8 is IAsyncEnumerable, so our query can now be written as an async iterator.

Using the new await foreach, we would expect to at least get close to optimizing the query part.
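A small sketch of both pieces, assuming the query yields one page at a time as discussed earlier. `FetchPageAsync` is a hypothetical stand-in for an async database call, and orders are again plain `int` ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncStreamDemo
{
    // Hypothetical stand-in for an async paged database query.
    static async Task<List<int>> FetchPageAsync(int page, int pageSize, int total)
    {
        await Task.Delay(10); // simulated query latency
        return Enumerable.Range(page * pageSize, pageSize).Where(id => id < total).ToList();
    }

    // Async iterator (C# 8.0): yields one whole page per step instead of one item.
    public static async IAsyncEnumerable<List<int>> GetOrderPagesAsync(int pageSize, int total)
    {
        for (int page = 0; ; page++)
        {
            var orders = await FetchPageAsync(page, pageSize, total);
            if (orders.Count == 0) yield break;
            yield return orders;
        }
    }

    // Consuming the stream needs the nested loops mentioned above:
    // await foreach over pages, then a regular foreach over each page.
    public static async Task<int> ProcessAllAsync()
    {
        int processed = 0;
        await foreach (var page in GetOrderPagesAsync(pageSize: 10, total: 45))
        {
            foreach (var order in page)
            {
                await Task.Delay(1); // simulated processing, strictly one item at a time
                processed++;
            }
        }
        return processed;
    }
}
```

Note that nothing in this loop runs in parallel: each page is awaited, then each item is processed in turn.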

That approach actually performs pretty badly in terms of execution time, but it’ll come in handy soon.

Optimizing Parallel Foreach Further

Our next iteration comes from Stack Overflow. Instead, we use the TaskScheduler class with ActionBlock, and so far this is quite a bit faster than all the previous solutions.

Source: https://stackoverflow.com/questions/14673728/run-async-method-8-times-in-parallel

Using it looks like this now
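A minimal sketch of such an extension together with a call site, based on TPL Dataflow's `ActionBlock<T>`. It assumes the `System.Threading.Tasks.Dataflow` library is available (on .NET Framework that means the NuGet package of the same name). The names `AsyncParallelForEach` and `DataflowForEachExtensions` are placeholders and may not match the linked answer exactly.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowForEachExtensions
{
    public static Task AsyncParallelForEach<T>(
        this IEnumerable<T> source,
        Func<T, Task> body,
        int maxDegreeOfParallelism = DataflowBlockOptions.Unbounded,
        TaskScheduler scheduler = null)
    {
        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        };
        if (scheduler != null) options.TaskScheduler = scheduler;

        // The block queues items and runs `body` on them with bounded parallelism.
        var block = new ActionBlock<T>(body, options);
        foreach (var item in source) block.Post(item);

        block.Complete();        // no more items will be posted
        return block.Completion; // completes once every queued item is processed
    }

    // Call site: source, async lambda, degree of parallelism.
    public static async Task<int> RunAsync()
    {
        var processed = new ConcurrentBag<int>();
        await Enumerable.Range(1, 100).AsyncParallelForEach(async order =>
        {
            await Task.Delay(5); // simulated slow processing
            processed.Add(order);
        }, maxDegreeOfParallelism: 20);
        return processed.Count;
    }
}
```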

Optimizing Parallel async Foreach with C# 8.0 async streams

Fortunately, we can take advantage of the C# 8.0 async streams feature, and optimize this even more:
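One way this could look, as a sketch: the same ActionBlock-based extension, but accepting an `IAsyncEnumerable<T>` and feeding the block with await foreach, so fetching and processing overlap. `GetOrdersAsync` is a hypothetical async generator standing in for the paged query, and the other names are placeholders as well.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class AsyncStreamForEachExtensions
{
    public static async Task AsyncParallelForEach<T>(
        this IAsyncEnumerable<T> source,
        Func<T, Task> body,
        int maxDegreeOfParallelism = DataflowBlockOptions.Unbounded,
        TaskScheduler scheduler = null)
    {
        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        };
        if (scheduler != null) options.TaskScheduler = scheduler;

        var block = new ActionBlock<T>(body, options);

        // Items are posted as soon as the async stream produces them,
        // so the block is already working while the next page is being fetched.
        await foreach (var item in source) block.Post(item);

        block.Complete();
        await block.Completion;
    }

    // Hypothetical async generator: fetches a page asynchronously, then yields its items.
    public static async IAsyncEnumerable<int> GetOrdersAsync(int pageSize, int total)
    {
        for (int start = 0; start < total; start += pageSize)
        {
            await Task.Delay(10); // simulated query latency
            int end = Math.Min(start + pageSize, total);
            for (int id = start; id < end; id++) yield return id;
        }
    }

    public static async Task<int> RunAsync()
    {
        var processed = new ConcurrentBag<int>();
        await GetOrdersAsync(pageSize: 50, total: 230).AsyncParallelForEach(async order =>
        {
            await Task.Delay(5); // simulated slow processing
            processed.Add(order);
        }, maxDegreeOfParallelism: 20);
        return processed.Count;
    }
}
```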

To get performance numbers, I simulated a really slow connection to the database and slow processing by calling Thread.Sleep(100) both after querying each page and during each foreach iteration.

Then I decided to turn it into a real benchmark project and test on some different size datasets.

Using 913 Order records

Using 9130 Order records

The DotNetBenchmark project, results, and source code are available on my GitHub.
