Understanding Async, Avoiding Deadlocks in C#

Published in

Rubrikk Group

15 min read · May 9, 2018

Typical code that might pop up in a C# codebase and can be pretty dangerous.

You ran into some deadlocks, you are trying to write async code the proper way or maybe you’re just curious. Somehow you ended up here, and you want to fix a deadlock or improve your code.

I’ll try to keep this concise and practical, and for further reading check out the related articles. To write proper async C# code and avoid deadlocks you need to understand a few concepts.

Setting up good practices can help avoid common issues, but sometimes that’s not enough, and that’s when you need to understand what’s happening below the abstraction layers.

You should already be familiar with async code. There are many articles that discuss how to use it, but not many explain how it works. If you’re not familiar with it at all, I recommend at least reading something about it first. Ideally you should already have some experience using async functions.

Tasks or Threads?

Tasks have nothing to do with Threads, and this is the cause of many misconceptions, especially if you have in your mind something like “well, a Task is like a lightweight Thread”. A Task is not a Thread. A Task does not guarantee parallel execution. A Task does not belong to a Thread or anything like that. They are two separate concepts and should be treated as such.

Task represents some work that needs to be done. A Task may or may not be completed. The moment when it completes can be right now or in the future.

The equivalent in many other languages is the Promise. A Task can be completed just like a Promise can be fulfilled. A Task can be faulted just like a Promise can be rejected. This is the only thing that a Task does: it keeps track of whether some work has been completed or not.

If the Task is completed and not faulted then the continuation task will be scheduled. Faulted state means that there was an exception. Tasks have an associated TaskScheduler which is used to schedule a continuation Task, or any other child Tasks that are required by the current Task.
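
To make the Promise analogy concrete, here is a minimal sketch that completes and faults a Task by hand using TaskCompletionSource. The class and method names are made up for illustration. Notice that no thread is started anywhere: the Task only tracks whether the work is done.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the original examples.
public static class TaskAsPromiseDemo
{
    // Returns the status of a hand-made Task before and after completing it.
    public static (TaskStatus before, TaskStatus after) CompleteByHand()
    {
        var promise = new TaskCompletionSource<int>();
        Task<int> task = promise.Task;       // incomplete, nothing is "running"
        TaskStatus statusBefore = task.Status;

        promise.SetResult(42);               // like fulfilling a Promise
        return (statusBefore, task.Status);
    }

    // Returns true when the hand-made Task ends up in the faulted state.
    public static bool FaultByHand()
    {
        var promise = new TaskCompletionSource<int>();
        promise.SetException(new InvalidOperationException("boom")); // like rejecting a Promise
        return promise.Task.IsFaulted;
    }
}
```

CompleteByHand should report WaitingForActivation followed by RanToCompletion, and FaultByHand should return true.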

Threads are a completely different story. Threads, just as in any OS, represent the execution of code. Threads keep track of what you execute and where you execute it. Threads have a call stack, store local variables, and hold the address of the currently executing instruction. In C# a thread can also have an associated SynchronizationContext, which is used to communicate between different types of threads.

C# uses Threads to run some code and mark some Tasks as being completed. For performance reasons there is usually more than one thread. So Threads execute Tasks… simple, you might think… but that’s not the whole picture. The whole picture looks something like this:

Threads execute Tasks which are scheduled by a TaskScheduler.

What Does await Really Do?

Let’s start with an example. This is how you would properly implement an I/O bound operation. The application needs to request some data from a server. This does not use much CPU, so to use resources efficiently we use the async methods of HttpClient.

The proper async / await version:

public async Task<String> DownloadStringV1(String url)
{
    // good code
    var request = await HttpClient.GetAsync(url);
    var download = await request.Content.ReadAsStringAsync();
    return download;
}

What the code does should be obvious if you are at least a bit familiar with async/await. The request is done asynchronously and the thread is free to work on other tasks while the server responds. This is the ideal case.

But how does async/await manage to do it? It’s nothing special, just a little bit of syntactic sugar over the following code. The same behavior can be achieved by using ContinueWith and Unwrap.

The following code example does the same thing, with small differences.

ContinueWith / Unwrap version (this is still async):

public Task<String> DownloadStringV2(String url) 
{ 
    // okay code 
    var request = HttpClient.GetAsync(url); 
    var download = request.ContinueWith(http => 
        http.Result.Content.ReadAsStringAsync()); 
    return download.Unwrap(); 
}

Really, that’s all async/await does! It schedules tasks for execution, and once a task is done another task is scheduled. It creates something like a chain of tasks.

Everything you do with async and await ends up in an execution queue. Each Task is queued up using a TaskScheduler, which can do anything it wants with your Task. This is where things get interesting: the TaskScheduler depends on the context you are currently in.

Code that might work in some contexts…

Let’s look at the same DownloadString function, but this time it’s implemented in a bad way. This might still work in some cases.

This type of code should be avoided, and it should never be used in libraries that can be called from different contexts.

The following example is a sync version which achieves the same thing, but in a very, very different way. It blocks the thread. We’re getting to unsafe territory. It’s radically different from the code above and should never be considered an equivalent implementation.

Sync version, blocks the thread, not safe:

public String DownloadStringV3(String url) 
{ 
    // NOT SAFE, instant deadlock when called from UI thread
    // deadlock when called from threadpool, works fine on console
    var request = HttpClient.GetAsync(url).Result; 
    var download = request.Content.ReadAsStringAsync().Result; 
    return download; 
}

The code above will also download the string, but it will block the calling Thread while doing so, and if that thread is a threadpool thread, then it will lead to a deadlock if the workload is high enough. Let’s see what it does in more detail:

Calling HttpClient.GetAsync(url) will create the request. It might run some part of it synchronously, but at some point it reaches the part where it needs to offload the work to the networking API of the OS.
This is where it will create a Task and return it in an incomplete state, so that you can schedule a continuation.
But instead you use the Result property, which blocks the thread until the task completes. This defeats the whole purpose of async: the thread can no longer work on other tasks, it’s blocked until the request finishes.

The problem is that if you blocked the threads which are supposed to work on the Tasks, then there won’t be a thread to complete a Task.

This depends on context, so it’s important to avoid writing this type of code in a library where you have no control over the execution context.

If you are calling from UI thread, you will deadlock instantly, as the task is queued for the UI thread which gets blocked when it reaches the Result property.
If called from a threadpool thread, then a threadpool thread is blocked, which will lead to a deadlock if the work load is high enough. If all threads in the threadpool are blocked, then there will be nobody to complete the Task.
But this case will work if you’re calling from a main or dedicated thread (one which does not belong to the threadpool and does not have a synchronization context).

Let’s look at an example which is just as bad, but can work fine in other cases.

Sync version, defeats the purpose, blocks the calling thread and definitely not safe:

public String DownloadStringV4(String url) 
{ 
    // NOT SAFE, deadlock when called from threadpool
    // works fine on UI thread or console main 
    return Task.Run(async () => { 
        var request = await HttpClient.GetAsync(url); 
        var download = await request.Content.ReadAsStringAsync(); 
        return download; 
    }).Result; 
}

The code above also blocks the caller, but it dispatches the work to the threadpool. Task.Run forces the execution to happen on the threadpool. So if it is called from a thread that is not a threadpool thread, this is actually a pretty okay way to queue work for the threadpool.

If you have a classic ASP.NET application or a UI application, you can call async functions from a sync function using this method, then update the UI based on the result, with the caveat that this blocks the UI or IIS managed thread until the work is done. In case of the IIS thread this is not a huge problem, as the request cannot complete until the work is done, but in case of a UI thread this would make the UI unresponsive.
If this code is called from a threadpool thread, then again it will lead to a deadlock if the work load is high enough, because it’s blocking a threadpool thread which might be necessary for completing the task. It is best to avoid writing code like this, especially in a library, where you have no control over the context your code gets called from.
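
As a minimal sketch of the “okay” case, the snippet below calls Task.Run from a dedicated thread (not a threadpool thread, no synchronization context) and records where each piece of code ran. TaskRunDemo and its method are hypothetical names used only for this illustration.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the original examples.
public static class TaskRunDemo
{
    // Reports whether the caller and the work ran on a threadpool thread.
    public static (bool callerOnPool, bool workOnPool) Run()
    {
        bool callerFlag = false;
        bool workFlag = false;

        var dedicated = new Thread(() =>
        {
            callerFlag = Thread.CurrentThread.IsThreadPoolThread;

            // Blocking with .Result is tolerable here only because this is a
            // dedicated thread: no threadpool thread is held hostage.
            workFlag = Task.Run(() => Thread.CurrentThread.IsThreadPoolThread).Result;
        });

        dedicated.Start();
        dedicated.Join();
        return (callerFlag, workFlag);
    }
}
```

The caller should report false and the work should report true. Moving the same blocking call onto a threadpool thread is what turns it into the dangerous variant described above.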

And now let’s look at the final version, which does horrible things…

Deadlock version. Don’t write this:

public String DownloadStringV5(String url) 
{ 
    // REALLY REALLY BAD CODE,
    // guaranteed deadlock 
    return Task.Run(() => { 
        var request = HttpClient.GetAsync(url).Result; 
        var download = request.Content.ReadAsStringAsync().Result; 
        return download; 
    }).Result; 
}

Well, the code above is a bit of an exaggeration, just to prove a point. It’s the worst possible thing that you can do. The code above will deadlock no matter what context you are calling from, because it schedules tasks for the threadpool and then it blocks the threadpool thread. If called enough times in parallel, it will exhaust the threadpool, and your application will hang… indefinitely. In that case the best thing you can do is take a memory dump and restart the application.

What Causes a Deadlock?

Task.Wait() does. That would be the end of the story, but sometimes it cannot be avoided, and it’s not the only case. A deadlock might also be caused by other sorts of blocking code, such as waiting for a semaphore or acquiring a lock. The advice in general is simple: don’t block in async code. If that is possible, this is the solution. There are many cases where it is not possible, and that’s where most problems come from.

Here is an example from our own codebase.

Yes! This causes a deadlock!

public String GetSqlConnString(RubrikkUser user, RubrikkDb db) 
{ 
    // deadlock if called from threadpool, 
    // works fine on UI thread, works fine from console main 
    return Task.Run(() => 
        GetSqlConnStringAsync(user, db)).Result; 
}

Look at the code above. Try to understand it. Try to guess the intent, the reason why it’s written like this. Try to guess how the code could fail. It doesn’t matter who wrote it, anyone could have written this. I wrote code like this, that’s how I know it deadlocks.

The problem the developer is facing is that the API they are supposed to call is async only, but the function they are implementing is sync. The problem can be avoided altogether by making the method async as well. Problem solved.

But it turns out that you need to implement a sync interface, and you are supposed to implement it using an API which has async-only functions.

The execution is wrapped inside a Task.Run. This will schedule the task on the threadpool and then block the calling thread. This is okay, as long as the calling thread is not a threadpool thread. If the calling thread is from the threadpool, then the following disaster happens: a new task is queued to the end of the queue, and the threadpool thread which would eventually execute the Task is blocked until the Task is executed.

Okay, so we don’t wrap it inside a Task.Run, and we get the following version:

This still causes a deadlock!

public String GetSqlConnString(RubrikkUser user, RubrikkDb db) 
{ 
    // deadlock from UI thread, deadlock if called from threadpool, 
    // works fine from console main 
    return GetSqlConnStringAsync(user, db).Result; 
}

Well, it got rid of an extra layer of task, which is good, and the task is scheduled for the current context. What does this mean? This means that the code will deadlock if the threadpool is already exhausted, or instantly deadlock if called from the UI thread, so it solves nothing. At the root of the problem is the .Result property.

So at this point you might think, is there a solution for this? The answer is complicated. In library code there is no easy solution, as you cannot assume under what context your code is called. The best solution is to only call async code from async code and blocking sync APIs from sync methods. Don’t mix them. The application layer on top has knowledge of the context it’s running in and can choose the appropriate solution. If called from a UI thread, it can schedule the async task for the threadpool and block the UI thread. If called from the threadpool, then you might need to open additional threads to make sure that there is something left to finish the work. But if you include a transition like this from sync to async code inside a library, then the calling code won’t be able to control the execution, and your library will fail with some applications or frameworks.

Library code should be written without any assumptions about the synchronization context or the framework it is called from. If you need to support both a blocking sync and an async interface, then you must implement the function twice, once for each version. Don’t even think about calling them from each other for code reuse. You have two options: either make your function blocking sync and use blocking sync APIs to implement it, or make your function async and use async APIs to implement it. In case you need both, you can and should implement both separately. I recommend ditching blocking sync entirely and just using async.
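
Here is a minimal sketch of what “implement it twice” can look like, using file reading as a stand-in for any API that offers both flavors. TextFileReader is a hypothetical helper, not something from our codebase. The sync method touches only blocking APIs and the async method touches only async APIs, so neither ever waits on the other.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper used only to illustrate the "two implementations" rule.
public static class TextFileReader
{
    // Blocking version: built only from blocking calls.
    public static string ReadAll(string path)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Async version: built only from async calls, no .Result or .Wait().
    public static async Task<string> ReadAllAsync(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, bufferSize: 4096, useAsync: true))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return await reader.ReadToEndAsync();
        }
    }
}
```

Yes, the two bodies look almost identical. That duplication is the price for a library that is safe to call from any context.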

Other solutions include writing your own TaskScheduler or SynchronizationContext, so that you have control over the execution of tasks. There are plenty of articles on this. If you have free time, give it a try. It’s a good exercise and you’ll gain deeper insight than any article can provide.

SynchronizationContext? TaskScheduler?

These control how your tasks are executed. These will determine what you can and cannot do when calling async functions. All that async functions do is schedule a Task for the current context. The TaskScheduler may schedule the execution in any way it pleases. You can implement your own TaskScheduler and do whatever you want with it. You can implement your own SynchronizationContext as well and schedule from there.

The SynchronizationContext is a generic way of queuing work for other threads. The TaskScheduler is an abstraction over this which handles the scheduling and execution of Tasks.

When you await a task, by default the continuation is scheduled for the context you are currently in. If there is a synchronization context associated with the current thread, the continuation is posted to it using SynchronizationContext.Post. If there is no such thing, TaskScheduler.Current is used, which is the TaskScheduler of the currently running task. If there is no such thing either, that is TaskScheduler.Default, which schedules work in a queue that gets executed using the thread pool.
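
If you are not sure which context a piece of code runs in, you can simply ask. The sketch below is a small hypothetical helper (ContextProbe is not a framework type) that reports the synchronization context, the current TaskScheduler, and whether the current thread belongs to the thread pool.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical diagnostic helper, handy for logging while hunting a deadlock.
public static class ContextProbe
{
    public static string Describe()
    {
        SynchronizationContext context = SynchronizationContext.Current;
        string contextName = context == null ? "none" : context.GetType().Name;

        string schedulerName = ReferenceEquals(TaskScheduler.Current, TaskScheduler.Default)
            ? "default (thread pool)"
            : TaskScheduler.Current.GetType().Name;

        return "SynchronizationContext: " + contextName
             + "; TaskScheduler: " + schedulerName
             + "; thread pool thread: " + Thread.CurrentThread.IsThreadPoolThread;
    }
}
```

Calling ContextProbe.Describe() at the top of a suspicious method and logging the result is usually enough to tell which of the cases in this article you are dealing with.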

Those are a lot of complicated things to consider at the same time, so let’s break it down into several common cases:

In console applications by default you don’t have a synchronization context, but you have a main thread. Tasks will be queued using the default TaskScheduler and will be executed on the thread pool. You can freely block your main thread, it will just stop executing.
If you create a custom thread, by default you don’t have a synchronization context, so it’s just like having a console application. Tasks get executed on the thread pool and you can block your custom thread.
If you are in a thread pool thread, then all following tasks are also executed on the thread pool, but if you have blocking code here then the threadpool will run out of threads, and you will deadlock.
If you are in a desktop UI thread, you have a synchronization context, and by default tasks are queued for execution on the UI thread. Queued tasks are executed one by one. If you block the UI thread there is nothing left to execute tasks and you have a deadlock.
If you’re writing a .NET Core web application, you’re basically running everything on the thread pool. Any blocking code will block the thread pool and any .Result will lead to a deadlock.
If you’re writing an ASP.NET web application, then you have threads managed by IIS, which will allocate one for each request. Each of these threads has its own synchronization context. Tasks get scheduled for these threads by default. You need to manually schedule for the threadpool for parallel execution. If you call .Result on a task which is enqueued for the request thread, you will instantly deadlock.
If you’re writing a library, you have no idea what code is calling your code, and mixing async code with sync code, or calling .Result will almost certainly make an application deadlock. Never mix async and sync code in a library.
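
The desktop UI case can be reproduced without any UI framework. The sketch below uses a hypothetical QueueingContext that, like a UI context, only queues continuations for its owner thread. The owner thread then blocks on the task, so the queued continuation never runs until the queue is pumped. All names here are made up for the illustration, and the timings are arbitrary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a UI context: posted callbacks wait in a queue
// until the owning thread pumps them.
public sealed class QueueingContext : SynchronizationContext
{
    private readonly ConcurrentQueue<Action> _queue = new ConcurrentQueue<Action>();

    public override void Post(SendOrPostCallback callback, object state)
    {
        _queue.Enqueue(() => callback(state));
    }

    // Runs everything queued so far on the calling thread.
    public void Pump()
    {
        while (_queue.TryDequeue(out Action work)) work();
    }
}

public static class UiDeadlockDemo
{
    private static async Task<int> GetValueAsync()
    {
        await Task.Delay(50);   // the continuation is posted to the captured context
        return 42;
    }

    // Reports whether the task finished during the blocking wait and after pumping.
    public static (bool whileBlocked, bool afterPumping) Run()
    {
        bool finishedWhileBlocked = false;
        bool finishedAfterPumping = false;

        var uiLikeThread = new Thread(() =>
        {
            var context = new QueueingContext();
            SynchronizationContext.SetSynchronizationContext(context);

            Task<int> task = GetValueAsync();

            // The only thread that could run the continuation is stuck right here.
            // A timeout is used so the demo ends instead of hanging forever.
            finishedWhileBlocked = task.Wait(TimeSpan.FromMilliseconds(500));

            context.Pump();     // what a message loop would have done
            finishedAfterPumping = task.IsCompleted;
        });

        uiLikeThread.Start();
        uiLikeThread.Join();
        return (finishedWhileBlocked, finishedAfterPumping);
    }
}
```

The blocking wait should time out, and the task should complete only once the queue is pumped. With .Result instead of a timed Wait, the thread would never get to the pumping part.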

How to Write Good Async Code?

Until now we talked about good cases, bad cases and cases that work only in some contexts. But what about some practices to follow? It depends. It’s not easy to enforce common practices, because how async/await works depends on the context. But these should be followed in library code.

Only call async code from async code. (don’t mix sync with async)
Never block in async code. (never .Result, never lock)
If you need a lock, use SemaphoreSlim.WaitAsync()
Use async/await when dealing with Tasks, instead of ContinueWith/Unwrap, it makes the code cleaner.
It’s okay to provide both sync and async versions of an API, but never call one from the other. (this is one of the rare cases when code duplication is acceptable)
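
As a minimal sketch of the locking advice, here is a hypothetical AsyncCounter that protects a read-modify-write sequence with SemaphoreSlim.WaitAsync() instead of a lock statement. The Task.Delay stands in for real async work inside the critical section.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example class showing an async-friendly critical section.
public sealed class AsyncCounter
{
    // One permit: behaves like a lock, but waiting does not block a thread.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private int _value;

    public int Value => _value;

    public async Task<int> IncrementAsync()
    {
        await _gate.WaitAsync();
        try
        {
            int current = _value;
            await Task.Delay(1);      // simulated async work while holding the "lock"
            _value = current + 1;
            return _value;
        }
        finally
        {
            _gate.Release();          // always release, even if the work throws
        }
    }
}
```

A lock statement would not even compile here, because await is not allowed inside a lock block. Without the semaphore, concurrent callers would read the same value and updates would get lost.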

Understanding all the concepts that relate to async can take some time. Until you do that, here is a cheat sheet that tells you what you can and cannot do in each context. This is not a comprehensive list, and the deadlock categorization leans towards the strict side, which means that something marked as a deadlock might still work in some cases but will deadlock in production. There can be other types of blocking code like Thread.Sleep or Semaphore.WaitOne. These will not cause a deadlock on their own, but they will increase the chance of deadlocking if there is a .Result somewhere.

Debugging Methodology

You have a deadlock in your code? Great! The important part is to identify it. It can be from any Task.Result or Task.Wait or possibly other blocking code. It’s like searching for a needle in a haystack.

Memory Dumps Help a Lot!

If you find your application in a deadlocked state, take a memory dump of your application. Azure has tools for this on the portal, and if you are not on Azure there are plenty of guides for this. The dump will capture the state of your application. DebugDiag 2 Analysis can automatically analyze the memory dump. You need to look at the stack traces of the threads to see where the code is blocked. Upon code review you will find a statement there which blocks the current thread. You need to remove the blocking statement to fix the deadlock.

Reproducing the Deadlock

The other approach is to reproduce the deadlock. The method you can use here is stress testing: launch many threads in parallel and see if the application survives. However, this might not reproduce the problem, especially if the async tasks complete fast enough. A better approach is to limit the concurrency of the thread pool to 1 when the application starts. This means that if you have any bad async code where a threadpool thread would block, then it definitely will block. This second approach of limiting concurrency is also better for performance while debugging, as Visual Studio is really slow if there are a lot of threads or tasks in your application.
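
A minimal sketch of limiting the thread pool at startup is shown below. ThreadPoolLimiter is a hypothetical helper name. Be aware that the runtime may refuse the values: the documentation for ThreadPool.SetMaxThreads says the maximum cannot be set below the number of processors, so on a multi-core machine a limit of 1 might be rejected. That is why the return value is checked instead of assumed.

```csharp
using System.Threading;

// Hypothetical helper for shrinking the thread pool while debugging.
public static class ThreadPoolLimiter
{
    // Tries to cap the worker threads at the given number.
    // Returns false if the runtime rejected either setting.
    public static bool TryLimitWorkers(int workers)
    {
        // Keep the I/O completion thread settings as they are.
        ThreadPool.GetMinThreads(out _, out int minIo);
        ThreadPool.GetMaxThreads(out _, out int maxIo);

        // Lower the minimum first, the maximum cannot go below it.
        bool minAccepted = ThreadPool.SetMinThreads(workers, minIo);
        bool maxAccepted = ThreadPool.SetMaxThreads(workers, maxIo);
        return minAccepted && maxAccepted;
    }
}
```

Call it as the first thing in your entry point, for example ThreadPoolLimiter.TryLimitWorkers(1), and log the result. If the limit was rejected, try the number of processors instead, which still makes exhaustion much easier to hit than the default maximum.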

Some Corrections:

As has been pointed out, my examples don’t work the way they are supposed to. This is because I have simplified them too much in order to get my ideas across more easily. (I hope it worked!)

When you call an async method which is simple enough, it might work even if it’s used wrongly. Also, in general the classic .NET Framework is more forgiving due to it having dedicated threads that you can block. You will experience this when porting from the forgiving .NET Framework to the harsher .NET Core, if the original project has badly written async code that “worked at the time”.

Some of the examples rely on having a high load. This is something that’s hard to test and usually happens when it’s too late: in production.

In one of the comments I’ve added a pull request to properly reproduce some of the issues in the examples by creating an async method which is a bit more complex than the one in my example:

https://bitbucket.org/postnik0707/async_await/pull-requests/1

See the comments, read the related articles, and test for yourself everything that you don’t believe. If possible, simulate high load and limit the thread count in the thread pool to reproduce my results more easily.