Blocking threads, suspending coroutines

Modern operating systems support multiple threads in each process. A thread is an abstraction that gives the illusion of a separate CPU core executing your code, but you can start as many threads as you need, regardless of the number of physical cores your CPU actually has. So with multiple threads you should not have to worry about some of them being blocked, should you? It is not so simple. There are at least two reasons to care.

Traffic congestion image from Wikipedia, CC BY-SA 3.0

For one, if you are writing a UI application, there is usually a single main thread that handles all UI interactions and events. Blocking this thread makes the whole application unresponsive. In backend applications this is rarely an issue but, on the other hand, backend applications tend to handle lots of concurrent requests, which are typically scheduled to execute in a thread pool of some fixed size. All is fine while requests execute quickly, but in the modern world of service-oriented architectures, if you block a caller thread while waiting for an answer from another service, just one slow service can end up blocking all the threads and stalling progress in the whole system.

Blocking threads

How can a thread be blocked? There are two different ways. One is to run a CPU-intensive computation that takes a lot of time (aka a CPU-bound task). For example, the following (non-secure) function that generates a 4096-bit prime number takes around 10 seconds to execute on my machine:

import java.math.BigInteger
import java.util.Random

fun findBigPrime(): BigInteger =
    BigInteger.probablePrime(4096, Random())

The other way to block a thread is to use blocking IO (aka an IO-bound task) to wait, for example, for a message from some remote system that may take a long time to arrive:

fun BufferedReader.readMessage(): Message? =
    readLine()?.parseMessage()

In the first case CPU resources are actually consumed, while in the second case the IO operation can wait for a long time without consuming any CPU. Still, we say that the thread is blocked in both cases, because the thread calling those functions cannot do anything else: it cannot process UI events and it cannot execute other requests.

Threads are expensive, so blocking a thread is something that should be avoided. If you have to perform a CPU-bound task, then you have no choice but to block some thread, but you always have a choice of which thread to block. You should avoid blocking the main UI thread or the limited request-processing threads of your backend application.
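To make "choosing which thread to block" concrete without coroutines, here is a minimal sketch that hands the prime search to a small dedicated pool of daemon threads, so the caller only blocks if and when it calls Future.get(). The pool (cpuPool), the helper name findPrimeOnPool, and the smaller bit sizes are illustrative assumptions, not part of the original code.

```kotlin
import java.math.BigInteger
import java.util.Random
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.Future

// A dedicated pool for CPU-bound work (an illustrative assumption, not part of
// any framework). Its daemon threads absorb the blocking, so the caller thread
// (a UI or request-processing thread) stays free.
val cpuPool: ExecutorService = Executors.newFixedThreadPool(
    Runtime.getRuntime().availableProcessors()
) { task -> Thread(task, "cpu-worker").apply { isDaemon = true } }

// Submits the prime search to the pool and returns immediately with a Future.
fun findPrimeOnPool(bits: Int): Future<BigInteger> =
    cpuPool.submit(Callable { BigInteger.probablePrime(bits, Random()) })

fun main() {
    val future = findPrimeOnPool(1024) // smaller than 4096 bits to keep the demo short
    println("${Thread.currentThread().name} keeps running while the prime is computed")
    val prime = future.get() // blocks the caller only at the point it needs the result
    println("Found a ${prime.bitLength()}-bit probable prime")
}
```

Some thread is still blocked for the whole computation; the point is that it is a pool thread chosen for that purpose rather than the caller.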

On the other hand, if you block because of IO, then you can usually choose to completely avoid blocking by using non-blocking (aka asynchronous) IO libraries that do not block threads at all.

Suspending coroutines

Coroutines provide an alternative to thread blocking by supporting suspension. So, what is the difference between blocking a thread and suspending a coroutine? Let us take a look at the following snippet of sequential code:

val data = awaitData() // does it block or suspend?
processData(data)

In this snippet processData is called only after awaitData returns. From the standpoint of the call to processData, it does not really matter whether awaitData blocks or suspends for a long time. So why do we have to make a distinction between blocking and suspension? Can't we simply say that awaitData blocks? Not really, because there is an important difference. If this piece of code executes in the main thread of a UI application, then blocking the thread in awaitData leads to a frozen UI, but suspending a coroutine does not, because a suspended coroutine releases the thread to process other events while it waits. Thus, we need some way to distinguish blocking functions from non-blocking ones.
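The difference can be observed directly. In this sketch (which assumes the kotlinx-coroutines-core library is on the classpath), runBlocking plays the role of a single UI thread, and the helper timeTwoWaits, a name made up for this example, runs two coroutines that each wait 200 ms either by suspending with delay or by blocking with Thread.sleep.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

// Two coroutines share the single thread of runBlocking, which stands in for a
// UI main thread. Each waits 200 ms, either by suspending or by blocking.
fun timeTwoWaits(blockThread: Boolean): Long = runBlocking {
    measureTimeMillis {
        List(2) {
            launch {
                if (blockThread) {
                    Thread.sleep(200) // holds the thread: nothing else runs meanwhile
                } else {
                    delay(200) // releases the thread while waiting
                }
            }
        }.joinAll()
    }
}

fun main() {
    println("suspending: ${timeTwoWaits(blockThread = false)} ms") // expect about 200
    println("blocking:   ${timeTwoWaits(blockThread = true)} ms")  // expect about 400
}
```

Because delay releases the thread, the two waits overlap; Thread.sleep holds the thread, so the second coroutine cannot even start waiting until the first one finishes.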

Recognizing blocking code

Java APIs on the JVM platform are usually explicit about their blocking behavior in their documentation. The core APIs usually use the right term, making blocking behavior easy to recognize. For example, the documentation for the InputStream.read method says:

This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.

Sometimes it is less explicit. For example, the documentation for the ReentrantLock.lock method says:

If the lock is held by another thread then the current thread becomes disabled for thread scheduling purposes and lies dormant until the lock has been acquired

This is an intricate way of saying that the current thread is blocked until the lock has been acquired.

Unfortunately, in other cases deeper knowledge or experience is required. For example, there is nothing in the BigInteger.probablePrime documentation to hint at the fact that it is quite a CPU-consuming method.

Note: JVM documentation generally follows a narrow definition of “blocking”, trying to limit the use of this term to blocking IO. Beware, though, that the Thread.State enumeration defines BLOCKED in a different way (a thread waiting for a monitor lock), and threads doing blocking IO are reported in the RUNNABLE state. For this story I adopt a wide definition of blocking, which is pragmatic and provides a useful mental model for applications with critical threads (that should not be blocked).
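A small experiment shows how narrowly Thread.State uses the word. This sketch (the helper observeState is hypothetical) starts a thread, waits until it stops being RUNNABLE, and reports its state: contending for a synchronized monitor shows up as BLOCKED, while waiting in ReentrantLock.lock, which the documentation quoted above describes as lying dormant, shows up as WAITING.

```kotlin
import java.util.concurrent.locks.ReentrantLock

// Starts a daemon thread running `action` and polls until it leaves the NEW and
// RUNNABLE states (or 2 seconds pass), then returns the state the JVM reports.
fun observeState(action: () -> Unit): Thread.State {
    val thread = Thread { action() }.apply { isDaemon = true; start() }
    val deadline = System.currentTimeMillis() + 2_000
    while (System.currentTimeMillis() < deadline) {
        val state = thread.state
        if (state != Thread.State.NEW && state != Thread.State.RUNNABLE) return state
        Thread.sleep(10)
    }
    return thread.state
}

fun main() {
    // Contending for a monitor (synchronized) is what Thread.State calls BLOCKED.
    val monitor = Any()
    val monitorState = synchronized(monitor) { observeState { synchronized(monitor) {} } }
    println("waiting for a monitor:       $monitorState") // BLOCKED

    // Waiting in ReentrantLock.lock() parks the thread, which is reported as WAITING,
    // although in the wide sense used in this story the thread is just as blocked.
    val lock = ReentrantLock()
    lock.lock()
    val lockState = try { observeState { lock.lock() } } finally { lock.unlock() }
    println("waiting for a ReentrantLock: $lockState") // WAITING
}
```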

In complex applications it may not be trivial to recognize the blocking behavior of high-level methods at all. There are no common naming conventions or annotations to distinguish blocking and non-blocking methods, so finding calls to blocking methods is hard. Developers resort to combinations of lint checks and runtime checks to find inappropriate blocking calls. For example, Android developers enjoy the fact that performing network IO on the main thread on Android throws NetworkOnMainThreadException.
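As an illustration of the runtime-check approach, here is a minimal sketch of a guard in the spirit of NetworkOnMainThreadException. Everything in it (BlockingCallOnCriticalThreadException, markCurrentThreadCritical, checkNotCriticalThread, guardedReadLine) is hypothetical, and it only catches blocking calls that go through helpers which remember to call the check.

```kotlin
import java.io.BufferedReader
import java.io.StringReader
import java.util.concurrent.ConcurrentHashMap

// Hypothetical runtime guard; none of these names come from a real library.
class BlockingCallOnCriticalThreadException(message: String) : IllegalStateException(message)

// Threads that must never block (for example, a UI thread) register themselves here.
val criticalThreads: MutableSet<Thread> = ConcurrentHashMap.newKeySet<Thread>()

fun markCurrentThreadCritical() {
    criticalThreads += Thread.currentThread()
}

// Blocking helpers call this first and fail fast on a critical thread.
fun checkNotCriticalThread() {
    val current = Thread.currentThread()
    if (current in criticalThreads) {
        throw BlockingCallOnCriticalThreadException("Blocking call on critical thread '${current.name}'")
    }
}

// A blocking read wrapped with the guard.
fun BufferedReader.guardedReadLine(): String? {
    checkNotCriticalThread()
    return readLine()
}

fun main() {
    // The main thread was never marked critical, so the read goes through.
    println(BufferedReader(StringReader("hello")).guardedReadLine()) // hello
    val ui = Thread {
        markCurrentThreadCritical()
        try {
            BufferedReader(StringReader("hello")).guardedReadLine()
        } catch (e: BlockingCallOnCriticalThreadException) {
            println("Caught: ${e.message}")
        }
    }
    ui.start()
    ui.join()
}
```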

Suspending functions

The Kotlin programming language introduces the concept of suspending functions via the suspend modifier. A common mistake is to assume that adding the suspend modifier to a function makes it either asynchronous or non-blocking. You can even notice this mistake in the talk “Exploring Coroutines in Kotlin” by Venkat Subramaniam from KotlinConf 2018. Let us examine this mistake more closely by adding the suspend modifier to our first example of a blocking function:

suspend fun findBigPrime(): BigInteger =
    BigInteger.probablePrime(4096, Random())

Even with this change, the findBigPrime function still blocks the caller thread for quite a long time. In a UI application we can launch a coroutine in the main thread, call findBigPrime, and get quite a nasty UI freeze. In fact, if you write this function in IntelliJ IDEA, you get a “redundant ‘suspend’ modifier” warning, hinting that the suspend modifier by itself does not magically turn blocking functions into non-blocking ones.
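The effect is easy to reproduce. In this sketch, busyFor is a hypothetical stand-in for a CPU-bound function like findBigPrime: it is marked suspend but never suspends. The single thread of runBlocking plays the role of the main thread, and a ticker coroutine on that same thread counts how often it gets to run. The sketch assumes kotlinx-coroutines-core is available.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.yield

// Hypothetical CPU-bound function: marked `suspend`, but it never suspends,
// so it runs entirely on the caller's thread. (IDEA flags the modifier as redundant.)
@Suppress("RedundantSuspendModifier")
suspend fun busyFor(millis: Long) {
    val start = System.nanoTime()
    while (System.nanoTime() - start < millis * 1_000_000) { /* burn CPU without suspending */ }
}

// Counts how often a ticker on the same thread runs while busyFor executes.
fun ticksDuringBusyWork(): Int = runBlocking {
    var ticks = 0
    val ticker = launch {
        while (true) {
            delay(10)
            ticks++
        }
    }
    yield() // let the ticker start and reach its first delay
    busyFor(300)
    val observed = ticks // read before this coroutine suspends again
    ticker.cancel()
    observed
}

fun main() {
    println("ticks while busy: ${ticksDuringBusyWork()}") // 0: the "UI" was frozen
}
```

The ticker never runs during the 300 ms of work, which is the coroutine equivalent of a frozen UI.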

The fact that suspend alone does not prevent blocking makes some people wonder why we recommend launching coroutines in the main thread by default. If you watch the excellent talk “Coroutines by Example” by Christina Lee from Droidcon London 2018, you cannot help but notice her wondering about that, too. Wonder no more. The answer is close.

Suspending convention

Suspending functions add a new dimension to code design. Without coroutines there was only blocking/non-blocking, and now there is also suspending/non-suspending on top of that. To make everybody’s life simpler we use the following convention: suspending functions do not block the caller thread.

The means to implement this convention are provided by the withContext function. For example, the proper way to turn the findBigPrime function into a suspending one is:

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

suspend fun findBigPrime(): BigInteger =
    withContext(Dispatchers.Default) {
        BigInteger.probablePrime(4096, Random())
    }

Now you can call findBigPrime from the coroutine launched in the main thread of your UI application without blocking its main thread!
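Here is a ticker experiment with the work moved into withContext(Dispatchers.Default), as the convention prescribes. busyFor and busyOffMainThread are hypothetical stand-ins for BigInteger.probablePrime and the suspending findBigPrime; the single thread of runBlocking again plays the main thread, and the sketch assumes kotlinx-coroutines-core.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import kotlinx.coroutines.yield

// Hypothetical stand-in for CPU-bound work such as BigInteger.probablePrime.
fun busyFor(millis: Long) {
    val start = System.nanoTime()
    while (System.nanoTime() - start < millis * 1_000_000) { /* burn CPU */ }
}

// Follows the convention: the CPU-bound work moves to Dispatchers.Default,
// so the caller's thread is released while it runs.
suspend fun busyOffMainThread(millis: Long): Thread = withContext(Dispatchers.Default) {
    busyFor(millis)
    Thread.currentThread() // reports which thread did the work
}

// Returns how often a ticker on the caller's thread ran during the work,
// and whether the work ran on a different thread than the caller.
fun ticksDuringOffloadedWork(): Pair<Int, Boolean> = runBlocking {
    val callerThread = Thread.currentThread()
    var ticks = 0
    val ticker = launch {
        while (true) {
            delay(10)
            ticks++
        }
    }
    yield() // let the ticker start
    val workerThread = busyOffMainThread(300)
    val observed = ticks
    ticker.cancel()
    observed to (workerThread !== callerThread)
}

fun main() {
    val result = ticksDuringOffloadedWork()
    println("ticks while work ran elsewhere: ${result.first}; other thread: ${result.second}")
}
```

While the work runs on a Default worker thread, the caller's thread is free, so the ticker keeps ticking; the exact count depends on the machine.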

Another convention at play here is that we use the Default dispatcher to execute CPU-bound code. The Default dispatcher is optimized for such CPU-bound functions: it is backed by a thread pool with as many threads as there are CPU cores in the system (but at least two), making sure that CPU-bound code can saturate all physical resources as needed. However, it does not over-allocate threads, since that would not make CPU-bound tasks run any faster but would only waste memory.
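Since the Default dispatcher is sized for CPU-bound work, a natural way to use all cores is to split a computation into roughly one chunk per core and run the chunks with async on Dispatchers.Default. The computation below (a sum of squares) and the function name parallelSumOfSquares are only illustrative; this is a sketch assuming kotlinx-coroutines-core.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking

// Sum of x*x for x in 1..n, split into one chunk per available core and computed
// in parallel on Dispatchers.Default. The chunking scheme is an illustrative choice.
suspend fun parallelSumOfSquares(n: Long): Long = coroutineScope {
    val parts = Runtime.getRuntime().availableProcessors().toLong()
    val chunk = (n + parts - 1) / parts // ceiling division
    (0L until parts).map { i ->
        async(Dispatchers.Default) {
            var acc = 0L
            for (x in (i * chunk + 1)..minOf(n, (i + 1) * chunk)) acc += x * x
            acc
        }
    }.awaitAll().sum()
}

fun main() = runBlocking {
    println(parallelSumOfSquares(1_000_000L)) // 333333833333500000
}
```

Creating many more chunks than cores would not make this faster, for the same reason the Default dispatcher does not over-allocate threads.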

Blocking IO to suspending

Now, let us take a look at the second example of a blocking function — an IO-bound one. We turn it into a suspending function in a similar way:

suspend fun BufferedReader.readMessage(): Message? =
    withContext(Dispatchers.IO) {
        readLine()?.parseMessage()
    }

Notice one important difference here: the IO dispatcher is used. The reason not to use the Default dispatcher here boils down to the difference between CPU-bound and IO-bound code.

IO-bound code does not actually consume CPU resources. If we used the Default dispatcher, we could end up in a situation where, for example, on an 8-core machine with 8 threads allocated to the Default dispatcher, all of the threads are blocked on IO without consuming any CPU, so our 8-core machine is underutilized. The IO dispatcher allocates additional threads on top of the ones allocated to the Default dispatcher (by default up to 64 threads, or the number of cores if that is larger), so we can do blocking IO and fully utilize the machine’s CPU resources at the same time.
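The difference can be measured with a rough experiment that uses Thread.sleep as a stand-in for a blocking IO call. The helper timeBlockingCalls is hypothetical, and the numbers depend on the machine: Default can run at most as many of these blocking tasks at once as it has threads, while IO can run all 32 at once. The sketch assumes kotlinx-coroutines-core.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

// Runs `tasks` blocking calls on the given dispatcher and measures wall-clock time.
// Thread.sleep stands in for blocking IO: it holds a thread without using CPU.
fun timeBlockingCalls(dispatcher: CoroutineDispatcher, tasks: Int, sleepMs: Long): Long =
    runBlocking {
        measureTimeMillis {
            List(tasks) { launch(dispatcher) { Thread.sleep(sleepMs) } }.joinAll()
        }
    }

fun main() {
    val tasks = 32
    // On a machine with fewer than 32 cores, Default needs several "rounds" of sleeps,
    // while IO has enough threads to run all of them concurrently.
    println("Default: ${timeBlockingCalls(Dispatchers.Default, tasks, 100)} ms")
    println("IO:      ${timeBlockingCalls(Dispatchers.IO, tasks, 100)} ms")
}
```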

Conclusion

Using withContext is not the only way to get suspending functions that do not block. Another way is to use truly asynchronous (non-blocking) library functions to start with. For example, instead of turning the blocking Thread.sleep method into a suspending function via withContext, you should simply use the suspending delay function.
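To see why delay is preferable to wrapping Thread.sleep, consider this sketch (assuming kotlinx-coroutines-core; manyDelays is a made-up helper): ten thousand coroutines each wait 100 ms with delay, all on the single thread of runBlocking. Doing the same with Thread.sleep would need one blocked thread per concurrent wait.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

// Launches `count` coroutines on runBlocking's single thread; each waits with delay.
// Returns the elapsed time and the number of distinct threads the coroutines resumed on.
fun manyDelays(count: Int, waitMs: Long): Pair<Long, Int> = runBlocking {
    val threads = mutableSetOf<Thread>() // only touched from runBlocking's thread
    val elapsed = measureTimeMillis {
        List(count) {
            launch {
                delay(waitMs) // suspends: the thread is free to run the other coroutines
                threads += Thread.currentThread()
            }
        }.joinAll()
    }
    elapsed to threads.size
}

fun main() {
    val result = manyDelays(10_000, 100)
    println("10000 delays of 100 ms took ${result.first} ms on ${result.second} thread(s)")
}
```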

I will not elaborate further on asynchronous functions (and asynchronous IO) in this (already long) story. It suffices to say that you also get suspending functions that do not block.

So, if you follow the convention that none of your suspending functions block, then it is perfectly safe to launch all your coroutines in the main thread of your application. This is great, because you can then safely access and modify the UI of your application from such coroutines, with one less thing to worry about. Moreover, this convention reduces the need to consult documentation to see whether a function you are calling is blocking. Once you’ve isolated and encapsulated the blocking code used by your application into suspending functions, you can call them at will from anywhere without having to double-check whether they are blocking or not.
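A sketch of this pattern, under stated assumptions: a single-threaded executor named fake-main stands in for the UI thread (a real Android or desktop application would use Dispatchers.Main), loadValue is a hypothetical suspending function that follows the convention by isolating its blocking call in withContext(Dispatchers.IO), and a plain MutableList plays the role of non-thread-safe UI state. Every coroutine resumes on fake-main after loadValue returns, so the list is only ever touched by one thread.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import java.util.concurrent.Executors

// Stand-in for a UI main thread (an assumption for this sketch): one daemon thread.
val fakeMain = Executors.newSingleThreadExecutor { task ->
    Thread(task, "fake-main").apply { isDaemon = true }
}.asCoroutineDispatcher()

// Follows the convention: blocking work is isolated inside withContext(Dispatchers.IO),
// so calling this from the main thread never blocks it.
suspend fun loadValue(i: Int): Int = withContext(Dispatchers.IO) {
    Thread.sleep(20) // hypothetical blocking call, e.g. a database read
    i * i
}

// Launches several coroutines on fake-main; each updates non-thread-safe "UI state".
// Returns the log and whether every update ran on the fake-main thread.
fun runUiScenario(): Pair<List<String>, Boolean> = runBlocking {
    withContext(fakeMain) {
        val uiThread = Thread.currentThread()
        val uiLog = mutableListOf<String>() // safe: only touched on fake-main
        var allOnUiThread = true
        List(5) { i ->
            launch {
                val value = loadValue(i) // suspends; fake-main stays free meanwhile
                allOnUiThread = allOnUiThread && Thread.currentThread() === uiThread
                uiLog += "item $i = $value"
            }
        }.joinAll()
        uiLog.toList() to allOnUiThread
    }
}

fun main() {
    val result = runUiScenario()
    println(result.first)
    println("all UI updates on fake-main: ${result.second}")
}
```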

The same is true for any backend: this convention lets you safely run most of your code in the limited, non-blocking thread pools of your server, increasing your server’s throughput by avoiding context switches. Your code performs an expensive switch to a different thread (via withContext) only when it has to perform an expensive blocking operation anyway.