Multithreading and Kotlin

I’ve been wanting to follow up on my previous blog post, “Approaching Kotlin from Three Angles”, with the multithreading chapter and things have been just getting busier every day and I haven’t had much time for it. Then, I had the opportunity to attend KotlinConf which we proudly sponsored and I came out of it with a burst of inspiration. There were many great talks and panels, and the one that stood out for me was “Introduction to Coroutines” by Roman Elizarov which aligned perfectly with the topic I wanted to talk about and I was enthused to get back to writing this.

CPUs of early computers were capable of executing a single operation per tick of their internal clocks. For comparison, the Apollo Guidance Computer, the computer that was installed on Apollo 11 that helped land Neil Armstrong on the moon over 4 decades ago, had a single core CPU with 16,800 transistors clocked at 2MHz and a 4KB RAM. Today, iPhone X features a 6-core A11 Bionic chip with 4 billion transistors clocking at 2.4MHz and a 3GB RAM. This is to say that we have enough computing power in our pockets to control fleet of spaceships, navigate and land them on each planet and their moons in the solar system.

Moreover, all modern processors today are capable of performing multiple tasks simultaneously. According to Geekbench Browser, a benchmark tool that measures a mobile device or computer’s CPU and compute performance, the iPhone 8 scores higher than a 13-inch Macbook Pro with a 7th-generation Core i5 processor on the multicore portion of the tests. Given this capacity and computing power, it wouldn’t make any sense to leave any of the transistors idle while waiting for some operation to complete. That’s the reason modern computer architectures support the ability to execute multiple processes and threads concurrently.

A process is a running instance of a program. A thread is the smallest sequence of instructions that can be managed independently by a scheduler. Each process is divided into one or more threads. Multiple threads are interleaved and receive a small amount of time slice called quantum. When a thread’s quantum is complete, the thread scheduler switches the thread with another one. This is called thread context switch and it happens so quickly that the threads appear to be running in parallel while actually running sequentially. Thread context switching is much more lightweight compared to process context switches because threads share many resources such as the address space which enable them to read from and write to the same data structures and variables as the other threads. Communication between processes, also known as Inter Process Communication (IPC), on the other hand is quite difficult and resource intensive.

There is an immediate challenge, though. Sequential execution is an inherent part of how we as humans think, operate, and understand the world around us. If you want to test how good you are at multitasking, take this “Selective Attention Test” and reach out to me if you think you’re an outlier. It’s fairly well documented at this point that we aren’t as good as we think we are at doing several things at once. What we are good at is is shifting our attention back and forth between multiple things, akin to the thread context switching, suspending and restoring the state of different sequences of instructions on a single core. When we consider this fact that we are not very effective at multitasking, I think it’s one of the reasons why it’s challenging to work on and understand multithreaded code.

Java is one of the earliest languages which introduced concurrency natively. At a low level, the necessary constructs for creating concurrent programs is built into the language. You can either extend the Thread class or after instantiating it, you can pass a Runnable object through its constructor.

// Extending the Thread class to implement threads.
class SimpleThread: Thread() {
public override fun run() {
println("${Thread.currentThread()} has run.")
}
}
// Implementing the Runnable interface to implement threads.
class SimpleRunnable: Runnable {
public override fun run() {
println("${Thread.currentThread()} has run.")
}
}
fun main(args: Array<String>) {
val thread = SimpleThread()
thread.start() // Will output: Thread[Thread-0,5,main] has run.
    val runnable = SimpleRunnable()
val thread1 = Thread(runnable)
thread1.start() // Will output: Thread[Thread-1,5,main] has run
}

Kotlin has an extension function for creating Java Threads using the below API:

fun thread(
start: Boolean = true, // If true, the thread is immediately started.
isDaemon: Boolean = false, // If true, the thread is created as a daemon thread.
contextClassLoader: ClassLoader? = null, // The class loader to use for loading classes and resources in this thread.
name: String? = null, // Name of the thread.
priority: Int = -1, // Priority of the thread.
block: () -> Unit // Block of code to run.
): Thread (source)

Such that a thread can be instantiated simply by:

thread() {
println("${Thread.currentThread()} has run.")
}

It’s tempting to think that spawning more threads can help us execute more tasks concurrently. Unfortunately, that’s not true. Creating too many threads can actually make an application run more slowly. Threads are objects which impose overhead during object allocation and garbage collection. There’s also a non-trivial expense incurred when switching between them.

Those of you who are iOS developers know well that iOS encourages concurrency through Grand Central Dispatch (GCD) where threads are not even mentioned explicitly. It’s very difficult for an app to use multiple cores effectively. When you take into consideration that the same app will potentially run on different computers (think of new smartphones and tablets released every year) with many other apps competing for the same resources, we can’t expect individual developers to make a globally optimal decision. That’s why GCD manages the work submitted to it in dispatch queues which execute the work on a pool of threads fully managed by the system. The developer doesn’t know which thread the tasks are executed and shouldn’t have to care about it.

In order to demonstrate how expensive it is to create, start, and keep threads around, I’ll run a simple test and spawn a million of them:

var counter = 0
val numberOfThreads = 1_000_000
val time = measureTimeMillis {
for (i in 1..numberOfThreads) {
thread() {
counter += 1
}
}
}
println("Created ${numberOfThreads} threads in ${time}ms.")

The above snippet ran on my computer (MacBook Pro, 15-inch, 2016, 2.9GHz Intel Core i7 with 16 GB memory) in 33.9 seconds!

After such a long introduction, we can now talk about Kotlin Coroutines, a new way of writing asynchronous, non-blocking code that can be thought of as light-weight threads. Similar to threads, coroutines can run in parallel or concurrently, wait for and communicate with each other. The biggest difference between them is that creating coroutines is almost free. There’s little in terms of performance overhead. Let’s try the same test using coroutines like the following:

var counter = 0
val numberOfCoroutines = 1_000_000
val time = measureTimeMillis {
for (i in 1..numberOfCoroutines) {
launch {
counter += 1
}
}
}

The above snippet ran on my computer in 424 milliseconds, that’s an ~80x improvement!!! Why is there such a stark difference? The magic here is that coroutines execute on a shared pool of threads and since one thread can run many coroutines, there isn’t a need to spawn a new thread for every block of code that’s being executed.

launch{} in the above example is a library function geared towards working with coroutines. It launches a new coroutine without blocking current thread and returns a reference to the coroutine as a Job object. The coroutine can be cancelled when the resulting job is cancelled. By default, the coroutine is immediately scheduled for execution.

If we wanted to return a value from a coroutine, we would start it by the async{} function which creates a new coroutine and returns its future results as an instance of Deferred<T>. Deferred is a non-blocking cancellable future (A future describes an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is yet incomplete).

To understand the benefits of async{}, let’s look at a simple example where we have a function that counts from 1 to n where n is between 1 billion and 10 billion. Let’s call this function twice in a row so that the second invocation has to wait for the first invocation to finish. Then, let’s wrap each invocation with an async{} and compare the run times. Here’s the code:

fun meaninglessCounter(): Int {
var counter = 0
for (i in 1..10_000_000_000) {
counter += 1
}
    return counter
}
fun main(args: Array<String>) {
// Sequential execution.
var time = measureTimeMillis {
val one = meaninglessCounter()
val two = meaninglessCounter()
println("The answer is ${one + two}")
}
println("Sequential completed in $time ms")

// Concurrent execution.
var time2 = measureTimeMillis {
val one = async { meaninglessCounter() }
val two = async { meaninglessCounter() }
runBlocking {
println("The answer is ${one.await() + two.await()}")
}
}
println("Concurrent completed in $time2 ms\n")
}

As you can see, async{} does provide concurrency and it roughly runs twice as fast as its sequential counterpart which is to show that we’re achieving some level of parallelism as well.

You may have noticed something different in the third print statement above in the main function. It’s wrapped inside a runBlocking{} function. Without it, we would’ve gotten the following compiler error:

await() is what’s called a suspend function. Suspend functions, as the error message suggests, are only allowed to be called from a coroutine or another suspend function. Calls to such a function may suspend at a certain point in time. To start a coroutine, there needs to be at least one suspending function. Suspension is different from blocking in that when a coroutine is suspended, the underlying thread is returned to the thread pool while the coroutine is waiting. When the waiting is done, the coroutine resumes work on an available thread from the thread pool.

runBlocking() is a function that runs a new coroutine and blocks the current thread until its completion. It’s designed to bridge regular blocking code to libraries that are written in suspending style, to be used in main functions and in tests. Being wrapped inside runBlocking(), await() is being executed in a new coroutine and the compiler is happy.

Behind the scenes, suspending functions are implemented via what’s called a Continuation-Passing-Style (CPS). No support from the JVM or the OS is required. Every suspending function has an additional Continuation parameter which is used to compile the function to a state machine where each state corresponds to a suspension point. The following suspending call in Kotlin:

val one = async { meaninglessCounter() }

gets translated to Java with a Continuation parameter like the following:

Deferred one = DeferredKt.async$default((CoroutineContext)null, (CoroutineStart)null, (Function2)(new Test2Kt$main$time2$1$one$1((Continuation)null)), 3, (Object)null);

The code is compiled to an anonymous class that has a method implementing the state machine, a field holding the current state of the state machine, and fields for local variables of the coroutine that are shared between states. Right before a suspension, the next state is stored in a field of a compiler-generated class along with relevant local variables. Upon resumption of that coroutine, local variables are restored and the state machine proceeds from the state right after suspension. To see the rest of the generated Java code, follow this link. For a comprehensive discussion of coroutines implementation details, I highly recommend checking out the official “Guide to kotlinx.coroutines by Example”.

Coroutines are still under experimental status but Roman Elizarov and many others at KotlinConf emphasized that experimental doesn’t mean that coroutines are not production ready. Once the design of the library has been finalized, the experimental status will be lifted and kotlin.coroutines.experimental will move to the kotlin.coroutines package. They will keep the experimental package around for backwards compatibility.

Concurrency is a large and complicated problem. It’s clear to me that in order for developers to easily take advantage of the capacity of modern hardware, languages should have a first-class concurrency model. Kotlin’s approach to using coroutines is a simple and elegant solution to this problem. With it, developers don’t need to worry about how many threads are created or which thread their block of code is running on. The fact that coroutines allow us to write code sequentially by calling suspending functions helps us reason about the code we write in a more natural way as sequential execution is how we as humans think and operate. I hope that this model will enable developers around the world to write more performant apps with less boilerplate code and fewer bugs.