Kotlin coroutines, threads, concurrency and parallelism 101
Kotlin coroutines have been stable since Kotlin 1.3. As a result, we can finally get rid of the experimental flag and start our exciting journey into the uncharted magical world of concurrency.
But wait a second…we can’t just dive in unprepared! Let’s have a short briefing and go over a few questions:
- What exactly does concurrency mean?
- How is concurrency related to parallelism?
- What about threads? Why should we consider using coroutines if we have threads? What are the benefits?
To answer these questions we need to go deeper…
…to understand how threads work at a very low level inside our CPU.
Threads vs cores
Let’s suppose that we have a quad-core CPU — a CPU which has 4 cores.
Imagine that our CPU is just like a factory where each CPU core corresponds to a worker. Under this scenario, there are four workers, representing the individual cores of the processor. However, as usual, the whole process is controlled by the Boss — the operating system, which gives commands to the workers.
Threads are like sequences of commands given to the CPU cores. Just as threads deliver tasks to the CPU cores, assembly lines deliver products to the workers in our factory. Therefore, assembly lines represent threads.
Note: Behind every thread, there is a process. However, because Android developers deal with just one process most of the time, let’s skip this term to keep the article simple.
While workers are processing products from the assembly lines, the Boss, our operating system, is also very busy. He is fully involved in the process, because he has to manage all the threads and take care of scheduling. And as you know, a busy boss is expensive, just like threads: both require a lot of resources. On a typical JVM, each thread reserves about 1 MB of memory by default, mostly for its stack.
Physical vs logical core
A physical core is part of the hardware of the CPU, and it is exactly what it sounds like — it is just physically there. Just a bunch of transistors inside the CPU itself.
On the other hand, a logical core is an abstraction: it is what the operating system sees as a core it can schedule a thread on, and it does not have to correspond one-to-one to a physical core. The number of logical cores expresses the number of threads that can be executed at the same time. For example, if we have a CPU with 4 cores and 4 threads, we have 4 physical cores and 4 logical cores. However, if we have a CPU with 4 cores that can run 8 threads at the same time (for example, thanks to simultaneous multithreading), we have 8 logical cores but still just 4 physical ones.
One might ask what happens if there are more logical cores than physical cores.
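On the JVM, the number of logical cores can be checked at runtime. The following is a minimal sketch; the function name logicalCoreCount is only illustrative, and the reported value is the number of processors available to the JVM, which may be lower than what the hardware offers (for example, inside a container).

```kotlin
// Hypothetical helper name; availableProcessors() is the standard JVM call.
fun logicalCoreCount(): Int = Runtime.getRuntime().availableProcessors()

fun main() {
    // Reports logical cores, not physical ones: a 4-core CPU that runs
    // 8 hardware threads would usually be reported as 8 here.
    println("Logical cores visible to the JVM: ${logicalCoreCount()}")
}
```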
In our factory, we can illustrate this situation as a worker responsible for two assembly lines — he simply cannot take care of both of them at the same time.
Let’s say a worker is asked to process products from two assembly lines during the workday.
He begins by processing products from the first line. Suddenly, the line stops working — maybe something got stuck, maybe he needs to wait for more products. Whatever the case, the line is blocked, and our worker can’t continue doing his job until the line is unblocked.
However, at the same time, products from the second line may be ready to process. So instead of chatting with colleagues and waiting until the first line is ready again, our worker switches to the second line and starts processing products from there.
When he finishes his work on the second line, he can check if the first line is up and running again, and if it is, he can switch back to that line and complete his job.
The process is done faster, the Boss is happy, and the worker can go home earlier than normal.
The situation described at the beginning of this article, where all the workers (cores) are working at the same time and each of them is responsible for just one assembly line, is what we call a parallel operation.
But as we have seen, a core by itself cannot work on multiple threads at the same time. If there are more threads to process, the core has to switch between them.
This switching is what we call a concurrent operation. Although it can look like a parallel operation, in reality the tasks are not executed at the same instant; the core interleaves them, so they are executed concurrently.
And speaking about concurrency, let’s quickly go over coroutines!
To cut a long story short, coroutines are like lightweight threads executing work concurrently. However, coroutines are not necessarily associated with any particular thread. A coroutine can start its execution on one thread, then suspend and continue its execution on a different thread. While a coroutine is suspended, it doesn’t block the thread it was running on. When a coroutine reaches a suspension point, the thread is returned to its pool, so it can be used by another coroutine or for other work. When the suspension is over, the coroutine resumes on a thread chosen by its dispatcher, which is not necessarily the thread it started on.
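The idea that one coroutine can move between threads can be sketched as follows. This is only an illustration, assuming the kotlinx.coroutines library is on the classpath; threadHops is a hypothetical helper name, and withContext(Dispatchers.Default) is used here as one convenient way to force a suspension that continues on a pool thread.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical helper: records the thread name at three points of ONE coroutine.
fun threadHops(): List<String> = runBlocking {
    val names = mutableListOf<String>()
    names.add(Thread.currentThread().name)       // the thread that called runBlocking
    withContext(Dispatchers.Default) {           // suspension point: work continues on a pool thread
        names.add(Thread.currentThread().name)   // a worker thread of the default pool
    }
    names.add(Thread.currentThread().name)       // resumed on the original thread again
    names
}

fun main() {
    threadHops().forEach(::println)
}
```

The first and last names should match, while the middle one belongs to a worker thread, even though all three lines belong to the same coroutine.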
Kotlin coroutines are not managed by the operating system; they are a language feature, supported by the kotlinx.coroutines library. The operating system doesn’t have to worry about coroutines or about scheduling them. Coroutines handle these tasks themselves using cooperative multitasking. When a coroutine suspends, its dispatcher can pick another coroutine and resume it on the freed thread. That means our Boss luckily doesn’t have to manage the workflow by himself. Just imagine a scenario where much of the work that the Boss had to do before is taken over by a hired supervisor for much less money. In the same vein, coroutines, unlike threads, don’t need a lot of memory: a suspended coroutine is just a small object on the heap rather than a whole thread stack. Because of this, you can start many more coroutines than threads. This characteristic of coroutines allows us to reach a very high level of concurrency at little additional cost.
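To get a feeling for how cheap coroutines are, here is a small sketch that starts a large number of them on a single thread. It assumes kotlinx.coroutines is available; launchMany, the count of 100,000 and the 100 ms delay are arbitrary choices made for illustration.

```kotlin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical helper: starts `count` coroutines and returns how many finished.
fun launchMany(count: Int): Int = runBlocking {
    var finished = 0            // no locking needed: every coroutine runs on the runBlocking thread
    coroutineScope {            // returns only after all coroutines launched inside have completed
        repeat(count) {
            launch {
                delay(100)      // suspends without blocking the thread
                finished++
            }
        }
    }
    finished
}

fun main() {
    println("Finished coroutines: ${launchMany(100_000)}")
}
```

Starting the same number of threads, each sleeping for 100 ms, would be far more expensive and may fail outright on many machines.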
Show me the code!
Some say that theory without practice is idle, and practice without theory is blind. So, let’s breathe life into our coroutine factory and code a simple program to demonstrate the difference between how threads and coroutines work.
Our program calls two functions. Each function:
- prints a message saying which thread it is running on,
- stops its execution for 1 second,
- prints a message saying which thread it is running on.
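A minimal sketch of such a program could look like this. The function names and message texts are assumptions made for illustration, and the 1-second pause follows the description above.

```kotlin
fun threadName(): String = Thread.currentThread().name

fun firstFunction() {
    println("first: start on ${threadName()}")
    Thread.sleep(1000)    // blocks the calling thread for 1 second
    println("first: end on ${threadName()}")
}

fun secondFunction() {
    println("second: start on ${threadName()}")
    Thread.sleep(1000)    // blocks the calling thread again
    println("second: end on ${threadName()}")
}

fun main() {
    firstFunction()       // runs to completion first
    secondFunction()      // starts only after firstFunction() has returned
}
```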
All commands of the program are executed sequentially, since Thread.sleep() is a blocking call. The first function starts on the main thread, and then the thread is blocked for 1 second. Only once the first function has finished does the thread become available for the second function to start.
Now let’s rewrite this example using coroutines. Instead of calling Thread.sleep(), we call the suspending function delay(), and we start each function in a separate coroutine.
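A coroutine-based version might look like the sketch below. It assumes the kotlinx.coroutines library; the function names are illustrative, and runBlocking is used here simply to keep both coroutines on the main thread.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

fun threadName(): String = Thread.currentThread().name

suspend fun firstFunction() {
    println("first: start on ${threadName()}")
    delay(1000)           // suspends this coroutine; the thread stays free
    println("first: end on ${threadName()}")
}

suspend fun secondFunction() {
    println("second: start on ${threadName()}")
    delay(1000)           // suspends this coroutine; the thread stays free
    println("second: end on ${threadName()}")
}

fun main() = runBlocking<Unit> {
    launch { firstFunction() }    // each function gets its own coroutine
    launch { secondFunction() }   // both coroutines share the runBlocking thread
}
```

Both "start" messages are printed before either "end" message, and the whole run takes roughly one second instead of two.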
When you run this code, you might get the impression that both functions run in parallel, but how can they run on the main thread at the same time?
delay() is a suspending function. Calling it from the first function results in a non-blocking suspension: the thread is released to perform another task, which in our case means executing the second function. When the delay is over, the first function continues from the point where it left off. The thread switches between the two coroutines instead of sitting idle. That’s the power of concurrency.