The Anatomy of Golang Goroutines

Exploring the Core Mechanics of Golang Goroutines: A Deep Dive into Their Inner Workings

Isuru Cumaranathunga
CodeX
7 min read · Jun 24, 2024


Concurrency is a fundamental aspect of modern programming, and Golang (Go) brings a unique approach to it through the use of goroutines. In this article, we will explore the anatomy of Golang goroutines, delving into how they operate in comparison to traditional kernel-level and user-level threads. We’ll begin by understanding the basic concepts of processes and threads, followed by a detailed look at the drawbacks of pure kernel-level and user-level threading models. Then, we will discuss how Golang leverages a hybrid model to achieve efficient concurrency, utilizing mechanisms like local and global run queues, and the sophisticated Go-scheduler to optimize performance. Finally, we will examine the control flow in a Golang application, highlighting the distinctive behavior of goroutines compared to threads in other languages like Java. This comprehensive overview will equip you with a deeper understanding of how goroutines function and why they are a powerful feature in Golang.

Process

A process in UNIX is an instance of a running program and can be created using the fork() system call. This call generates a child process with its own stack space, registers, and program counter. For efficiency, UNIX employs Copy-On-Write (COW): the child initially shares the parent's memory pages, and when either process writes to a shared page, that page is copied to a new location, giving the writer its own private copy.

Basic components of a process

Threads

Creating and destroying a process is costly; therefore, we spawn multiple threads within a single process to get fine-grained control over concurrent work.

There can be multiple threads in a single process. Two sibling threads (threads created in the same process) have their own private stack space and program counter, but they share the same memory space. Therefore, we must implement proper synchronization between threads to complete the job without data inconsistencies (due to data races) and deadlocks.

A process with two threads

Kernel-level and User-level threads

User-level threads keep their state inside the application's own memory. This makes switching between them fast, because the kernel still sees only a single kernel-level thread, so no processor context switch is needed. The downside is that when a user-level thread performs a blocking operation, the underlying kernel, which knows nothing about user-level threads, blocks the entire kernel-level thread instead of switching to another user-level thread.

Another downside is the loss of parallelism. Suppose 3 user-level threads are running on a single kernel-level thread. Since the kernel sees only that one kernel-level thread, even on a multicore machine the 3 user-level threads can never run in parallel: all of them share one kernel-level thread, and the kernel cannot schedule them onto different cores.

Goroutines use a hybrid thread model

Because of the drawbacks of the pure kernel-level and pure user-level models, the Go runtime uses a hybrid model that mixes both. The runtime maintains queues to manage goroutines and schedule them appropriately: upcoming goroutines are inserted into a queue, and each queue is served by a single kernel-level thread, so goroutines generally get to run in the order they were queued (order is only one factor; other runtime properties can also affect the actual order of execution).

By now you may be thinking: aren't goroutines similar to user-level threads? Yes, but we saw that user-level threads are unable to inform the kernel-level thread when a blocking operation is going on. With goroutines, the Go runtime can do exactly that. It wraps the goroutine when a blocking operation is about to execute, so the scheduler can take the necessary steps to keep the kernel-level thread from being blocked.

The Go runtime creates several kernel-level threads when the process starts. The number of threads that can execute Go code simultaneously is controlled by GOMAXPROCS, which defaults to the number of logical CPUs on your machine. You can alter it based on your needs, either via the GOMAXPROCS environment variable or the runtime.GOMAXPROCS function.

Local Run Queue (LRQ), Global Run Queue (GRQ)

There are two types of queues: LRQs and the GRQ. Each LRQ holds goroutines already scheduled onto a kernel-level thread, while the GRQ holds runnable goroutines not yet assigned to any LRQ. The Go runtime manages these queues so that, on a multicore system, there is a good chance of achieving true parallelism when the goroutines can actually run in parallel. As the LRQs empty, they are refilled by popping goroutines from the GRQ.

LRQ and GRQ created by go runtime

When a blocking operation is about to run in a goroutine, the Go runtime wraps it to signal that this kernel-level thread is about to block. Once the blocking operation starts, the kernel de-schedules that kernel-level thread so it no longer occupies the processor. Just before that happens, Go spawns a new kernel-level thread (or takes a waiting one from its pool) and hands the remaining goroutines in the LRQ over to it. This way, those goroutines are not stalled behind the blocking operation.

This moving of goroutines from one queue to another is known as work stealing. It happens not only because of blocking operations but also when the LRQs become imbalanced (one LRQ fully loaded while others sit empty); Go does this to maintain efficiency by fully utilizing all cores.

work stealing due to IO operation

The component that schedules goroutines onto a kernel thread is the Go-scheduler. Before discussing it, note that the kernel schedules kernel-level threads using the OS-scheduler. When the OS-scheduler preempts the current thread to run another, it performs a context switch: it stores the state of the outgoing thread (register values, stack pointer, program counter, and the thread's metadata) in memory and loads the state of the incoming thread. The OS-scheduler decides when to do this using clock interrupts. The Go-scheduler, however, schedules its goroutines within a kernel thread, so it needs a different mechanism.

Let's see how the Go-scheduler schedules goroutines on a single kernel-level thread. Instead of clock interrupts, it relies on user-level events: starting a new goroutine with the go keyword, making system calls such as network requests or file reads, and synchronizing goroutines with wait-groups are all examples of events that cause the Go-scheduler to run.

How Go uses the user-level events to schedule the goroutines in a kernel thread

There is a way to invoke the Go-scheduler explicitly from code using runtime.Gosched(), but it does not guarantee that the next goroutine is scheduled as soon as we call it. A switch may or may not happen; that depends on several other runtime factors.

How the control flow behaves in a Golang application

In Golang, the main function runs in a goroutine. If we need to run code blocks in different goroutines, we must tell Golang explicitly. When the main goroutine encounters a function call prefixed with the go keyword, it creates a new goroutine and continues toward the end of the main function. If we need the main function to wait for all explicitly created goroutines to finish, we must say so explicitly; otherwise, the main goroutine exits and ends the process regardless of whether the other goroutines have finished their work.

In Java, this is not the case: the JVM does not exit until all non-daemon threads, including explicitly created ones, have finished.

There are several fascinating concurrency mechanisms in Golang for synchronizing goroutines and letting them communicate effectively, but those are out of the scope of this article. I will cover them in upcoming articles.

In conclusion, Golang’s approach to concurrency with goroutines offers a powerful and efficient way to manage multiple tasks. By using a hybrid model that combines the best aspects of kernel-level and user-level threads, Go achieves high performance and true parallelism. Understanding the role of the Go-scheduler, local and global run queues, and the unique handling of blocking operations can help you write more efficient and robust Go programs. As you continue to explore Golang, you’ll discover even more fascinating concurrency mechanisms that make this language a strong choice for modern software development. Stay tuned for future articles where we will delve deeper into these topics.

I am a backend software engineer who is interested in designing concurrent and scalable applications to solve business problems.