Implementation Details of Goroutine

Bruce Wang
Mar 24, 2019 · 9 min read

This blog is a rough and partial translation of zkweb’s blog about Goroutine.

The most outstanding feature of Golang is probably the goroutine. Goroutines relieve programmers from the nightmare of callbacks by making asynchronous programming easy.

Although more and more languages have introduced the concept of coroutines, goroutines remain one of the finest implementations.
This blog aims to make the implementation details of goroutines clear by analyzing the Golang source code.

The analysis is based on Go 1.9.2 and may not apply to other versions or to gccgo.

The running environment is Ubuntu 16.04 LTS 64-bit.

Key Concepts

In order to understand how goroutines are implemented, we first need to understand three important concepts: G, M, and P. People who have not read Golang's source code are probably unfamiliar with these three concepts, which are ubiquitous throughout it. The G's, M's, and P's discussed below are instances of their respective structs.

G (goroutine)

G stands for goroutine. Goroutines can be thought of as lightweight threads managed by the runtime. In code, a goroutine is created with the go statement.

For instance, func main() { go other() } creates two goroutines, the first one being main and the second one other.
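
For completeness, here is a minimal runnable version of that snippet; the body of other and the Sleep at the end are my own additions for illustration:

    package main

    import (
        "fmt"
        "time"
    )

    // other runs in its own goroutine, scheduled by the Go runtime.
    func other() {
        fmt.Println("hello from the other goroutine")
    }

    func main() {
        go other() // the second goroutine; main itself runs in the first
        // Give other a chance to run before main returns and the program exits.
        // (Real code would synchronize with a channel or sync.WaitGroup instead.)
        time.Sleep(10 * time.Millisecond)
    }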

The creation, sleeping, resumption, and stopping of goroutines are all managed by the Golang runtime.

A goroutine that starts an asynchronous operation enters the sleeping state and resumes after the operation finishes, without occupying an operating-system-level thread.

Newly created or resumed goroutines are added to the runnable queue, waiting to be retrieved and run by an M.

M (machine)

M stands for "machine". In the current version of Golang, an M corresponds more or less to a system thread. An M can run two kinds of code: Golang code, which requires the M to hold a P, and native code such as blocking system calls, which does not require a P.

M's fetch G's from the runnable queue and run them. When a G finishes or goes to sleep, the corresponding M fetches another one and runs it, and so on.

Sometimes a G has to call blocking native code; when this happens, its M releases its P and blocks, while other M's pick up the released P and continue to run G's from the runnable queue.

The Golang runtime needs to ensure there are enough M's to keep the CPUs busy running G's, but not too many.

P (logical processors)

P is short for logical processor. A P can be viewed as the resource an M needs in order to run G's.

Though the default number of P's is equal to the number of logical CPU cores available (hyperthreaded siblings count as separate cores), programmers can change the number of P's by setting the GOMAXPROCS environment variable or by calling runtime.GOMAXPROCS.
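
A small sketch showing how to inspect and change the setting from code (runtime.NumCPU reports the number of logical CPUs):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // Passing a non-positive value only reports the current setting.
        fmt.Println("current GOMAXPROCS:", runtime.GOMAXPROCS(0))
        fmt.Println("logical CPUs:", runtime.NumCPU())

        // Limit the number of P's to 1: at most one M runs Golang code at a time.
        runtime.GOMAXPROCS(1)
    }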

P can also be seen as a mechanism to control the level of concurrency of Golang code.

If the number of P’s is 1, it means that there is at most 1 M running the Golang code.

The number of threads which are in charge of running native code on the operating system kernel level is not confined by the number of P’s.

Because at any given time only one M can own a specific P, the data inside a P can be accessed without locks, which makes reads and writes to it highly efficient.

Data Structures

Before we dive into the workflow of goroutines, let's get oriented with the internal data structures involved.

Statuses of G’s

  • idle (_Gidle): the G has just been allocated and is not yet initialized
  • runnable (_Grunnable): the G is in a runnable queue, waiting for an M to retrieve and run it
  • running (_Grunning): an M (which necessarily holds a P) is currently running this G
  • syscall (_Gsyscall): the G is in the middle of a system call and its M does not hold a P
  • waiting (_Gwaiting): the G is waiting for some condition to be met; it is neither running nor in a runnable queue, and may be sitting in a channel's waiting queue
  • dead (_Gdead): the G is not in use, possibly already finished, and is waiting in the free list to be reused
  • copystack (_Gcopystack): the G is acquiring a new stack space and moving the contents of the original stack over; this keeps the garbage collector from scanning a stack that is in the middle of being moved
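
One way to see these statuses from user code is a full goroutine dump: the text in brackets in each goroutine header (for example chan receive, which corresponds to _Gwaiting) is derived from these internal states. A small sketch, with the blocked goroutine and the sleep added purely for illustration:

    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    func main() {
        c := make(chan int)
        go func() { <-c }() // this goroutine will sit in _Gwaiting ("chan receive")

        time.Sleep(10 * time.Millisecond) // let the goroutine above block

        // Dump the stacks of all goroutines; each header shows the goroutine's
        // state, e.g. "goroutine 5 [chan receive]:" for the blocked one above
        // and "goroutine 1 [running]:" for main itself.
        buf := make([]byte, 1<<16)
        n := runtime.Stack(buf, true)
        fmt.Printf("%s\n", buf[:n])
    }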

Statuses of M’s

M's do not have explicit statuses the way G's and P's do. However, it is fine to think of an M as being in one of the following states.

  • spinning: (1) an idle M with an associated P spins looking for new G's; (2) an M without an associated P spins waiting for an available P
  • in the process of executing Golang code: an M that holds a P is executing Golang code
  • in the process of executing native code: an M that does not hold a P is executing native code, such as a blocking system call
  • sleeping: an M goes to sleep when there are no runnable G's, adding itself to the idle M queue; such an M does not hold a P

Spinning is a crucial state because the runtime uses the number of spinning M's to decide whether to wake up an existing M or create a new one.

Statuses of P’s

  • idle (_Pidle): when there are no G's to run, M's go to sleep; the P's they held become idle and are added to the idle P queue
  • running (_Prunning): after an M acquires a P, the P's status becomes running, and the M uses the P's resources to run G's
  • syscall (_Psyscall): used when the Golang runtime is executing native code and that native code, in turn, calls back into Golang code
  • gcstop (_Pgcstop): used when the garbage collector stops the world (STW)
  • dead (_Pdead): used when the number of P's is decreased during runtime
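
For reference, these statuses map to a small constant block in runtime/runtime2.go, roughly as follows (paraphrased; the comments are my own summary):

    // Paraphrased from runtime/runtime2.go; comments are my own summary.
    const (
        _Pidle    = iota // not bound to any M, sitting on the idle P list
        _Prunning        // owned by an M that is running Golang code
        _Psyscall        // its M is off executing a system call / native code
        _Pgcstop         // halted for a stop-the-world (GC) pause
        _Pdead           // unused after GOMAXPROCS was decreased
    )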

Local Runnable Queue

In the Golang runtime, there are multiple runnable queues that hold runnable (_Grunnable) G's: a local runnable queue in each P and a single global runnable queue. A new G first tries to join the local runnable queue of its P. Likewise, an M first tries to fetch G's from the local runnable queue of the P it has acquired.

No thread locks are needed to dequeue from or enqueue to a local runnable queue.

The length of a local runnable queue is limited. A new G is enqueued to the global runnable queue when its P's local runnable queue already holds 256 G's. The data structure of a local runnable queue is a circular buffer consisting of an array of length 256 and two indexes named head and tail.
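
Stripped of the runtime's atomic operations and its runnext optimization, the idea looks roughly like the sketch below (a simplified illustration, not the actual runqput/runqget code in runtime/proc.go):

    package main

    import "fmt"

    type g struct{ id int } // stand-in for the runtime's g struct

    // localRunQueue sketches a P's local run queue: a fixed-size ring buffer
    // indexed by two monotonically increasing counters, head and tail.
    type localRunQueue struct {
        head uint32 // next slot to dequeue from
        tail uint32 // next slot to enqueue into
        runq [256]*g
    }

    func (q *localRunQueue) put(gp *g) bool {
        if q.tail-q.head == uint32(len(q.runq)) {
            return false // full: the runtime would push to the global queue instead
        }
        q.runq[q.tail%uint32(len(q.runq))] = gp
        q.tail++
        return true
    }

    func (q *localRunQueue) get() *g {
        if q.head == q.tail {
            return nil // empty: the runtime would try the global queue or steal work
        }
        gp := q.runq[q.head%uint32(len(q.runq))]
        q.head++
        return gp
    }

    func main() {
        var q localRunQueue
        q.put(&g{id: 1})
        q.put(&g{id: 2})
        fmt.Println(q.get().id, q.get().id) // 1 2
    }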

When an M tries to retrieve a G from its local runnable queue and finds it empty, the work-stealing algorithm kicks in: half of the G's of an "unlucky" randomly picked P are handed over to the M's own P.

Global Runnable Queue

Thread locks are required to dequeue from or enqueue to the global runnable queue, which is stored in the global variable sched.

The data structure is a linked list with two pointers named head and tail.
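
A minimal sketch of the same shape follows. In the real runtime the list is intrusive, linking g's through their schedlink field and guarded by sched.lock; here I use a separate node type and a sync.Mutex for clarity:

    package main

    import (
        "fmt"
        "sync"
    )

    type g struct{ id int } // stand-in for the runtime's g struct

    type gNode struct {
        gp   *g
        next *gNode
    }

    // globalRunQueue sketches the global runnable queue: a linked list with
    // head and tail pointers, guarded by a lock.
    type globalRunQueue struct {
        mu   sync.Mutex
        head *gNode
        tail *gNode
    }

    func (q *globalRunQueue) put(gp *g) {
        q.mu.Lock()
        defer q.mu.Unlock()
        n := &gNode{gp: gp}
        if q.tail == nil {
            q.head = n
        } else {
            q.tail.next = n
        }
        q.tail = n
    }

    func (q *globalRunQueue) get() *g {
        q.mu.Lock()
        defer q.mu.Unlock()
        if q.head == nil {
            return nil
        }
        n := q.head
        q.head = n.next
        if q.head == nil {
            q.tail = nil
        }
        return n.gp
    }

    func main() {
        var q globalRunQueue
        q.put(&g{id: 1})
        fmt.Println(q.get().id) // 1
    }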

Idle M Linked List

If there are no runnable G's, an M goes to sleep and adds itself to the idle M linked list, which is kept in the global variable sched.

An M in that list waits on the semaphore m.park, which is used to wake it up.

Golang runtime ensures there are enough M’s to run G’s. The mechanism is as follows.

  • An M is woken up or created when a G is enqueued while there are no spinning M's and there are idle P's.
  • Another M is woken up or created if there are no spinning M's and there are idle P's when an M leaves spinning mode and prepares to run a G it has just dequeued.
  • When an M leaves spinning mode and prepares to sleep, it checks all the runnable queues one more time; if there are G's in those queues, it re-enters spinning mode instead.

Idle P Linked List

When all the G's in the local runnable queue of a P are finished and there are no G's to fetch elsewhere, the M releases its P and goes to sleep. The P becomes idle and is added to the idle P linked list, which is saved in the global variable sched. The next time a G is enqueued while there are no spinning M's and idle P's are available, an M is woken up (or created); it acquires an idle P, which switches to running mode to run the newly enqueued G.

Workflow Overview

The following graph illustrates the possible inner workings. In the graph, there are 4 P's, and M1, M2, and M3 are currently running G's. After the currently running G's finish, the M's start to fetch new G's from the runnable queues of their acquired P's.

It is hard to see the real inner workings from this one graph alone, so I am going to walk through a code example.
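
Here is a rough reconstruction of that example, based on the walkthrough below (a buffered channel of capacity 3, two printNumber goroutines, two receives in main, and a 0 sent on completion); the exact number ranges are my guess:

    package main

    import "fmt"

    // printNumber prints a range of numbers, then signals completion on c.
    func printNumber(from, to int, c chan int) {
        for x := from; x <= to; x++ {
            fmt.Println(x)
        }
        c <- 0 // the "0" later found buffered in the channel
    }

    func main() {
        c := make(chan int, 3) // buffered channel with capacity 3
        go printNumber(1, 3, c)
        go printNumber(4, 6, c)
        _ = <-c
        _ = <-c
    }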

A G that points to main (runtime.main rather than main.main, explained later) is created once the executable starts to run. The dashed lines in the following graph indicate the address where the G is about to run.

M1 fetches the G from the runnable queue and runs it.

main creates a new channel and starts two new G’s.

Next, the G (main) tries to receive data from the channel. Obviously, no data is available at this moment, so the G saves its current execution state, switches to the _Gwaiting status, and enqueues itself on the waiting queue of the channel it just created.

When dequeued later, the G (main) will pick up where it left off, which is "_ = <- c".
M1 fetches G (printNumber) from the runnable queue and runs it.

G (printNumber) prints numbers as expected, and when it sends data to the channel, it finds that a G (main) is waiting for data from that channel. G (main) is dequeued from the channel's waiting queue, changed to the _Grunnable status, and put back on the (local) runnable queue.

Next, M1 runs the second G (printNumber). Since the channel is a buffered channel with capacity 3, data can be sent to the buffer without blocking.

Then, the second G (printNumber) is done, with only G (main) in the queue.

Finally, M1 retrieves and runs G (main) and picks up execution.

The first “_ = <- c” runs successfully.
Because there is buffered data “0” in the channel, the second one runs smoothly without blocking.

G (main) finally comes to an end and the program exits.

Some people might wonder what happens if one more "_ = <- c" is present. Well, because by the time the third receive runs all the G's are in the waiting status, the Golang runtime detects this and prints the following deadlock error message.

fatal error: all goroutines are asleep - deadlock!
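
Reusing the hypothetical printNumber from the reconstruction above, such a main would look like this; no goroutine is left to send a third value, so the runtime reports the deadlock:

    func main() {
        c := make(chan int, 3)
        go printNumber(1, 3, c)
        go printNumber(4, 6, c)
        _ = <-c
        _ = <-c
        _ = <-c // every remaining G is now waiting: fatal deadlock error
    }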
