Go: Goroutine, OS Thread and CPU Management

Vincent Blanchon
Nov 20, 2019 · 5 min read
Illustration created for “A Journey With Go”, made from the original Go Gopher, created by Renee French.

ℹ️ This article is based on Go 1.13.

Creating an OS thread or switching from one to another can be costly for your programs in terms of memory and performance. Go aims to get as much advantage as possible from the available cores, and it has been designed with concurrency in mind from the beginning.

M, P, G orchestration

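To solve this problem, Go has its own scheduler to distribute goroutines over the threads. This scheduler defines three main concepts, as explained in the comments of the runtime code itself: G is a goroutine, M is a worker thread (or machine), and P is a processor, a resource that is required to execute Go code.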

Here is a diagram of this P, M, G model:

P, M, G diagram

Each goroutine (G) runs on an OS thread (M) that is assigned to a logical CPU (P). Let’s take a simple example to see how Go manages them:
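
A minimal sketch of such a program, assuming two goroutines that print hello and world and a WaitGroup to wait for them. The function name runExample and the completion counter are illustrative additions, and the order of the two printed lines is not guaranteed:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runExample starts two goroutines and waits for both to finish.
// It returns how many goroutines ran to completion.
func runExample() int32 {
	var wg sync.WaitGroup
	var finished int32

	wg.Add(2)

	// First goroutine: prints hello.
	go func() {
		defer wg.Done()
		fmt.Println("hello")
		atomic.AddInt32(&finished, 1)
	}()

	// Second goroutine: prints world.
	go func() {
		defer wg.Done()
		fmt.Println("world")
		atomic.AddInt32(&finished, 1)
	}()

	wg.Wait()
	return atomic.LoadInt32(&finished)
}

func main() {
	runExample()
}
```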

Go first creates the Ps based on the number of logical CPUs of the machine and stores them in a list of idle Ps:

P initialization
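
The number of logical CPUs and the number of Ps can be inspected from a program. The sketch below assumes the limit has not been overridden (for example through the GOMAXPROCS environment variable), in which case both values are expected to match on Go 1.13. The helper name processorCounts is hypothetical:

```go
package main

import (
	"fmt"
	"runtime"
)

// processorCounts returns the number of logical CPUs seen by the process
// and the current number of Ps.
func processorCounts() (cpus, procs int) {
	// Passing 0 to GOMAXPROCS queries the current value without changing it.
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	cpus, procs := processorCounts()
	fmt.Printf("logical CPUs: %d, Ps: %d\n", cpus, procs)
}
```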

Then, a new goroutine, or a goroutine that becomes ready to run, wakes up a P to spread the work better. This P creates an M with its associated OS thread:

OS thread creation

However, like a P, an M goes to an idle list when it has no work (i.e. no goroutine is waiting to run), when it returns from a syscall, or even when the garbage collector forces it to stop:

M and P idle list

During the bootstrap of the program, Go already creates some OS threads and their associated Ms. For our example, the first goroutine, which prints hello, will run on the M and P already used by the main goroutine, while the second one will get an M and a P from this idle list:

M and P pulled from the idle list

Now that we have the big picture of goroutine and thread management, let’s see in which cases Go uses more Ms than Ps and how goroutines are managed during system calls.

System calls

Go optimizes system calls, whether they are blocking or not, by wrapping them in the runtime. This wrapper automatically dissociates the P from the thread M and allows another thread to run on it. Let’s take an example that reads a file:

Here is the workflow when the file is being opened:

Syscall hands off P

P0 is now in the idle list and potentially available. Then, once the syscall exits, Go applies the following rules until one can be satisfied:

  • try to acquire the exact same P, P0 in our example, and resume the execution
  • try to acquire a P in the idle list and resume the execution
  • put the goroutine in the global queue and put the associated M back to the idle list
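
The three rules above can be sketched as a toy model. This is not the runtime's code: the types P and scheduler, the method names and the returned strings are all hypothetical, and the real scheduler tracks far more state:

```go
package main

import "fmt"

// P is a hypothetical stand-in for a runtime processor.
type P struct{ id int }

// scheduler is a toy model: it only tracks idle Ps and a global run queue.
type scheduler struct {
	idle        []*P
	globalQueue []string // names of goroutines waiting for a P
}

// takeIdle removes and returns want from the idle list, or any idle P
// when want is nil. It returns nil if no matching P is idle.
func (s *scheduler) takeIdle(want *P) *P {
	for i, p := range s.idle {
		if want == nil || p == want {
			s.idle = append(s.idle[:i], s.idle[i+1:]...)
			return p
		}
	}
	return nil
}

// exitSyscall applies the three rules in order for goroutine g, whose
// thread was running on oldP before the syscall.
func (s *scheduler) exitSyscall(g string, oldP *P) string {
	// Rule 1: try to acquire the exact same P.
	if oldP != nil {
		if p := s.takeIdle(oldP); p != nil {
			return "same P"
		}
	}
	// Rule 2: try to acquire any P from the idle list.
	if p := s.takeIdle(nil); p != nil {
		return "idle P"
	}
	// Rule 3: no P available, so the goroutine goes to the global queue.
	s.globalQueue = append(s.globalQueue, g)
	return "global queue"
}

func main() {
	p0, p1 := &P{id: 0}, &P{id: 1}
	s := &scheduler{idle: []*P{p0, p1}}

	fmt.Println(s.exitSyscall("g1", p0)) // P0 is still idle: same P
	fmt.Println(s.exitSyscall("g2", p0)) // P0 is taken, P1 is idle: idle P
	fmt.Println(s.exitSyscall("g3", p0)) // no P left: global queue
}
```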

However, Go also handles the case where the resource is not ready yet, which happens with non-blocking I/O such as an HTTP call. In this case, the first syscall, which follows the previous workflow, will not succeed since the resource is not yet ready, forcing Go to use the network poller and park the goroutine. Here is an example:

Once the first syscall is done and explicitly says the resource is not yet ready, the goroutine will park until the network poller notifies it that the resource is now ready. In this case, the thread M will not be blocked:

Network poller waiting for the resource

The goroutine will run again when the Go scheduler looks for work. The scheduler will then ask the network poller if a goroutine is waiting to be run after successfully getting the information it was waiting for:


If more than one goroutine is ready, the extra ones will go on the global runnable queue and will be scheduled later.

Restrictions in terms of OS threads

When system calls are used, Go does not limit the number of OS threads that can be blocked, as explained in the documentation of the runtime package:

The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit. This package’s GOMAXPROCS function queries and changes the limit.

Here is an example of this situation:

Here is the number of threads created, as reported by the tracing tools:


Since Go optimizes thread usage, a thread can be reused while its goroutine is blocked, which explains why this number does not match the number of iterations of the loop.
