One of the main reasons the Go language has become so popular over the past few years is the simplicity with which it handles concurrency, through its lightweight goroutines and channels.
Concurrency is not new. It has long existed in the form of threads, which are used in almost every application nowadays.
Before looking at what goroutines are (and no, they are not simply lightweight threads, although they do rely on threads to run), we are going to dig into how actual threads work in the OS.
What are Threads?
A thread is the smallest unit of execution that an OS can schedule. In most modern operating systems, a thread exists within a process; that is, a single process may contain multiple threads.
A good example is a web server.
A web server is normally designed to handle multiple requests at once, and these requests are normally independent of each other.
So a thread can be created, or taken from a thread pool, and each request delegated to it to achieve concurrency. But remember the title of the famous Rob Pike talk: "Concurrency is not Parallelism".
But is a thread lighter than a process? Let's see.
It depends on how you look at it.
In theory, threads in the same process share memory, so creating one does not require a new virtual address space, and switching between them does not require an MMU (memory management unit) context switch. Communication between threads is also simpler than between processes, mainly because threads can use shared memory while processes need some form of IPC (inter-process communication) such as semaphores, message queues or pipes.
So does that always make threads more performant than processes? Not in this multi-processor world we live in.
Linux, for example, doesn’t differentiate between threads and processes; both are called tasks. Each task can have anywhere from a minimum to a maximum level of sharing when cloned.
When you call fork(), a new task is created with no shared file descriptors, PIDs and memory space. When you call pthread_create(), a new task is created with all of the above shared.
Linux developers have worked to minimise the cost of switching between tasks, and have succeeded. Creating a new process is still more expensive than creating a new thread, but switching between them is not.
So, where can threads be improved?
There are three things which make threads slow:
- Threads have a large stack size (≥ 1MB) and therefore consume a lot of memory. Creating 1000 threads means you already need about 1GB of memory. That is a lot!
- Threads need to save and restore a lot of registers on every switch, including AVX (Advanced Vector Extensions), SSE (Streaming SIMD Extensions), floating-point registers, the Program Counter (PC) and the Stack Pointer (SP), which hurts application performance.
- Thread setup and teardown require calls to the OS for resources (such as memory), which is slow. NOT GOOD!
What about Goroutines?
Goroutines are the way to run tasks concurrently in Go. They exist only in the virtual space of the Go runtime and not in the OS, so the Go runtime scheduler is needed to manage their lifecycles. It's important to keep in mind that all the OS sees is a single user-level process requesting and running multiple threads. The goroutines themselves are managed by the Go runtime scheduler.
The Go runtime maintains three structs for this purpose (they were C structs in the early versions of the runtime described here):
- The G Struct : Represents a single goroutine and contains the fields necessary to keep track of its stack and current status. It also contains references to the code that it is responsible for running.
- The M Struct : Represents an OS thread. It also contains pointers to fields such as the global queue of runnable goroutines, the currently running goroutine, its own cache and the reference to the scheduler.
- The Sched Struct : It is a single, global struct that keeps track of the different queues of goroutines and M's and some other information that the scheduler needs in order to run, such as the Global Sched Lock.
There are two queues containing G structs: a runnable queue, where M's (threads) can find more work, and a free list of goroutines. The scheduler maintains only one queue of M's (threads). To modify any of these queues, the Global Sched Lock must be held.
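To make the relationship between the three structs a bit more concrete, here is a toy model in Go. It is only a sketch of the description above, not the runtime's actual code: the field names, the string statuses and the `Enqueue`, `Next` and `Retire` helpers are all made up for illustration, and a `sync.Mutex` stands in for the Global Sched Lock.

```go
package main

import (
	"fmt"
	"sync"
)

// G is a toy stand-in for the runtime's goroutine record (hypothetical fields).
type G struct {
	ID     int
	Status string // "runnable", "running" or "free"
}

// M is a toy stand-in for an OS thread record (hypothetical fields).
type M struct {
	ID   int
	CurG *G // goroutine currently running on this thread, if any
}

// Sched models the single global scheduler state.
type Sched struct {
	mu       sync.Mutex // plays the role of the Global Sched Lock
	runnable []*G       // queue where M's look for work
	free     []*G       // free list of finished goroutines
}

// Enqueue marks g runnable and appends it to the runnable queue.
func (s *Sched) Enqueue(g *G) {
	s.mu.Lock()
	defer s.mu.Unlock()
	g.Status = "runnable"
	s.runnable = append(s.runnable, g)
}

// Next hands the oldest runnable G to m, or returns nil if there is no work.
func (s *Sched) Next(m *M) *G {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.runnable) == 0 {
		return nil
	}
	g := s.runnable[0]
	s.runnable = s.runnable[1:]
	g.Status = "running"
	m.CurG = g
	return g
}

// Retire moves the G that m was running onto the free list.
func (s *Sched) Retire(m *M) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if m.CurG == nil {
		return
	}
	m.CurG.Status = "free"
	s.free = append(s.free, m.CurG)
	m.CurG = nil
}

func main() {
	s := &Sched{}
	for i := 1; i <= 3; i++ {
		s.Enqueue(&G{ID: i})
	}
	m := &M{ID: 1}
	for g := s.Next(m); g != nil; g = s.Next(m) {
		fmt.Printf("M%d runs G%d\n", m.ID, g.ID)
		s.Retire(m)
	}
	fmt.Println("free list size:", len(s.free))
}
```

Every queue operation takes the lock first, which mirrors the rule that the queues may only be modified while the Global Sched Lock is held.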
So, on startup, the Go runtime starts a number of goroutines for the GC, the scheduler and user code, and OS threads are created to run them. The number of threads executing Go code at the same time is at most GOMAXPROCS (this defaulted to 1 in early releases; since Go 1.5 it defaults to the number of logical CPUs on your machine).
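If you want to check what your own runtime is using, the standard `runtime` package exposes these values. A small sketch (the `schedulerLimits` helper name is mine, the `runtime` calls are real):

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerLimits returns the current GOMAXPROCS setting and the number of
// logical CPUs visible to the process.
func schedulerLimits() (procs, cpus int) {
	// Passing a value below 1 only queries the setting; it does not change it.
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

func main() {
	procs, cpus := schedulerLimits()
	fmt.Println("GOMAXPROCS:", procs)
	fmt.Println("logical CPUs:", cpus)
	// Counts every goroutine that currently exists, including main.
	fmt.Println("goroutines right now:", runtime.NumGoroutine())
}
```

The value can also be set from outside the program through the `GOMAXPROCS` environment variable.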
Here is the catch!
To make the stacks small, Go’s run-time uses resizable, bounded stacks, initially of only 2KB/goroutine. A newly minted goroutine is given a few kilobytes, which is almost always enough. When it isn’t, the run-time grows (and shrinks) the memory for storing the stack automatically, allowing many goroutines to live in a modest amount of memory. The CPU overhead averages about three cheap instructions per function call. It is practical to create hundreds of thousands of goroutines in the same address space. If goroutines were just threads, system resources would run out at a much smaller number.
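A rough way to see this for yourself is to park a large number of goroutines and ask the runtime how many are alive and how much stack memory they took. This is only a sketch: `spawnParked` and the count of 10000 are my own choices, and the per-goroutine figure it prints is an approximation that will vary between Go versions and machines.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawnParked starts n goroutines that all wait on one channel. It reports how
// many goroutines were alive at that point and roughly how much stack memory
// the runtime had added since before they were started.
func spawnParked(n int) (alive int, stackBytes uint64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	var started, finished sync.WaitGroup
	release := make(chan struct{})
	started.Add(n)
	finished.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer finished.Done()
			started.Done() // this goroutine now exists and is running
			<-release      // wait here until the channel is closed
		}()
	}
	started.Wait()

	alive = runtime.NumGoroutine()
	runtime.ReadMemStats(&after)
	if after.StackInuse > before.StackInuse {
		stackBytes = after.StackInuse - before.StackInuse
	}

	close(release) // wake every waiting goroutine
	finished.Wait()
	return alive, stackBytes
}

func main() {
	const n = 10000 // hypothetical count; raise it to see how far your machine goes
	alive, stackBytes := spawnParked(n)
	fmt.Println("goroutines alive:", alive)
	fmt.Println("approx. stack bytes per goroutine:", stackBytes/uint64(n))
}
```

Doing the same with OS threads at 1MB of stack each would need on the order of gigabytes for the same count.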
Blocking? No problem!
When a goroutine makes a blocking call, such as a blocking system call, the thread running it must block, and the run-time automatically moves the other goroutines on that operating system thread to a different, runnable thread taken from the scheduler's queue (the Sched Struct) so they won’t be blocked. If no such thread is available, the runtime creates at least one more to continue executing the goroutines that are not in blocking calls. The programmer sees none of this, which is the point. The result, which we call goroutines, can be very cheap: they have little overhead beyond the memory for the stack, which is just a few kilobytes.
As such, goroutines scale quite well.
But if goroutines communicate over channels, which in Go exist only in the runtime's virtual space, the OS does not block the thread. The waiting goroutine simply goes into the waiting state, and another runnable goroutine (from the run queue the M struct points to) is scheduled in its place.
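Here is a small sketch of that behaviour. Two goroutines bounce a counter back and forth over unbuffered channels while GOMAXPROCS is set to 1, so only one of them can run Go code at any moment. Each send or receive that cannot proceed parks the goroutine and lets the other one run. The `pingPong` name and the round count are just for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// pingPong bounces a counter between the caller and a helper goroutine over
// unbuffered channels and returns the counter after the given number of rounds.
func pingPong(rounds int) int {
	ping := make(chan int)
	pong := make(chan int)

	go func() {
		// Each receive waits in the Go runtime until a value arrives.
		for v := range ping {
			pong <- v + 1
		}
		close(pong)
	}()

	v := 0
	for i := 0; i < rounds; i++ {
		ping <- v // waits until the helper is ready to receive
		v = <-pong
	}
	close(ping) // lets the helper's range loop finish
	return v
}

func main() {
	// Allow only one thread to execute Go code at a time, then restore.
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	fmt.Println(pingPong(1000)) // prints 1000
}
```

Both goroutines make progress even with a single thread running Go code, because a goroutine waiting on a channel gives up the thread instead of blocking it.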
Go Runtime Scheduler
The Go Runtime Scheduler keeps track of each goroutine, and will schedule them to run in turn on a pool of threads belonging to a process.
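The Go Runtime Scheduler does cooperative scheduling, which means another goroutine is scheduled only when the current one blocks or finishes, and those points come up naturally in ordinary code. (Since Go 1.14 the runtime can also preempt long-running goroutines asynchronously, but the cooperative points still apply.) Here are some examples: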
- Blocking syscalls like file and network operations.
- After being stopped for a garbage collection cycle.
This is better than pre-emptive scheduling, which uses periodic system interrupts (e.g. every 10 ms) to stop the running thread and schedule another. That can make a task take longer than needed to finish as the number of threads grows, or when a higher-priority task needs to be scheduled while a lower-priority one is running.
Another advantage is that, since the scheduler is invoked implicitly in the code, e.g. during a sleep or a channel wait, the compiler only needs to save and restore the registers that are live at these points. In Go, this means only 3 registers, i.e. PC, SP and DX (data register), are updated during a context switch rather than all registers (e.g. AVX, floating point, MMX).
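Besides the implicit points, a goroutine can also give up the processor explicitly with `runtime.Gosched()`. The sketch below uses it to wait for a worker goroutine on a single processor. The `waitByYielding` helper is hypothetical, and in real code you would normally wait with a channel or a `sync.WaitGroup` instead of spinning like this.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// waitByYielding starts work in a new goroutine and keeps yielding the
// processor until the work has finished. It returns how many times it yielded.
func waitByYielding(work func()) int {
	var done int32
	go func() {
		work()
		atomic.StoreInt32(&done, 1)
	}()

	yields := 0
	for atomic.LoadInt32(&done) == 0 {
		// Voluntarily hand the processor to another runnable goroutine.
		runtime.Gosched()
		yields++
	}
	return yields
}

func main() {
	// Allow only one thread to execute Go code at a time, then restore.
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	sum := 0
	yields := waitByYielding(func() {
		for i := 1; i <= 100; i++ {
			sum += i
		}
	})
	fmt.Println("sum:", sum, "yields:", yields)
}
```

The number of yields is not fixed and depends on how the scheduler interleaves the two goroutines; the sum is always 5050 because the caller only reads it after the worker has signalled completion.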
If you want to know more about Go concurrency, you can refer to the links below:
- Concurrency is not parallelism by Rob Pike (Must watch for any Go Developer)
- Analysis of Go runtime Scheduler