Kernel Scheduling Entity
Before jumping into the threading model, let’s understand the kernel scheduling entity. For any kernel, the scheduling entity can be a process or a thread. In kernels that are not multi-threaded, the scheduling entity is a process; most recent kernels are multi-threaded. This distinction matters when deciding whether to implement threading in userspace or to depend on kernel threading.
Different Thread Models
There are three different threading models: Mx1, 1x1, and MxN.
Mx1 — Whether the kernel’s scheduling entity is a process or a thread, in either case the Mx1 threading model is implemented in userspace.
In this implementation, the kernel is not aware of any of the user threads. A scheduled user thread maps to a kernel thread or process, depending on the kernel’s scheduling entity. So there is only one thread/process in the kernel serving the userspace scheduler.
The main advantage of this mechanism is that it will not create much load on the kernel.
The main disadvantage, depending on how the userspace scheduler is implemented, is that if a thread within the process blocks, the underlying kernel scheduling entity, which knows nothing about the other user threads, blocks the entire process. So even if there are runnable threads in userspace, the scheduler cannot run them.
Some implementations handle this by creating another thread to schedule the runnable userspace threads, but most do not.
1x1 — In this mechanism, there is a one-to-one mapping of userspace threads to kernel threads. POSIX threads on Linux use the 1x1 model.
The main advantage of this mechanism is that the userspace scheduler is simplified, since it depends entirely on kernel scheduling.
The main disadvantage is that it generates too much load on the kernel: a process could have thousands of threads, causing the kernel to create just as many.
MxN — This is a combination of Mx1 and 1x1. The Go language incorporates this mechanism. Other implementations, such as HP-UX POSIX threads, implement the MxN threading model with a similar mechanism.
The main issue with the Mx1 model is that if one thread blocks, the whole process blocks. In the MxN model, you have N scheduling entities in the kernel serving M scheduling entities in userspace.
So, at any point in time, we ensure N scheduling entities are executing in the kernel. For example, many implementations set N to the number of processors (or vCPUs) in the system, so one gets parallelism.
The Go language implements MxN with three basic primitive entities:
G Structure — A G struct represents a single goroutine: its stack, current stack pointer, stack guard, a pointer to the code to run (the initial function), its parameters, and its goroutine ID.
P Structure — An abstraction of the processor in the Go runtime. It holds the context needed to run a goroutine.
M Structure — The M struct is the Go runtime’s representation of an OS thread. It holds a reference to the global queue of G’s, the G it is currently running, its own cache, and a handle to the scheduler.
As mentioned earlier, N kernel threads (M structs) serve M user threads (G structs, i.e. goroutines).
P is an abstraction representing the processor. At any point in time we ensure a set of P’s (equal to GOMAXPROCS) is executing in the process.
There is a per-P run queue: when a goroutine is created, it is placed in a P’s run queue; if that per-P run queue reaches a certain limit, the goroutine is placed in the global run queue instead (more details in the next section).
So, to implement MxN, one needs to tap the blocking scenarios in the process:
- The first is blocking on a synchronization primitive (like a mutex, or in Go, a channel)
- The second is blocking in a system call
Blocking on a synchronization primitive:
Each synchronization primitive has a sleep queue associated with it, based on its purpose. In the case of Go channels, any goroutine trying to read blocks if there is no data, and a goroutine trying to write blocks if there is not enough buffer space left in the channel. Similarly, a goroutine blocks on a mutex if the mutex is not available. So this boils down to having a per-primitive sleep queue.
The Go runtime maintains a sleep queue for each channel.
When a goroutine blocks, it is placed in the wait queue. The P then finds another goroutine in its run queue and executes it. When the blocked goroutine is woken up, because the mutex was unlocked or a channel read happened, it is placed back into a P’s run queue.
Blocking in a system call:
Tapping the blocking of a system call can be done in two ways. Either we know which system calls are blocking (if they are POSIX compliant), or we do not know which system calls block and which do not.
When a goroutine makes a blocking call, the Go runtime handles it by disassociating the goroutine from its P while keeping the association between the G and the M (the kernel scheduling entity), since we need to maintain concurrency. The P is then free to serve the other G’s in its run queue, so the runtime creates or wakes another M to serve them.
System Call Entry — Blocked System Calls
- When a goroutine makes a system call
- Blocking system calls are intercepted
- If there are more G’s in the local queue, the P is detached and a new M is created (or woken) to keep the concurrency
Return from System Call
- The goroutine is placed back into the run queue of its original P; if that P is running another G, the goroutine is placed on an idling P instead. Why this way? A goroutine being unblocked means some data is ready for it to process, so it is better to have it execute immediately. This is the basic principle the Linux scheduler follows: it does decay scheduling, where a woken-up thread is executed with higher priority.
- The M is parked in the idle thread list
The other case is when we do not know which system calls block; this is where a clever implementation appears in the Go runtime.
System Call Entry — Non-Blocked System Calls
- Note that P is in a system call (p.status = _Psyscall)
- Sysmon checks whether it needs to retake P’s: if a P has not come out of a system call within 20µs, the P is handed off
- The P then picks another G from its run queue and executes it
System Call Return — Non-Blocked System Calls
- Runtime will try to run the G on the same P
- If the G spent less than 20µs in the system call, it is placed back on the same P; otherwise the process above is followed
I feel the threading implementation in Go is a clever way to ensure that blocking is taken care of on all platforms.
- The Go runtime can pre-create a set of M’s and keep them in a suspended state, so that thread creation is not an overhead; when needed, it can resume a thread. Some systems provide thread suspend and resume, and the runtime can take advantage of this.
- Decay mechanism — when a goroutine is woken up, it is better to schedule it with high priority; in this case it can be pushed to the front of a P’s run queue.