Parallelism and Concurrency in Go: How It Works in Real Computing Systems. Part 2.

Nina Pakshina
Oct 3, 2023


In the first part, we explored parallelism and multitasking at the physical and microarchitecture levels of abstraction.

In this part, we will delve into how multitasking is structured at higher levels:

  • Operating System Level
  • Programming Language Level
  • Application Software Level

Operating System Level

The operating system (OS) serves as the bridge between hardware and the user. Each operating system has its own way of implementing multitasking. In this discussion, I’ll focus on multitasking in Unix-based OSes.

One of the tasks of the operating system is to implement parallelism and multitasking, where multiple programs are executed on one or several processors.


Each program execution corresponds to an OS process. A process is an instance of a running program that has its own memory space, including a stack for local variables, a heap for dynamically allocated memory, and other segments for code and data.

Another concept in Unix multitasking is the thread. A thread represents a parallel task within a single process; all threads of a process share its memory. Each process has at least one thread, the main thread.

The command ps -eLf shows all processes and threads in a full-format listing: PID is the process ID, LWP the thread (lightweight process) ID, NLWP the number of threads in the process, and CMD the command name.

Processes and threads are managed by the OS’s task scheduler. In Unix, the scheduler works on the principle of preemptive multitasking. In preemptive multitasking, task switching occurs through the scheduler’s intervention, without direct involvement of the tasks themselves. Preemptive multitasking allows defining time intervals during which a task will execute, prioritizing tasks, and evenly distributing processor time among active tasks.

There is also cooperative multitasking. In cooperative multitasking, task switching occurs only when the currently running task explicitly yields control to another task. In this model, tasks “cooperate” to allow other tasks to execute. While this model is simple to implement, it has a drawback: one “selfish” task can block or slow down the entire system if it doesn’t release the processor.

We cannot manipulate OS threads and processes directly from a Go program, because Go has a special scheduler built into the runtime that interacts with OS threads and processes on our behalf.

However, there are some features that allow limited interaction with threads directly at the OS level.

For instance, the runtime.LockOSThread and runtime.UnlockOSThread functions from the runtime package let you bind the calling goroutine to its current OS thread. The calling goroutine will always run in this thread. No other goroutine will run in it until the calling goroutine calls UnlockOSThread the same number of times as LockOSThread. This behavior is useful when dealing with third-party libraries that rely on thread-local storage mechanisms and require a single-threaded execution environment.

Usage example:

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func worker(id int, wg *sync.WaitGroup) {
	// Lock the current goroutine to an OS thread.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	defer wg.Done()

	for i := 0; i < 5; i++ {
		fmt.Printf("Worker %d: Iteration %d\n", id, i)
	}
}

func main() {
	// Create a wait group to synchronize goroutines.
	var wg sync.WaitGroup

	// Create and start three worker goroutines.
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go worker(i, &wg)
	}

	// Wait for the worker goroutines to finish.
	wg.Wait()
}

Programming Language Level

At this level, programmers work within the environment of a high-level programming language; in our case, that's the Go runtime.

The Go runtime implements a special goroutine scheduler that acts as an intermediary between OS threads and processes on one side and user code on the other. It gives developers a lightweight primitive for writing concurrent code: the goroutine.

Compared to OS threads, goroutines are lightweight. A goroutine's stack starts at just 2KB, whereas on Linux, for example, each thread is allocated 2MB by default (and even more in some distributions), and on Windows it's 1MB. Moreover, a goroutine's stack can grow as user code executes.

Due to their lightweight nature, goroutines can be created in far larger numbers. The number of goroutines is limited only by the resources of the system running the code, while the number of OS threads is capped by system settings (on Linux, for instance, the kernel.threads-max limit). Additionally, switching between goroutines is a relatively inexpensive operation, which keeps context switches fast.
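To see this lightness in practice, here's a minimal sketch (the figure of 100,000 is an arbitrary choice for illustration) that starts far more goroutines than you could reasonably create OS threads:

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	release := make(chan struct{})
	var wg sync.WaitGroup

	// Start 100,000 goroutines; each begins with only ~2KB of stack.
	for i := 0; i < 100000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // block here so all goroutines coexist
		}()
	}

	fmt.Println("live goroutines:", runtime.NumGoroutine())
	close(release) // let all goroutines finish
	wg.Wait()
}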

Let’s delve into how the Go scheduler works.

As we mentioned, the Go scheduler acts as an intermediary between OS threads and processes and the goroutines that developers work with.

The key entities the Go scheduler operates with are:

  • M (machine) — an OS thread. The OS manages this thread.
  • G (goroutine) — a goroutine. Each goroutine has its own stack, instruction pointer, and other information needed to schedule its work.
  • P (processor) — a logical processor, a resource required to execute Go code. The number of Ps equals GOMAXPROCS (the number of processors the code may use).
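A quick, minimal sketch for inspecting these quantities on your own machine (output will vary by hardware):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("P count (GOMAXPROCS):", runtime.GOMAXPROCS(0)) // passing 0 reads the value without changing it
	fmt.Println("live goroutines (G):", runtime.NumGoroutine())
}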

Each M executing Go code is associated with a P. At any given time, only one M can run on a given P. The scheduler creates additional M threads when needed, and an M that is idle or blocked in a system call does not hold a P.

In each cycle, the Go scheduler looks for a goroutine that is ready to run. It implements a work-stealing mechanism: a P whose local run queue is empty can steal goroutines from another P's queue.

Additionally, the Go scheduler uses spinning threads: an M with an assigned P cyclically searches for a new goroutine G to execute, while an M without a P waits for one to become available.

Starting from Go 1.14, the Go scheduler is preemptive. Before that version, it was cooperative. You can read more about the implementation of the goroutine scheduler’s work here and here, as well as about the implementation of non-cooperative scheduling in the discussion.

There's no need (or way) to manage goroutines directly; that's the job of the goroutine scheduler. However, there is one tool that lets a goroutine yield control back to the scheduler: the runtime.Gosched() function. Calling runtime.Gosched() tells the scheduler that the current goroutine is ready to give up some processor time to another goroutine. The scheduler then selects the next goroutine to execute, which may mean another goroutine runs instead of the current one if any are ready.

runtime.Gosched() helps achieve a fairer distribution of time between goroutines and prevents one goroutine from starving the others. In most cases, though, you won't need to call it explicitly: the Go scheduler distributes time between goroutines automatically, and its built-in mechanisms handle execution switching efficiently.

You can see how runtime.Gosched() works in this example:

package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/trace"
	"sync"
)

func main() {
	runtime.GOMAXPROCS(1)

	f, _ := os.Create("trace.out")
	trace.Start(f)
	defer trace.Stop()

	var wg sync.WaitGroup
	wg.Add(2)
	go funcA(&wg)
	go funcB(&wg)
	wg.Wait()
}

func funcA(wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 1; i <= 5; i++ {
		fmt.Printf("A%d", i)
		runtime.Gosched()
	}
}

func funcB(wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 1; i <= 5; i++ {
		fmt.Printf("B%d", i)
		runtime.Gosched()
	}
}

If we run this code, we will observe in the tracing that the goroutines funcA and funcB alternately yield control to the scheduler, which then invokes other goroutines (the brown and violet areas in Proc 0).

If you run the same code without invoking runtime.Gosched(), you will see that the functions funcA and funcB execute without yielding (the blue and pink areas in Proc 0).

Application Software Level

At this level, there is code that programmers write in their chosen language, in our case, Go. A Go developer has several tools at their disposal to manage concurrency, synchronize code, and exchange information between goroutines. These tools can be loosely divided into two functional groups:

  • Tools for synchronization and code execution waiting
  • Tools for safe data writes

Tools for synchronization and code execution waiting

  1. Wait group

When launching goroutines, we need to wait for their completion; otherwise, the program may finish execution before we obtain results from all goroutines. To synchronize the completion of goroutines, we use sync.WaitGroup, which allows waiting for the completion of a specified number of goroutines.

Example of using WaitGroup:

package main

import (
	"fmt"
	"sync"
	"time"
)

func worker(id int, wg *sync.WaitGroup) {
	// Decrease the counter when the goroutine finishes.
	defer wg.Done()
	// Simulate some work.
	fmt.Printf("Worker %d started\n", id)
	time.Sleep(time.Second)
	fmt.Printf("Worker %d completed\n", id)
}

func main() {
	const numWorkers = 3
	// Create an instance of WaitGroup.
	var wg sync.WaitGroup
	fmt.Println("Main started")
	// Increment the counter for each goroutine.
	for i := 1; i <= numWorkers; i++ {
		wg.Add(1)
		go worker(i, &wg)
	}
	// Block execution until all goroutines finish.
	wg.Wait()
	fmt.Println("Main completed")
}

Here, we wait for the completion of 3 goroutines running the worker function. Until the counter reaches zero, wg.Wait() prevents the program from terminating.

2. Context

If the invoked goroutines are interconnected, you can synchronize their execution using a context. A context allows canceling the execution of other goroutines or setting a timeout for their completion. For example, you are performing calculations using data from external servers. If the calculation takes longer than the specified timeout, you can terminate the execution of all goroutines associated with this calculation.
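As a minimal sketch of this idea (the worker loop below is illustrative, not part of the original calculation example), a goroutine can watch ctx.Done() and stop as soon as the context is canceled:

package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())

	go func() {
		for {
			select {
			case <-ctx.Done():
				fmt.Println("worker stopped:", ctx.Err())
				return
			default:
				fmt.Println("working...")
				time.Sleep(200 * time.Millisecond)
			}
		}
	}()

	time.Sleep(time.Second)
	cancel()                           // ask the worker to stop
	time.Sleep(100 * time.Millisecond) // give it time to print
}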

3. Select

The select statement allows you to execute code based on the results of several channel operations. It also lets you block execution until a specific event occurs.

Example of using context and select in a for-select loop (for more details, check this article):

package main

import (
	"context"
	"fmt"
	"math/rand"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// someTask is the function we call periodically.
func someTask() {
	fmt.Println(rand.Int() * rand.Int())
}

// PeriodicTask runs someTask every second.
// If the context is canceled, the goroutine stops.
func PeriodicTask(ctx context.Context) {
	// Create a new ticker with a period of 1 second.
	ticker := time.NewTicker(time.Second)
	for {
		select {
		case <-ticker.C:
			someTask()
		case <-ctx.Done():
			fmt.Println("stopping PeriodicTask")
			ticker.Stop()
			return
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	go PeriodicTask(ctx)

	// Create a channel to receive signals from the operating system.
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)

	// The code blocks until a signal is received (e.g. Ctrl+C sends SIGINT).
	<-sigCh
}

4. Errgroup

The errgroup package lets you run a group of goroutines and collect the first error returned by any of them; combined with errgroup.WithContext, it can also cancel the remaining goroutines when one of them fails (see the sketch after the example below). For example, if you're performing a calculation using data from remote web servers, and one of the servers returns an error instead of data, you can stop making requests to the other servers and terminate the calculation.

Here’s an example of how to use errgroup (detailed description of the pattern in this article):

package main

import (
	"errors"
	"fmt"
	"time"

	"golang.org/x/sync/errgroup"
)

// errFailure is some custom error.
var errFailure = errors.New("some error")

func main() {
	// Create the errgroup.
	group := errgroup.Group{}

	// Run the first task.
	group.Go(func() error {
		time.Sleep(5 * time.Second)
		fmt.Println("doing some work 1")
		return nil
	})

	// Run the second task.
	group.Go(func() error {
		fmt.Println("doing some work 2")
		return nil
	})

	// Run the third task.
	group.Go(func() error {
		fmt.Println("doing some work 3")
		return errFailure
	})

	// Wait for all goroutines to complete.
	if err := group.Wait(); err != nil {
		fmt.Printf("errgroup tasks ended up with an error: %v\n", err)
	} else {
		fmt.Println("all works done successfully")
	}
}
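The example above uses a plain errgroup.Group, which only collects the first error. Here's a minimal sketch of errgroup.WithContext, which additionally cancels the group's context when one task fails, letting the remaining goroutines stop early:

package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"golang.org/x/sync/errgroup"
)

func main() {
	g, ctx := errgroup.WithContext(context.Background())

	// This task fails immediately and cancels the group's context.
	g.Go(func() error {
		return errors.New("task 1 failed")
	})

	// This task watches the context and stops early on cancellation.
	g.Go(func() error {
		select {
		case <-time.After(5 * time.Second):
			fmt.Println("task 2 finished")
			return nil
		case <-ctx.Done():
			fmt.Println("task 2 canceled:", ctx.Err())
			return ctx.Err()
		}
	})

	if err := g.Wait(); err != nil {
		fmt.Println("group ended with error:", err)
	}
}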

Tools for Safe Data Writing

If you need concurrent access to the same variable from different goroutines, you must ensure safe access to it to avoid data races. A data race is a situation in multithreaded programming where several goroutines access the same shared memory at the same time and at least one of those accesses is a write. This can lead to errors and corrupted data.
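To make this concrete, here's a minimal sketch of exactly such a race (two goroutines incrementing a shared counter without synchronization); running it with go run -race reports the race:

package main

import (
	"fmt"
	"sync"
)

func main() {
	counter := 0
	var wg sync.WaitGroup

	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				counter++ // unsynchronized write: a data race
			}
		}()
	}

	wg.Wait()
	fmt.Println(counter) // often less than the expected 2000
}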

In the Go language, there are special tools that allow you to avoid this state.

  1. Mutex

Mutexes are a synchronization mechanism used to provide exclusive access to shared data from multiple goroutines. In Go, mutexes are represented by the sync.Mutex structure (there is also sync.RWMutex). When a goroutine wants to access shared data, it calls the Lock() method on the mutex. If the mutex is free (not locked by another goroutine), it becomes locked, and access is granted to the current goroutine. If another goroutine has already locked the mutex, the next goroutine calling Lock() will be blocked and wait until the mutex is unlocked. After usage, the goroutine calls the Unlock() method, unlocking the mutex. After this, other goroutines can access the shared data.

Here’s an example of using mutex:

package main

import (
	"fmt"
	"sync"
)

func main() {
	myMap := map[int]int{} // Create a map to store integer values.
	mu := sync.RWMutex{}   // Create a read-write mutex for synchronization.
	wg := sync.WaitGroup{} // Create a wait group to wait for all goroutines to finish.

	for i := 0; i < 5; i++ {
		wg.Add(1) // Increment the wait group counter for each goroutine.
		go func(i int) {
			defer wg.Done() // Notify the wait group when the goroutine finishes.
			mu.Lock()       // Lock the mutex for exclusive access while writing.
			myMap[i] = i    // Perform a write operation on the map.
			mu.Unlock()     // Unlock the mutex after the write is complete.
		}(i)
	}

	wg.Wait()          // Wait for all goroutines to finish before proceeding.
	fmt.Println(myMap) // Print the contents of the map.
}

2. Atomic Operations

If you only need to guard writes to primitive data types, such as integers, pointers, and boolean values, you can use the sync/atomic package instead of a mutex. Compared to mutexes, sync/atomic is faster because each of its operations compiles down to a special atomic machine instruction executed directly by the processor. Here is an example of using sync/atomic:

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var myCounter atomic.Int64
	wg := sync.WaitGroup{}

	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			myCounter.Add(1) // Each goroutine increments myCounter by 1.
		}()
	}

	wg.Wait()
	fmt.Println(myCounter.Load()) // Display the final value of the counter.
}

3. Channels

Channels synchronize the execution of goroutines and safely pass data between them. Compared to mutexes, channels may be less performant, but they are safer: channels are inherently safe for concurrent use, and each value sent on a channel is received by exactly one goroutine, so multiple goroutines never handle the same value at the same moment.

Here’s an example of code where one goroutine blocks execution until it receives data from another goroutine (using a non-buffered channel):

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	// Create a channel for synchronization between goroutines.
	ch := make(chan int)

	// Create a WaitGroup to wait for the goroutines to finish.
	var wg sync.WaitGroup

	// Goroutine 1.
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Perform some work in the first goroutine.
		fmt.Println("Goroutine 1.")
		time.Sleep(time.Second)
		ch <- 42
	}()

	// Goroutine 2.
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Perform some work in the second goroutine.
		fmt.Println("Goroutine 2.")
		result := <-ch
		fmt.Println("Result from Goroutine 1:", result)
	}()

	// Wait for both goroutines to finish.
	wg.Wait()

	// Close the channel.
	close(ch)
}

In this example, the second goroutine does not complete its execution until the first goroutine sends a value to the channel ch.
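For comparison, a buffered channel (a minimal sketch, not part of the example above) doesn't block the sender until its buffer is full:

package main

import "fmt"

func main() {
	ch := make(chan int, 2) // buffer holds two values

	ch <- 1 // does not block: buffer has room
	ch <- 2 // does not block: buffer is now full
	// ch <- 3 would block here until a receiver frees space

	fmt.Println(<-ch, <-ch)
}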

The Go programming language provides a wide range of tools for creating parallel and multitasking code. By combining these tools, you can build complex multithreaded programs. Understanding how concurrency works at all levels of your computing device will help you create safer and more efficient code.

Thank you for reading the article!

If you want to support me or request specific articles — you can buy me a coffee ;)
