Exploring the Inner Workings of Garbage Collection in Golang : Tricolor Mark and Sweep

Sourav Choudhary
7 min readSep 29, 2023

--

Connect me on LinkedIn 🤝 to craft scalable systems

Part 2 : Describe the tricolor mark and sweep algorithm used by Go’s garbage collector, breaking down each phase and its significance.

Tricolor Mark and Sweep Algorithm

Go’s garbage collector (GC) uses the Tricolor Mark and Sweep algorithm, which is designed for concurrent and efficient garbage collection. This algorithm categorizes objects into three colors: white, gray, and black, to determine their reachability and eligibility for collection.

1. White Objects:

-Definition: Objects that are initially unmarked are considered white.

- Characteristics: White objects are potential candidates for collection since they are not currently reachable from the root set (e.g., global variables and goroutine stacks).

2. Gray Objects:

- Definition: Objects that are marked for collection but have not had their references explored are classified as gray.

- Characteristics: Gray objects are in the process of being examined to identify their references to other objects. They are added to a work queue for further processing.

3. Black Objects:

- Definition: Objects that are marked for collection and have had their references explored are considered black.

  • Characteristics: Black objects are confirmed as reachable and are safe from collection in the current cycle.

The Tricolor Mark and Sweep Cycle:

1. Initialization: The cycle begins with all objects initially marked as white.

2. Initial Marking (Stop-the-World): During this short pause, the GC marks objects directly reachable from the root set (global variables and goroutine stacks) as gray. This step ensures a consistent starting point for the collection.

3. Concurrent Marking: After the initial marking phase, the GC enters the concurrent marking phase. It explores references from gray objects to identify and mark other objects as gray. This phase is concurrent with the application’s execution, meaning that the program continues running while the GC identifies and marks reachable objects.

4. Termination of Concurrent Marking: The concurrent marking phase continues until there are no more gray objects to process. All reachable objects have been marked as black or gray, and the marking phase concludes. This phase involves no stop-the-world pauses, allowing the program to continue running without significant interruption.

5. Sweeping: After marking, the GC proceeds to the sweeping phase. During this phase, it sweeps through the heap, deallocating memory associated with white objects (unreachable objects). The memory of these objects is returned to the heap for reuse. Sweeping also happens concurrently with the application’s execution.

6. Reclamation: In the reclamation phase, the GC can return the memory that has been reclaimed during sweeping to the operating system, reducing the overall memory footprint of the Go program.

Concurrent and Efficient:

One of the key advantages of the Tricolor Mark and Sweep algorithm in Go is its concurrency. Most of the garbage collection work is performed concurrently with the application, minimizing the impact on program execution and providing a smooth user experience.

This algorithm’s efficiency lies in its ability to quickly identify and mark reachable objects without the need for excessive stop-the-world pauses, making it a crucial component of Go’s memory management system.

Implementation in Golang

package main

import (
"fmt"
"runtime"
"sync"
"time"
)

// Object represents an object that needs to be garbage collected.
type Object struct {
mu sync.Mutex // Mutex for synchronization
Marked int // Marked indicates the color of the object.
Next *Object // Next represents a reference to another object.
}

// Constants for colors used in tricolor marking.
const (
white = iota // White objects are unmarked and unreachable.
gray
black // Black objects are marked and reachable.
)

// Global variables for managing the heap and objects.
var (
heap []*Object // Simulated heap.
rootSet []*Object // Root set of objects (global variables).
mu sync.Mutex // Mutex to protect global data structures.
)

func main() {
// Initialize the heap and objects.
initHeap()

// Construct an object graph.
obj1 := &Object{}
obj2 := &Object{}
obj3 := &Object{}

obj1.Next = obj2
obj2.Next = obj3
rootSet = []*Object{obj1} // Set the root objects.

// Simulate the tricolor mark-and-sweep garbage collection cycle.
for i := 0; i < 3; i++ {
gcCycle()
fmt.Printf("After GC cycle %d:\n", i+1)
printObjectStatus()
time.Sleep(1 * time.Second)
}
}

// initHeap initializes the heap with objects.
func initHeap() {
// Initialize the heap with objects.
heap = make([]*Object, 100)

// Populate the heap with objects.
for i := 0; i < 100; i++ {
heap[i] = &Object{}
}
}

// gcCycle simulates a complete tricolor mark-and-sweep garbage collection cycle.
func gcCycle() {
// Step 1: Initialization - Reset the color of all objects to white.
resetColors()

// Step 2: Initial Marking - Mark objects directly reachable from the root set as gray.
initialMark()

// Step 3: Concurrent Marking - Mark other objects as gray concurrently.
concurrentMark()

// Step 4: Termination of Concurrent Marking - No additional work needed.
// In our simplified example, the termination of concurrent marking is implicitly
// handled by the fact that goroutines for marking have finished their work
// when they mark all reachable objects as black.

// Step 5: Sweeping - Deallocate memory for unreachable (white) objects.
sweep()

// Step 6: Reclamation - No additional work needed.
// In our simplified example, we don't explicitly implement a reclamation phase
// because it's typically handled by the Go runtime and not something that
// application-level code would manage directly. The Go runtime is responsible
// for efficiently managing memory and returning it to the operating system
// when appropriate.
}

// resetColors resets the color of all objects to white.
func resetColors() {
mu.Lock()
defer mu.Unlock()

for i := range heap {
heap[i].Marked = white
}
}

// initialMark marks objects directly reachable from the root set as gray.
func initialMark() {
mu.Lock()
defer mu.Unlock()

for _, obj := range rootSet {
obj.Marked = gray
}
}

// concurrentMark marks objects as gray concurrently.
func concurrentMark() {
var wg sync.WaitGroup

mu.Lock()
for _, obj := range rootSet {
if obj.Marked == gray {
// Start a goroutine to mark objects from the gray root set.
wg.Add(1)
go func(o *Object) {
defer wg.Done()
mark(o)
}(obj)
}
}
mu.Unlock()

// Wait for all goroutines to finish.
wg.Wait()
}

// mark recursively marks reachable objects using depth-first search.
func mark(obj *Object) {
if obj == nil || obj.Marked == black {
return
}

// Mark the current object as reachable.
obj.Marked = gray

// Recursively mark its references.
mark(obj.Next)

// After marking references, mark the object as black.
obj.Marked = black
}

// sweep deallocates memory for unreachable (white) objects.
func sweep() {
mu.Lock()
defer mu.Unlock()

for i := range heap {
obj := heap[i]
if obj != nil && obj.Marked == white {
// Free the unmarked (white) object.
heap[i] = nil
}
}
}

// printObjectStatus prints the status of each object (black, gray, or white).
func printObjectStatus() {
mu.Lock()
defer mu.Unlock()

for i, obj := range heap {
if obj == nil {
fmt.Printf("Object %d: Freed\n", i)
} else if obj.Marked == black {
fmt.Printf("Object %d: Marked (Black)\n", i)
} else if obj.Marked == gray {
fmt.Printf("Object %d: Marked (Gray)\n", i)
} else {
fmt.Printf("Object %d: Unmarked (White)\n", i)
}
}
}

How root objects are decided ?
In Go’s garbage collector, root objects are objects that are directly reachable from outside the garbage collection process. Identifying root objects is essential because they serve as the starting point for the garbage collection process. Root objects typically include:

1. Global Variables: Global variables in your Go program can reference objects. These global variables are considered root objects because they can be directly accessed by the garbage collector.

2. Goroutine Stacks: Active goroutine stacks can also contain references to objects. Since goroutines are concurrently executing, their stacks are considered root sets. The garbage collector scans goroutine stacks to identify additional root objects.

To identify root objects, We don’t need to do anything special in our Go code. The Go runtime and garbage collector handle this automatically. However, if we want to analyze or understand which objects are root objects, we can use tools like Go’s built-in pprof package for memory profiling.

Here’s how you can use pprof to identify root objects:

1. Import the `net/http/pprof` package:

import _ "net/http/pprof"

2. Add an HTTP server to your program to expose profiling endpoints:

go func() {
log.Println(http.ListenAndServe("localhost:6060", nil))
}()

This starts an HTTP server on port 6060.

3. Start your program and access the profiling endpoint in a web browser or use a tool like `go tool pprof` from the command line:

go tool pprof http://localhost:6060/debug/pprof/heap

This will provide a heap profile that includes information about root objects and their references.

4. Analyze the heap profile to understand which objects are root objects and what references they hold. This can help you identify potential memory leaks or objects that are being held longer than expected.

If you read uptill now, then I hope you liked this article and if you like this article then please Clap, as it motivates me to help the community.

Please comment if you found any discrepancy in this article or if you have any question related to this article.

Thank You for your time.

Connect me on LinkedIn 🤝

Part 1 : https://medium.com/@souravchoudhary0306/under-the-hood-exploring-the-inner-workings-of-garbage-collection-in-golang-95b50bc8ce1d

--

--