Efficient Memory Management with Slab Allocator in Golang: Laying the Groundwork

Matteo Lando
Nethive Engineering
8 min read · Oct 23, 2023

Introduction

In this article, we’ll delve into the potential of the Slab allocator concept to enhance the performance of our Golang programs. Before diving into its implementation and details, it’s crucial to understand how Golang manages memory by default through its garbage collector, weighing its pros and cons.

Garbage Collector

Go automatically manages the allocation of variables and includes a garbage collector (GC) that identifies the parts of memory that are no longer needed and recycles them.

In Go, memory is allocated based on the lexical scope in which a variable is created: a variable is associated with the block where it is declared. Typically, variables are placed on the stack. However, when the compiler can’t determine the exact lifespan of a variable (when it “escapes” its scope), it’s placed on the heap, where memory is dynamically allocated. The primary role of the garbage collector is to identify and reclaim memory from these dynamic allocations.
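The compiler’s escape analysis makes this stack-versus-heap decision; a minimal sketch (function names are mine, for illustration) whose decisions you can inspect with `go build -gcflags=-m`:

```go
package main

import "fmt"

// stackLocal returns a value: no reference to x outlives the
// function, so x can live on the stack.
func stackLocal() int {
	x := 42
	return x
}

// heapEscape returns a pointer to a local variable, so the compiler
// must move x to the heap ("x escapes to heap" in -gcflags=-m output),
// making it the garbage collector's responsibility.
func heapEscape() *int {
	x := 42
	return &x
}

func main() {
	fmt.Println(stackLocal(), *heapEscape())
}
```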

The principal objectives of the collector are to:

  1. Automatically free up memory from unreachable objects, preventing memory leaks.
  2. Minimize memory management errors, such as use-after-free and dangling pointers.
  3. Operate concurrently with the program, reducing pause times.
  4. Abstract away the complexities of manual memory management for developers.
  5. Ensure efficient memory management with minimal overhead.

Resources Used:

The GC has to balance between using the computer’s processor (CPU) and its memory. The memory it uses includes the currently live heap, any new memory allocated before the marking step, and space for some metadata. The CPU cost has two parts: a fixed cost for running the GC and a variable cost proportional to how much memory is live.

Parameters:

To balance the trade-off between these two resources, you can adjust how frequently GC cycles run by setting a threshold for when the GC should start. The key parameter is GOGC, which determines the target heap size after each GC cycle. The aim is to finish a cycle before the total heap size exceeds this target. The target size is defined as:

Target heap memory = Live heap + (Live heap + GC roots) × GOGC/100

The key point to remember is that by doubling GOGC, you double the heap memory overhead and roughly halve the CPU cost of the GC, and vice versa. For a more detailed graphical representation of the process, we recommend visiting the official page that dynamically illustrates how changing the parameter affects heap memory usage.

If needed, it’s possible to turn off the GC entirely by setting:

GOGC=off

However, this can lead to unbounded growth in heap memory usage if allocations are not carefully managed within the program. In some programs, especially complex ones requiring high performance, the GC’s CPU usage can become a bottleneck. On the other hand, simply turning off the GC is rarely a viable solution.

Obviously, in real-world scenarios, the live heap memory can fluctuate during the program’s execution, and these fluctuations can result in a higher allocation rate leading to more frequent GC cycles. Relying solely on this parameter has its limitations due to available memory constraints. Issues arise when we encounter spikes in the live heap memory. That’s why, in version 1.19, Go introduced support for setting a runtime memory limit. This limit can be configured using the GOMEMLIMIT environment variable.

Using a memory limit can pose challenges when the live heap becomes too large and approaches the limit closely. This can lead to an increase in GC cycles, resulting in high CPU usage and a potential program stall. To address this, a “soft limit” was introduced, allowing the memory to exceed the set limit slightly to reduce the number of cycles.

Of course, the use of these parameters greatly depends on the environment in which the program runs. The ideal settings vary based on our needs, and only through testing can we practically observe their behavior.

Slab allocation

The library we will examine in the next section exploits the mechanism of the slab allocator, a memory management system that aims to allocate objects efficiently.

Kernels require a memory allocator under the page level. While pages serve as the standard unit for memory operations, there are kernel data structures smaller than a page. The core idea behind the “slab allocator” is to ensure a page holds only data structures of the same kind, thus eliminating external fragmentation. Each object type can have its dedicated allocator, termed “cache”. These caches manage “slabs”, pages filled with objects of a consistent type. The elegance of slab allocation lies in its adaptability to the needs of different data types.

In this system, the cache represents a pre-allocated memory space dedicated to a specific data type, while the slab is a contiguous portion of memory of fixed size, within which resides a predetermined number of objects of a given type. This method of memory management keeps track of all memory blocks.

The slab allocator aims to reduce object allocation and destruction operations by the kernel. If such operations occur too frequently, they can adversely affect CPU performance. The key feature of this mechanism is to release objects after use but keep them cached without immediately destroying them. This allows the allocation to be ready to be reused by an object of the same type.

The memory allocator handles both requests to allocate objects and requests to free memory. During the object allocation phase, it checks whether free memory blocks already exist; if such blocks are available, they are occupied, otherwise, a new allocation is created. By tracking free blocks and adopting a method of recycling blocks, the slab allocator greatly reduces memory fragmentation.

In short, the slab allocator proves particularly useful when small or medium-sized objects of the same type are created and destroyed very frequently.
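To make the mechanism concrete, here is a minimal, slab-inspired free-list cache for a single object type (a simplified sketch of the idea, not the kernel algorithm): freed objects are parked on a free list instead of being released, so the next allocation of the same type reuses them with no new heap allocation.

```go
package main

import "fmt"

// object is a stand-in for a frequently allocated fixed-size type.
type object struct{ data [64]byte }

// cache is a slab-style cache for one object type.
type cache struct {
	free []*object // recycled objects, ready for reuse
}

// Get pops a recycled object if one is available, and only falls
// back to a fresh heap allocation when the free list is empty.
func (c *cache) Get() *object {
	if n := len(c.free); n > 0 {
		obj := c.free[n-1]
		c.free = c.free[:n-1]
		return obj
	}
	return &object{}
}

// Put resets the object and parks it on the free list instead of
// letting the GC reclaim it.
func (c *cache) Put(obj *object) {
	*obj = object{}
	c.free = append(c.free, obj)
}

func main() {
	c := &cache{}
	a := c.Get()
	c.Put(a)
	b := c.Get()
	fmt.Println(a == b) // prints true: the freed object was reused
}
```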

Here is an image representing how the slab allocator works:

Graphical representation of the Slab Allocation structure (image not reproduced here)

SLAB library

Let’s dive into the details of the implementation found in the library https://github.com/couchbase/go-slab. At its core, this library boasts a well-structured data model. The top-level structure is the ‘Arena’, while at its foundation, we have the ‘chunks’.

From a bottom-up perspective:

  • Struct Chunk: This is a memory block within a slab. Apart from the data itself, it stores reference counts and size information. It’s essential to note that chunks have a predefined size. Later on, we’ll delve into how these chunks are specifically sized.
  • Struct Loc: Represents the location of a chunk within the arena. It acts as a handle to a chunk, since chunks are not addressed directly through traditional pointers.
  • Struct Slab: A memory block that houses multiple chunks. Each slab has a unique ID and a fixed size; the number of chunks it contains depends on the individual size of those chunks.
  • Struct SlabClass: As the name suggests, it denotes a class of slabs that share the same chunk size.

General Functioning:

The library operates using a slab class approach. When a user requests memory allocation, the library searches the arena for a slab class that can meet the request. If none is found, a new class is created with the chunk size being equal to or larger than the requested size. Furthermore, the requested allocation size must be smaller than the slab size, set when the arena is established.

Main Functions:

  • Struct Arena: The main structure overseeing the collection of slab classes and memory in general. It is also responsible for tracking various statistics, including total allocations, deallocations, errors, and more.
  • NewArena: Initializes a new arena. Its main parameters are:
      • startChunkSize: initial size of the chunks in the first slab class.
      • slabSize: total size of each slab.
      • growthFactor: growth rate for chunk sizes from one slab class to the next.

Note:

Chunks, Slabs, and SlabClasses: These are associated with a specific Arena for the entirety of the program’s execution. Therefore, if you, for instance, execute the method arena.AddRef(buffer) on a buffer that doesn’t belong to that arena, you will receive the runtime error message: “buf not from this arena.”

Experimental Library Release: A new experimental library has been recently launched. You can find more about it in this article: The Golang Arena Is Here. This library specifically deals with managing memory efficiently, reducing the resources used by the garbage collector. While we won’t delve into this topic in our discussion, I still recommend checking out the linked article as it’s very well written.

It’s important to note that adopting the Slab allocator can introduce certain challenges. Primarily, it increases the complexity of our program. Golang was designed to simplify programming, partly through the Garbage Collector’s automatic memory management, so this mechanism may not be worthwhile for projects where performance isn’t a primary concern. Additionally, the Slab allocator requires the programmer to be more aware of each variable’s lifetime, as objects are explicitly allocated and deallocated from memory.

Tools for tests

Before proceeding with the implementation of the slab, it’s essential to identify a method to evaluate the performance of our software, specifically monitoring memory and CPU usage. While it’s possible to print the collected values on the screen from various code segments, for a clearer and more effective representation, one might consider using metrics through Prometheus and then graphically displaying them on Grafana. This combination provides a detailed analysis and an immediate understanding of the gathered data.

Let’s start by setting up pprof in our program. With just one line, we can launch a new goroutine that runs a function to initiate an HTTP server on “127.0.0.1:6060”. This server listens for HTTP requests and, thanks to the prior import of the pprof package, provides profiling endpoints. Should any problems arise while starting or running the server, they will be recorded using log.Println.

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof handlers on http.DefaultServeMux
)

go func() {
    log.Println(http.ListenAndServe("127.0.0.1:6060", nil))
}()

From the terminal, we can then fetch profiling data from the running application, specifically requesting a 30-second CPU profile (the default duration). With the top command, we see a list of the top 10 functions that consume the most CPU time.

go tool pprof http://localhost:6060/debug/pprof/profile
(pprof) top
… list of the functions that consume the most CPU time …

For more information, I recommend visiting this article (Profiling Go programs with pprof).

Let’s explore some primary functions that we might encounter when using pprof:

  • runtime.gcBgMarkWorker: This is the entry point for mark worker goroutines. It reveals the time spent marking and scanning, which depends on GC frequency and the object graph's complexity.
  • runtime.mallocgc: This is where heap memory is allocated. If over 15% of the time is spent here, it suggests high memory allocation.
  • runtime.gcAssistAlloc: This function shows when an application assists the GC in marking and scanning. If more than 5% of the time is consumed here, it means the application is allocating memory faster than the GC can manage.

Next step

After this theoretical introduction, in the next article, we will delve into the practical implementation of the theoretical concepts discussed so far. We will start with simple examples and gradually move towards creating an efficient data structure, examining how the slab allocator works and its benefits.
