Advancing Performance with Slab Allocator in Golang: Practical Implementations

Matteo Lando
Nethive Engineering
10 min read · Oct 23, 2023

In our previous article, we introduced the concept of the slab allocator in Go and explored its theoretical foundations. Now, in this second part of our series, we will build upon that knowledge with practical implementations. We’ll commence by covering the basics, including the installation of the required library. From there, we will gradually delve into more advanced topics, such as optimizing a common ring buffer.

If you missed the first article, you can read it here: [Laying the Groundwork].

Basic Start

First and foremost, let’s remember to install the library in our environment:

go get -u github.com/steveyen/go-slab

Now, let’s take a look at some basic examples to start using the slab:

// Create a new arena: 8-byte starting chunk size, 1 KB slabs,
// a growth factor of 2 between chunk-size classes, and the default malloc.
arena := slab.NewArena(8, 1024, 2, nil)

// Allocate memory for an integer
intBuf := arena.Alloc(8) // 8 bytes for an int64
binary.LittleEndian.PutUint64(intBuf, 57555) // Insert the value into the buffer
fmt.Println("Allocated memory for int:", intBuf)
fmt.Println("Integer stored in buffer:", binary.LittleEndian.Uint64(intBuf))

// Allocate memory for a float
floatBuf := arena.Alloc(8) // 8 bytes for a float64
bits := math.Float64bits(123.456)
binary.LittleEndian.PutUint64(floatBuf, bits)
fmt.Println("Allocated memory for float:", floatBuf) // Print the raw bytes in the buffer
fmt.Println("Float stored in buffer:", math.Float64frombits(binary.LittleEndian.Uint64(floatBuf)))

“Potential pitfalls to watch out for!”

Let's jump right into an example:

// Define a Person struct with fixed-size fields for name, surname, and age
type Person struct {
	Name    [25]byte
	Surname [25]byte
	Age     int
}
...
func main() {
	...

	// Allocate memory for a Person value
	personBuf := arena.Alloc(int(unsafe.Sizeof(Person{})))

	// Populate the buffer with the person's details
	writeStringToBuffer(personBuf[:25], "John")               // Set the name
	writeStringToBuffer(personBuf[25:50], "Doe")              // Set the surname
	binary.LittleEndian.PutUint64(personBuf[50:], uint64(30)) // Set the age

	// Display the allocated memory and the extracted person details
	fmt.Println("Allocated memory for person:", personBuf)
	fmt.Println("Values stored in buffer:", string(personBuf[0:25]),
		string(personBuf[25:50]), binary.LittleEndian.Uint64(personBuf[50:]))

	... // Deallocate the buffer with arena.DecRef(personBuf)
}
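The writeStringToBuffer helper is elided in the snippet; judging from how it is called, it presumably just copies the string's bytes into the target region of the buffer, something like this hypothetical sketch:

// Hypothetical helper, inferred from its usage above: copy a string into a
// fixed-size region of the buffer (longer strings are silently truncated).
func writeStringToBuffer(dst []byte, s string) {
	copy(dst, s)
}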

To allocate a variable of type Person with the slab allocator, keep in mind that the allocator only hands out byte buffers. You first determine how many bytes the Person structure requires, then call Alloc to obtain a byte buffer of that size. To store the Person data inside the buffer, you would typically use a function like binary.Write.
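A minimal sketch of that approach, assuming a fixed-layout variant of the struct (the FixedPerson type and storePerson helper are our own; we serialize into a temporary bytes.Buffer and then copy the encoded bytes into the slab chunk):

// Needs: bytes, encoding/binary, github.com/steveyen/go-slab
type FixedPerson struct {
	Name    [25]byte
	Surname [25]byte
	Age     int64 // encoding/binary needs fixed-size types, so int64 rather than int
}

func storePerson(arena *slab.Arena, p FixedPerson) ([]byte, error) {
	var tmp bytes.Buffer
	if err := binary.Write(&tmp, binary.LittleEndian, p); err != nil {
		return nil, err
	}
	buf := arena.Alloc(tmp.Len()) // 25 + 25 + 8 = 58 bytes for this layout
	copy(buf, tmp.Bytes())
	return buf, nil // release with arena.DecRef(buf) when done
}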

However, this approach requires the Person structure's fields to be of fixed-size types (e.g., int64, float64). It isn't suitable for slices or maps, since they don't have a fixed size. Moreover, the Person structure shouldn't contain pointers. Pointers store memory addresses, not the data itself: if you write a pointer into the byte buffer, you're essentially saving a memory address, and when you later read the buffer back you retrieve that address rather than the actual data.

The binary.Write method requires explicitly fixed-size data, so the Person structure as defined above (with a plain int field) is unsuitable for it. One might consider using a pointer to a string inside the Person struct, but there's a catch: the slab allocator operates on raw bytes and knows nothing about how Go manages pointers or strings. Allocating a buffer for a structure containing a pointer reserves space for the pointer itself, not for the data it points to, so writing such a structure into the buffer saves the pointer's memory address rather than the referenced data.

Reading the buffer later would yield only that memory address. Since the slab allocator may reuse its memory, and an address hidden inside a []byte is invisible to Go's garbage collector, the data behind it may have been collected or its memory reused by the time you dereference it, making the stored address invalid.

In essence, using pointers with the slab memory allocator is neither safe nor practical. If you need strings with this allocator, handle them as byte arrays; for variable-sized strings, consider managing their allocation and deallocation yourself, as sketched below.
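A minimal sketch of that idea (the storeString helper is our own): give each variable-length string its own chunk and keep only the []byte around.

// Store a variable-length string in its own slab chunk instead of keeping a
// Go string header (which would hide a pointer inside the byte buffer).
func storeString(arena *slab.Arena, s string) []byte {
	buf := arena.Alloc(len(s)) // chunk sized to this particular string
	copy(buf, s)
	return buf // read back with string(buf); release with arena.DecRef(buf)
}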

Let's see a solution:

type Person2 struct {
	Name    string
	Surname string
	Age     int
}

func main() {
	// Let's try out Person2

	// Allocate memory for a Person2 value
	person2Buf := arena.Alloc(int(unsafe.Sizeof(Person2{})))

	// Initialize the buffer with person2 details by reinterpreting it as a *Person2
	(*Person2)(unsafe.Pointer(&person2Buf[0])).Name = "Matteo"
	(*Person2)(unsafe.Pointer(&person2Buf[0])).Surname = "Lando"
	(*Person2)(unsafe.Pointer(&person2Buf[0])).Age = 23

	// Display the allocated memory and the extracted person2 details
	fmt.Println("Allocated memory for person2:", person2Buf)
	fmt.Println((*Person2)(unsafe.Pointer(&person2Buf[0])).Name)
	fmt.Println((*Person2)(unsafe.Pointer(&person2Buf[0])).Surname)
	fmt.Println((*Person2)(unsafe.Pointer(&person2Buf[0])).Age)

	// At the end, deallocate the buffer
	arena.DecRef(person2Buf)
}

What happens when we allocate a memory space smaller than a chunk?

Why does the program still work if we instead write person2Buf := arena.Alloc(60), or even person2Buf := arena.Alloc(1)?

This happens because go-slab is a slab allocator: it manages memory in chunks of predetermined sizes. When you call arena.Alloc(60) or arena.Alloc(1), the library picks the smallest slab class whose chunk size can accommodate the request.

For example, if the arena is set up with a starting chunk size of 128 bytes, any allocation request up to 128 bytes is served from a 128-byte chunk. Whether you ask for 60 bytes or just 1 byte, a 128-byte chunk is reserved for you.

However, even though memory is reserved in 128-byte chunks, the slice you get back has exactly the length you requested. If you ask for 1 byte, a 128-byte chunk backs the allocation, but you can safely use only that first byte; writing beyond what you requested risks corrupting data that shouldn't be touched, leading to unpredictable behavior or errors in your application.

When you execute fmt.Println("Allocated memory for person2:", person2Buf), it therefore prints the buffer's contents up to the length you specified at allocation time.

For instance, if you allocated 60 bytes with person2Buf := arena.Alloc(60), person2Buf will contain 60 bytes. Even though the arena may have reserved a larger chunk for the request (say, 128 bytes if that is the chunk size of the matching slab class), only the first 60 bytes are accessible to you, so printing person2Buf shows just those 60 bytes.

The "hidden" bytes are the rest of the chunk reserved by the arena; they aren't accessible because you didn't ask for them in the call to arena.Alloc(60), and trying to reach them could lead to unpredictable behavior or application errors.
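A minimal sketch that makes the class selection visible (the arena parameters are our own choice; we assume, as described above, that Alloc returns a slice whose length matches the request):

// Chunk-size classes grow by the factor passed to NewArena (128, 256, 512, ...),
// and Alloc picks the smallest class whose chunks can hold the request.
arena := slab.NewArena(128, 1024*1024, 2.0, nil)

small := arena.Alloc(1)   // served from the 128-byte class, len(small) == 1
medium := arena.Alloc(60) // same 128-byte class, len(medium) == 60
large := arena.Alloc(200) // next class up: 256-byte chunks, len(large) == 200

fmt.Println(len(small), len(medium), len(large)) // 1 60 200

arena.DecRef(small)
arena.DecRef(medium)
arena.DecRef(large)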

How to measure the performance of the slab allocator

To understand whether we have implemented this approach correctly, we need to measure a few metrics. The library already includes a Stats function that returns all the metrics related to the arena. With a small support function, we can split them into two groups: metrics belonging to each SlabClass and general metrics. Metrics that belong to a SlabClass are recognizable by the prefix 'slabClass-000000:', where 000000 is the SlabClass Id, a progressive number starting from 0.
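A minimal sketch of collecting them (we assume the library's Stats(map[string]int64) method and the 'slabClass-' key prefix visible in the output below; the support function that groups and pretty-prints the metrics is ours and not shown):

// Dump the arena's metrics: per-SlabClass counters carry a "slabClass-NNNNNN"
// prefix in their key, everything else is a general counter.
// Needs: strings
stats := arena.Stats(map[string]int64{})
for key, value := range stats {
	if strings.HasPrefix(key, "slabClass-") {
		fmt.Printf("  %s = %d\n", key, value) // e.g. slabClass-000003-numChunksInUse
	} else {
		fmt.Printf("%s = %d\n", key, value) // e.g. totAllocs, totDecRefs
	}
}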

Let's look at an example of the printed output:

STATS:
slabClass-000000:
chunkSize: 256
numChunksFree: 0
numChunksInUse: 0
numSlabs: 0
numChunks: 0
slabClass-000001:
numChunksInUse: 0
chunkSize: 1024
numChunks: 1
numSlabs: 1
numChunksFree: 1
slabClass-000002:
chunkSize: 512
numChunksFree: 0
numSlabs: 0
numChunks: 0
numChunksInUse: 0
slabClass-000003:
numChunksFree: 1
chunkSize: 128
numChunks: 16
numSlabs: 2
numChunksInUse: 15
General statistics:
totDecRefs: 2
totDecRefZeroes: 2
totPopFreeChunkErrs: 0
totSetNexts: 0
totTooBigErrs: 0
totAddSlabErrs: 0
totSlabClasses: 4
totGetNexts: 0
totMallocs: 3
totMallocErrs: 0
totAllocs: 17
totAddRefs: 0
totPushFreeChunks: 2
totPopFreeChunks: 17

By printing these metrics, we gain insight into how the different chunks are structured and allocated within our program. Beyond simply printing them, we can also push the metrics to Grafana, giving us a detailed view of what happens in the arena while the program runs.
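One way to get the metrics there is to export them; here is a minimal sketch with the Prometheus Go client (this wiring is our own and not part of the article's code; Grafana then simply charts the scraped series):

// Expose arena stats as Prometheus gauges so a Grafana dashboard can chart them.
// Needs: github.com/prometheus/client_golang/prometheus
var arenaStats = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{Name: "slab_arena_stat", Help: "go-slab arena counters"},
	[]string{"key"},
)

func init() {
	prometheus.MustRegister(arenaStats)
}

// Call periodically (e.g. from a ticker) to refresh the exported values.
func exportStats(arena *slab.Arena) {
	for key, value := range arena.Stats(map[string]int64{}) {
		arenaStats.WithLabelValues(key).Set(float64(value))
	}
}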

Simple buffer implementation

// The Data type is assumed to look like: type Data struct { name string; age int }

// Create a slice to store the locations of the buffers.
locSlice := make([]slab.Loc, 10)

// Allocate memory for each Data instance and save its location in locSlice.
for i := range locSlice {
	buf := arena.Alloc(int(unsafe.Sizeof(Data{})))
	// Convert the buffer to a Data instance and initialize its fields.
	dataBuffer := (*Data)(unsafe.Pointer(&buf[0]))
	dataBuffer.name = fmt.Sprintf("Name%d", i)
	dataBuffer.age = i
	locSlice[i] = arena.BufToLoc(buf)
}

// Iterate over locSlice and print the data stored in each buffer.
for _, loc := range locSlice {
	data := (*Data)(unsafe.Pointer(&arena.LocToBuf(loc)[0]))
	fmt.Println(data.name, data.age)
}

// Deallocate each buffer using the locations saved in locSlice.
for _, loc := range locSlice {
	buf := arena.LocToBuf(loc)
	arena.DecRef(buf)
}

NOTE:

In the slab allocator library, there isn't a direct method to pre-allocate a specific number of chunks up front; memory is allocated on the first calls to Alloc. You can, however, craft a workaround: simply call Alloc for the desired number of chunks when your program starts, as sketched below.
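A minimal sketch of that warm-up (the chunk count and size are arbitrary; we assume DecRef pushes the chunks back onto the slab class's free list rather than returning memory to the runtime):

// Warm up the arena: allocate the chunks we expect to need, then release them,
// so the slabs stay resident and later Alloc calls reuse the free chunks.
warm := make([][]byte, 0, 10)
for i := 0; i < 10; i++ {
	warm = append(warm, arena.Alloc(int(unsafe.Sizeof(Data{}))))
}
for _, buf := range warm {
	arena.DecRef(buf)
}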

Ring Buffer Implementation

The code provides a thread-safe ring buffer implementation, optimized with slab memory allocation. The structure allows efficient concurrent read/write operations with fixed-size allocations, keeping memory waste to a minimum. The buffer wraps around when full, discarding the oldest data if necessary. This implementation is aimed at high-performance applications where multiple writers and readers concurrently access a buffer carrying a significant volume of data.

We first tested a ring buffer implementation backed by the slab that accepted and returned its elements as an interface type. Unfortunately, passing objects through an interface increases heap allocations: the approach works, but it hurts memory usage. We therefore developed a ring buffer implementation that exclusively accepts and returns byte slices ([]byte). To pass an arbitrary data type to the ring, we convert a buffer pointer to a pointer of the desired structure with unsafe.Pointer, which lets us copy the values of the 'data' structure directly into the 'buf' buffer; 'buf' is then passed to the ring buffer's Write function.

Below are the optimized Write and Read functions, refined for both CPU efficiency and memory utilization:
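The RingBuffer struct itself isn't shown in the snippet; reconstructed from the fields the methods use, it presumably looks something like this:

// Assumed shape of the ring buffer, inferred from how Write and Read use it.
type RingBuffer struct {
	data  [][]byte   // fixed number of slots holding the stored byte slices
	read  int        // index of the oldest element
	write int        // index of the next slot to write
	full  bool       // distinguishes "full" from "empty" when read == write
	mutex sync.Mutex // guards all of the above for concurrent writers and readers
}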

func (rb *RingBuffer) Write(val []byte) {
	rb.mutex.Lock()
	if rb.full {
		// Buffer full: advance the read index, discarding the oldest element.
		rb.read = (rb.read + 1) % len(rb.data)
	}

	bufferCopy := make([]byte, len(val))
	copy(bufferCopy, val)
	rb.data[rb.write] = bufferCopy
	rb.write = (rb.write + 1) % len(rb.data)
	rb.full = rb.read == rb.write
	rb.mutex.Unlock()
}

func (rb *RingBuffer) Read() ([]byte, bool) {
	rb.mutex.Lock()
	defer rb.mutex.Unlock()

	if !rb.full && rb.read == rb.write {
		// Buffer empty
		return nil, false
	}

	val := rb.data[rb.read]
	rb.read = (rb.read + 1) % len(rb.data)
	rb.full = false

	return val, true
}

Example of Usage

data := TestData{Name: "test", ID: 1}
rb := NewRingBufferSlab(10, int(unsafe.Sizeof(data)))

// Copy the struct into a byte slice so the ring only ever sees []byte.
buf := make([]byte, unsafe.Sizeof(data))
ptrData := (*TestData)(unsafe.Pointer(&buf[0]))
*ptrData = data
rb.Write(buf)

// Read it back and reinterpret the bytes as a TestData.
value, ok := rb.Read()
if ok {
	val := (*TestData)(unsafe.Pointer(&value[0]))
	fmt.Println(val.Name, val.ID)
}

Ring Buffer Performance Test

We carried out performance testing on the slab-based ring buffer, where the ring handles byte buffers that are then cast to the data type our program needs. Compared to a traditional ring buffer, the slab variant showed modest speed improvements in both reading and writing. More notably, it drastically reduced heap allocations, whereas the traditional ring buffer performs a new allocation on every write.

This test was conducted by performing 1000 write and read operations on a ring buffer of size 10.

Let’s see the results we obtained from running the tests:

BenchmarkRingBufferSlabWriteAndRead
Elapsed Time: 50.616μs
Memory used: 11960 bytes
Memory allocations: 56 mallocs
Elapsed Time: 63.75μs
Memory used: 6480 bytes
Memory allocations: 49 mallocs
Elapsed Time: 53.202μs
Memory used: 6248 bytes
Memory allocations: 47 mallocs
Elapsed Time: 48.559μs
Memory used: 6240 bytes
Memory allocations: 46 mallocs
Elapsed Time: 53.001μs
Memory used: 6240 bytes
Memory allocations: 46 mallocs
Elapsed Time: 48.664μs
Memory used: 6240 bytes
Memory allocations: 46 mallocs
BenchmarkRingBufferSlabWriteAndRead-16 1000000000 0.0000719 ns/op
PASS
BenchmarkRingBufferClassicByteWriteAndRead
Elapsed Time: 71.216μs
Memory used: 26400 bytes
Memory allocations: 1007 mallocs
Elapsed Time: 69.537μs
Memory used: 26392 bytes
Memory allocations: 1006 mallocs
Elapsed Time: 63.588μs
Memory used: 26656 bytes
Memory allocations: 1010 mallocs
Elapsed Time: 51.323μs
Memory used: 26392 bytes
Memory allocations: 1006 mallocs
Elapsed Time: 67.617μs
Memory used: 26392 bytes
Memory allocations: 1006 mallocs
Elapsed Time: 53.821μs
Memory used: 26392 bytes
Memory allocations: 1006 mallocs
BenchmarkRingBufferClassicByteWriteAndRead-16 1000000000 0.0000719 ns/op
PASS

Use case

Now let's examine the results we obtained in our own software by replacing the old ring buffer with one that uses the slab, and observe how it performs.

Advanced Functions

Now let's look at some of the more advanced functions the library provides for concatenating blocks of memory. Let's go straight to an example of the SetNext and GetNext functions to understand their mechanism in depth.

// Test: concatenation of buffers of different sizes
arenaConcatTest := slab.NewArena(10, 1000, 2.0, nil)

// 1. Allocate two buffers.
buf1 := arenaConcatTest.Alloc(10) // buffer of 10 bytes
buf2 := arenaConcatTest.Alloc(15) // buffer of 15 bytes
... // Insert data into the buffers

// 2. Link buf1 to buf2.
arenaConcatTest.SetNext(buf1, buf2)

// 3. Retrieve the buffer following buf1.
nextBuf := arenaConcatTest.GetNext(buf1)
... // nextBuf now refers to the same chunk as buf2

// Don't forget to free the memory when you're done.
arenaConcatTest.DecRef(buf1)
arenaConcatTest.DecRef(buf2)
arenaConcatTest.DecRef(nextBuf) // GetNext increases the reference count.

[Image: illustration of the arena's memory before the chunks are deallocated.]

We can consider using these functions to store very long data sequences, such as a large file. Although this memory management system is best suited to many small and medium-sized allocations, we can break a large file down into smaller chunks so that each segment is managed separately, optimizing allocation and deallocation.

It's crucial to keep track of the order of the chunks we store; otherwise we risk compromising the integrity of the file being saved. The library lets us link two buffers together with arena.SetNext(buf1, buf2), creating a chain of buffers that don't necessarily need to be the same size. To navigate along this chain, we use arena.GetNext().

Let’s look at a use-case example:

  • We have a 10-minute video.
  • We split the video into chunks, each sized to hold one minute of footage.
  • While reading the file, we create these chunks and link them together.
  • Once the entire video is stored in memory, we can navigate between the different chunks.

With this approach we gain memory efficiency: if we want to modify or delete a single chunk, we can do so without reloading the entire file. In general, this gives us more flexibility, since changes touch only the relevant block. Furthermore, allocating chunks of the right size reduces memory fragmentation.
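A minimal sketch of the idea (the storeChained helper and the chunk size are ours; the reference-count handling follows our reading of the chaining API, so check the arena's stats when adopting it):

// Split a large payload into slab chunks and link them into a chain with SetNext.
// The caller keeps the returned buffers and DecRefs each of them (plus any
// buffers obtained later via GetNext) once the data is no longer needed.
func storeChained(arena *slab.Arena, payload []byte, chunkSize int) [][]byte {
	var chunks [][]byte
	for off := 0; off < len(payload); off += chunkSize {
		end := off + chunkSize
		if end > len(payload) {
			end = len(payload)
		}
		buf := arena.Alloc(end - off) // the last chunk may be smaller
		copy(buf, payload[off:end])
		if len(chunks) > 0 {
			arena.SetNext(chunks[len(chunks)-1], buf) // link to its predecessor
		}
		chunks = append(chunks, buf)
	}
	return chunks
}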

Conclusion

To conclude, we explored memory management in Go with a focus on the garbage collector: essential, but a source of challenges in memory handling and fragmentation. We then introduced the slab allocator: although it makes development more complex, it significantly improves memory allocation and sidesteps the limitations of the GC. During our experiments we also noticed that developers become more aware of the variables they use, encouraging an approach of recycling chunks; this awareness of variable data types is often lost behind the simplified syntax Go offers.

Useful Links

GitHub Repository

Link to the GitHub repository: repo
