Go Memory Arenas

Endre Simo
8 min read · Apr 24, 2023


In this article we’ll test out the experimental memory management feature introduced in the Go 1.20 release, called arena. An arena is a new type of memory management mechanism designed to circumvent garbage-collection overhead and make Go applications more performant. We will discuss the following topics:

  • What are Go arenas?
  • When and why are memory arenas beneficial for your application?
  • How to use memory arenas?
  • Testing arenas performance
  • Conclusions

What are Go arenas?

Go is a garbage-collected programming language, which means memory allocation and deallocation are managed by the Go runtime. This eliminates the need for manual memory management, but it comes at a cost: the runtime must keep track of every allocated object and free the ones that go out of scope. It does this by periodically running the garbage collector, which marks unused objects for collection.

Unfortunately this comes with higher CPU usage and latency, because the runtime has to periodically suspend program execution (in Go terminology this is called Stop The World), spending time on garbage collection instead of useful work. The process happens in two phases: mark and sweep. In the mark phase, objects that are no longer reachable are identified and marked for collection; in the sweep phase, the memory they occupy is effectively reclaimed. All of these operations can lead to significant performance overhead.

When and why are memory arenas beneficial for your application?

You can benefit from memory arenas when allocations happen frequently and the allocated objects are typically small. One type of use case is a web server that has to serve millions of requests continuously and needs to store small chunks of data at a high rate. With standard memory allocation, this leads to increased GC overhead, since the memory occupied by these short-lived objects must constantly be reclaimed by the Go runtime.
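To make the cost of this pattern concrete, here is a small sketch using `testing.AllocsPerRun` from the standard library. The handler name and payload are illustrative, not a real server API; the point is simply that every request pays at least one heap allocation, which the GC must later reclaim.

```go
package main

import (
	"fmt"
	"testing"
)

var sink []byte // global sink so the buffer escapes to the heap

// handleRequest simulates a handler storing a small chunk of data per
// request -- the high-frequency, small-allocation pattern described above.
func handleRequest(payload string) []byte {
	buf := make([]byte, 0, 64)
	return append(buf, payload...)
}

func main() {
	// testing.AllocsPerRun reports the average number of heap
	// allocations per call to the function it is given.
	allocs := testing.AllocsPerRun(10_000, func() {
		sink = handleRequest("small per-request data")
	})
	fmt.Printf("heap allocations per request: %.0f\n", allocs)
}
```

Multiplied by millions of requests, even a single allocation per request keeps the collector permanently busy, which is exactly the overhead an arena amortizes.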

[Figure: memory allocation without / with memory arenas]

We will prove this assumption below, when we analyze the benchmarks we have created. The nice thing about memory arenas is that they allocate contiguous memory regions, and deallocation happens all at once, without GC overhead.

How to use memory arenas?

As I have mentioned before, memory arenas are only available as an experimental feature, so in order to use them you have to run your application with the GOEXPERIMENT=arenas environment variable. This means there is no guarantee from the Go team that the API won’t change over time, or even that it will continue to exist in a future release. For these reasons it is not yet recommended for production use.

This is how you can run your program with memory arenas enabled:

$ GOEXPERIMENT=arenas go run main.go

Now let’s take a simple example that uses the arenas. Below we are defining some helper functions, which actually are wrappers around the existing arena methods.

import "arena"

// New allocates a new memory arena.
func New() *arena.Arena {
	return arena.NewArena()
}

// NewAlloc allocates a new value of type T in the given memory arena.
func NewAlloc[T any](a *arena.Arena) *T {
	return arena.New[T](a)
}

// Free frees the memory arena without the garbage collection overhead.
func Free(a *arena.Arena) {
	if a == nil {
		return
	}
	a.Free()
}

// MakeSlice creates a new slice backed by the arena, falling back to a
// regular heap allocation when no arena is provided.
func MakeSlice[T any](a *arena.Arena, l, c int) []T {
	if a == nil {
		return make([]T, l, c)
	}
	return arena.MakeSlice[T](a, l, c)
}

The New function initializes a new memory arena, and NewAlloc allocates a new object inside that arena (in the snippets below these helpers are called through the wrapper package, which is itself named arena). Now let’s populate it with some values. One interesting observation is that arena.NewAlloc returns a pointer to the generic type, initialized with the type’s zero value, so in order to assign a real value you have to dereference it.

mem := arena.New()
defer arena.Free(mem)

val := arena.NewAlloc[int](mem)
fmt.Println(*val) // print 0
*val = 10
fmt.Println(*val) // print 10

Let’s see another example, but this time using arena.MakeSlice function.

type Struct[T any] struct {
	len  int
	data []T
}

func allocate() {
	size := 1000
	mem := arena.New()
	defer arena.Free(mem)

	obj := arena.NewAlloc[Struct[int]](mem)
	obj.len = size
	obj.data = arena.MakeSlice[int](mem, 0, size)

	for i := 0; i < size; i++ {
		obj.data = arena.Append(mem, obj.data, i)
	}
}

In this example we allocate a struct in a new memory arena and populate its slice using the Append function. You might notice that we call the arena.Free function in a defer statement. This frees the arena (and all objects allocated from it) so that the memory backing the arena can be reused fairly quickly, without garbage collection overhead.

If data stored in the arena is accessed after calling arena.Free, the program will panic when the address sanitizer option (asan) is enabled.

type T struct {
	val int
}

func memAllocArenaPanic() *T {
	mem := arena.New()

	obj := arena.NewAlloc[T](mem)
	arena.Free(mem)
	// Accessing a variable after the allocated memory has been released will panic.
	obj.val = 1

	return obj
}

$ GOEXPERIMENT=arenas go run -asan main.go

This is what we receive when the above program is run with the -asan option:

==110214==ERROR: AddressSanitizer: use-after-poison on address 0x40c0007ff7f8 at pc 0x0000004f89a8 bp 0x000000000000 sp 0x10c000199f40
WRITE of size 8 at 0x40c0007ff7f8 thread T0
#0 0x4f89a7 (/tmp/go-build2293331005/b001/exe/panic+0x4f89a7)

Address 0x40c0007ff7f8 is a wild pointer.
SUMMARY: AddressSanitizer: use-after-poison (/tmp/go-build2293331005/b001/exe/panic+0x4f89a7)
Shadow bytes around the buggy address:
0x0818800f7ea0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
0x0818800f7eb0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
0x0818800f7ec0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
0x0818800f7ed0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
0x0818800f7ee0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
=>0x0818800f7ef0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7[f7]
0x0818800f7f00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
0x0818800f7f10: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
0x0818800f7f20: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
0x0818800f7f30: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
0x0818800f7f40: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==110214==ABORTING
exit status 1

There might be certain situations when you need to access a value stored in the arena even after it has been freed. For this purpose, arena.Clone makes a shallow copy of an arena-allocated value and places it on the regular Go heap.

There is an important notice to remember: memory arenas are not safe to use concurrently!

Testing arenas performance

For the purpose of our testing I’ve created two types of benchmarks: one with simple allocations and another simulating the allocation of a complex data structure. I’ve created a generic struct with two fields: a slice and a length (the length field is not important). Then I tested both standard and arena allocation across multiple test cases. I won’t show all the test cases here; you can check them at the following link: https://github.com/esimov/go-arena.

The most interesting results were obtained when I benchmarked creating the above-mentioned struct and populating its slice field iteratively in a for loop n times. Let’s see the code.

type Struct[T any] struct {
	len  int
	data []T
}

var testCases = []int{100, 10_000, 100_000}

func BenchmarkComplexStruct_IterNoArena(b *testing.B) {
	for _, n := range testCases {
		b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
			for x := 0; x < 1000; x++ {
				obj := Struct[int]{
					len:  n,
					data: make([]int, 0, n),
				}

				for i := 0; i < b.N; i++ {
					for j := 0; j < n; j++ {
						obj.data = append(obj.data, j)
					}
				}
			}
		})
	}
}

func BenchmarkComplexStruct_IterArena(b *testing.B) {
	for _, n := range testCases {
		b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
			mem := arena.New()
			defer arena.Free(mem)

			for x := 0; x < 1000; x++ {
				obj := arena.NewAlloc[Struct[int]](mem)
				obj.len = n
				obj.data = arena.MakeSlice[int](mem, 0, n)

				for i := 0; i < b.N; i++ {
					for j := 0; j < n; j++ {
						obj.data = arena.Append(mem, obj.data, j)
					}
				}
			}
		})
	}
}

And here are the results:

BenchmarkComplexStruct_IterNoArena
BenchmarkComplexStruct_IterNoArena/n=100
BenchmarkComplexStruct_IterNoArena/n=100-16 460 2714359 ns/op 4442468 B/op 39 allocs/op
BenchmarkComplexStruct_IterNoArena/n=10000
BenchmarkComplexStruct_IterNoArena/n=10000-16 4 255372797 ns/op 366595744 B/op 1779 allocs/op
BenchmarkComplexStruct_IterNoArena/n=100000
BenchmarkComplexStruct_IterNoArena/n=100000-16 1 2197594887 ns/op 802817536 B/op 1012 allocs/op
BenchmarkComplexStruct_IterArena
BenchmarkComplexStruct_IterArena/n=100
BenchmarkComplexStruct_IterArena/n=100-16 240 4781351 ns/op 1712813 B/op 4 allocs/op
BenchmarkComplexStruct_IterArena/n=10000
BenchmarkComplexStruct_IterArena/n=10000-16 3 482001029 ns/op 192949549 B/op 359 allocs/op
BenchmarkComplexStruct_IterArena/n=100000
BenchmarkComplexStruct_IterArena/n=100000-16 1 4448649554 ns/op 838895368 B/op 1109 allocs/op

As we anticipated, for the smaller test cases the arena significantly outperformed standard memory allocation in terms of both allocations per operation and bytes per operation. One interesting result caught my attention though: when testing the arena allocation with a slice capacity of 100,000, the results are, somewhat surprisingly, slightly worse than those obtained with standard allocation. The explanation comes from the arena source code. In the runtime/arena package we find the following comment on line 351:

Alloc reserves space in the current chunk or calls refill and reserves space in a new chunk.

What does this mean? It means that the arena pre-allocates memory in chunks. If the data does not fit in the current chunk, it reserves space in a new one. Further down, in the refill() method, which is invoked from alloc(), we find the following:

Refill inserts the current arena chunk onto the full list and obtains a new
one, either from the partial list or allocating a new one, both from mheap.

I assume this is the reason why, in the last benchmark with a slice capacity of 100,000, this kind of re-allocation happens more frequently.
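The chunk-and-refill strategy can be sketched as a toy bump allocator. This is a deliberately simplified model of the behavior quoted above, not the runtime's actual implementation: it hands out space from the current chunk and "refills" with a fresh chunk whenever a request does not fit.

```go
package main

import "fmt"

// chunkArena is a simplified model of chunked arena allocation.
type chunkArena struct {
	chunkSize int
	offset    int // bump pointer within the current chunk
	chunks    [][]byte
}

func newChunkArena(chunkSize int) *chunkArena {
	return &chunkArena{chunkSize: chunkSize}
}

// alloc reserves n bytes in the current chunk, or "refills" by
// reserving a brand-new chunk when the request does not fit.
func (a *chunkArena) alloc(n int) []byte {
	if len(a.chunks) == 0 || a.offset+n > a.chunkSize {
		a.chunks = append(a.chunks, make([]byte, a.chunkSize))
		a.offset = 0
	}
	cur := a.chunks[len(a.chunks)-1]
	buf := cur[a.offset : a.offset+n]
	a.offset += n
	return buf
}

func main() {
	a := newChunkArena(1024)
	for i := 0; i < 100; i++ {
		_ = a.alloc(64) // 16 allocations of 64 bytes fit per 1 KiB chunk
	}
	fmt.Printf("chunks reserved: %d\n", len(a.chunks)) // prints 7
}
```

In this model, the larger the data relative to the chunk size, the more often a refill is triggered, which mirrors why the 100,000-element benchmark loses the arena's advantage.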

Conclusions

Arenas are useful in performance-critical applications, where manual memory management can lower the GC overhead significantly, but for exactly this reason we should be extra cautious.

While arenas can provide real performance benefits, we also have to be aware of the tradeoffs. Let’s enumerate a few of them:

  • Even though memory is managed manually, this does not mean that heap allocations and GC activity stop happening altogether.
  • Manual memory management can lead to memory leaks and errors, so it requires a deeper understanding of what’s happening underneath.
  • Attempting to access an object after the arena has been freed can crash the program.
  • Considering that it’s still an experimental feature, it’s better to avoid it in production.

