Why did I encounter Go memory fragmentation? How did I resolve it?

Hiroki Sakamoto
6 min read · Feb 23, 2023


The Go gopher was designed by Renée French. Illustrations by tottie. ©tottie / Renée French https://github.com/tottie000/GopherIllustrations/blob/main/Gopher_Illustrations/scuffle.png

Introduction

If you are a Go developer and you’ve ever encountered the problem that your process memory usage doesn’t decrease even when your objects are definitely freed, this post is for you.
In this blog, I’ll explain why I encountered such a problem, how I investigated it, and how I finally solved it!
If you’d like to know the conclusion first, please read the conclusion section before diving into the details.

Background

I’m part of a team that is building and running a Prometheus-compatible in-memory time-series database.
This database has a data structure called a “chunk”, which holds 4 hours of data points and corresponds to a unique set of label key-value pairs like this:

{host="host1", env="production"}

You can regard a data point as a pair consisting of a timestamp and a value, so a chunk contains the data points ingested during a 4-hour window.

The database keeps 8 chunks at a time per unique label set and purges the oldest chunk every 4 hours.
Since it’s an in-memory database, it also has snapshot recovery logic to prevent it from losing data after a process restart.
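
To make this concrete, here is a minimal sketch of that data model; the type names are hypothetical, not the actual implementation.

package tsdb

import "time"

// Labels uniquely identifies a series, e.g. {host="host1", env="production"}.
type Labels map[string]string

// DataPoint is a pair consisting of a timestamp and a value.
type DataPoint struct {
	Timestamp time.Time
	Value     float64
}

// Chunk holds up to 4 hours of data points for one series.
type Chunk struct {
	Start  time.Time
	Points []DataPoint
}

// Series keeps the 8 most recent chunks for a unique label set; every
// 4 hours the oldest chunk is purged and a new one takes its place.
type Series struct {
	Labels Labels
	Chunks [8]*Chunk
}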

What was the problem I encountered?

I noticed that the memory usage kept increasing for 32 to 36 hours after a restart.
Moreover, the increase was really sharp.

The 1st investigation — Go pprof

At first, I suspected a memory leak problem, so I took heap profiles every hour or so and checked the diffs.
Go pprof is a really wonderful debugging tool, but it didn’t help this time.
I checked whether chunks were actually being freed, whether anything was holding on to unused objects for a long time, and so on, but I couldn’t find any clues.
The total heap usage was even the same as on unproblematic nodes.
This was a mystery…
Why was the process memory usage growing even though the total heap usage remained the same?

The 2nd investigation — Go memstats metrics

Then I noticed from the Go memstats metrics that some memory fragmentation was happening.

go_memstats_heap_inuse_bytes{…} - go_memstats_heap_alloc_bytes{…}

This expression showed that the allocated bytes were far smaller than the heap in-use bytes.
This means there was a lot of allocated space that wasn’t being used effectively.
Normally, this value increases every 4 hours as chunks expire, and then gradually decreases as the freed space is reused.
However, this wasn’t the case on the problematic nodes; there, the value wasn’t decreasing at all.
That’s why I suspected that non-restarted nodes could reuse expired space for newly ingested data, but restarted nodes could not, due to memory fragmentation.
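
For reference, the go_memstats_* metrics map directly onto Go’s runtime.MemStats fields, so you can observe the same gap in-process; a minimal example:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// HeapInuse counts bytes in in-use spans; HeapAlloc counts bytes of
	// live objects. A large, non-shrinking gap between them suggests
	// fragmentation: spans are held but their free slots go unused.
	fmt.Printf("HeapInuse=%d HeapAlloc=%d gap=%d\n",
		m.HeapInuse, m.HeapAlloc, m.HeapInuse-m.HeapAlloc)
}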

Then I turned my attention to the snapshot recovery logic.
A snapshot consists of chunk bytes written to a file.
The writing logic puts chunk bytes into the file concurrently, so their order is random.
This is a performance optimization for writing.
Meanwhile, the recovery logic reads the bytes sequentially from the head of the file.
This means chunks are allocated on the heap in a random order, so when scattered chunks expire every 4 hours, it’s easy to imagine a lot of memory fragmentation happening.
Given that data points are normally ingested in chronological order, this hypothesis felt legitimate.

That’s why I fixed the snapshot-writing logic.
Namely, I changed it to sort chunk bytes by each chunk’s timestamp before actually writing them to the file.
This way, the bytes can be allocated in chronological order, because recovery reads and allocates them from the head of the file.
I imagined the fixed allocation would pack chunks contiguously in chronological order.
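
Conceptually, the change looked something like this. This is a minimal sketch with hypothetical types, not the actual snapshot code:

package snapshot

import (
	"os"
	"sort"
)

// chunkRecord is a hypothetical stand-in for a serialized chunk.
type chunkRecord struct {
	startUnix int64  // the chunk's start timestamp
	data      []byte // the chunk's serialized bytes
}

// writeSnapshot sorts chunks chronologically before writing them, so that
// recovery, which reads from the head of the file, allocates them in
// chronological order.
func writeSnapshot(path string, chunks []chunkRecord) error {
	sort.Slice(chunks, func(i, j int) bool {
		return chunks[i].startUnix < chunks[j].startUnix
	})
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	for _, c := range chunks {
		if _, err := f.Write(c.data); err != nil {
			return err
		}
	}
	return nil
}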

I felt this was a stroke of ingenuity and that I was one of the best Go developers…

As a result, this made the problem worse!!
It just degraded write performance because of the sorting, without any improvement.

What a nightmare!
What’s going on…

The 3rd investigation — Understand Go heap management

It was time for me to understand exactly how Go manages its heap.
I learned it from these awesome materials, and finally the epiphany came:
https://www.sobyte.net/post/2022-04/golang-memory-allocation/
https://deepu.tech/memory-management-in-golang/

Simply put, the Go runtime manages the heap as a large number of “mspans”.
Each mspan consists of a certain number of contiguous 8KB memory pages and corresponds to a “size class”.
Each size class dictates what size of objects can be allocated in that mspan: an object is placed in the smallest size class that can accommodate it.
https://go.dev/src/runtime/sizeclasses.go

For example, if you’d like to allocate a 100-byte object, the 112-byte size class is chosen.
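
You can observe this rounding yourself when a slice grows through append; the printed capacity reflects the size class the runtime picked:

package main

import "fmt"

func main() {
	s := make([]byte, 0)
	s = append(s, make([]byte, 100)...) // ask the runtime for 100 bytes
	fmt.Println(cap(s))                 // typically 112: the size class chosen for 100B
}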

It turned out that the size class was the key.
Normally, each chunk internally holds a byte slice for its actual data, and the slice is created like this:

make([]byte, 0, 128)

When the length reaches the capacity, the slice grows from this initial 128 bytes following Go’s append growth rule (roughly doubling while the slice is small).
This is because Go’s slice internally rebuilds the backing array, and points to the new array, when it exceeds its own capacity.
So the allocations step through a small, fixed set of capacities: 128, 256, 512, and so on.

The most important thing is that a fixed set of size classes is always chosen for these allocations.
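
A tiny experiment shows this growth sequence; the exact capacities depend on the Go version, but they always land on the runtime’s fixed size classes:

package main

import "fmt"

func main() {
	s := make([]byte, 0, 128)
	last := cap(s)
	fmt.Println(last) // 128
	for i := 0; i < 4096; i++ {
		s = append(s, 0)
		if c := cap(s); c != last {
			last = c
			fmt.Println(c) // e.g. 256, 512, 896, 1408, 2048, ... on recent Go
		}
	}
}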

Meanwhile, restored chunks were created like this:

make([]byte, 0, actualChunkByteSize) // the exact byte size of the restored chunk

What does this mean?
The size classes that normally ingested chunks belong to and the ones that restored chunks belong to are completely different!
Since actual chunk sizes vary, restored chunks end up scattered across many size classes.
(Imagine the differences in scrape intervals between metrics.)
This is the reason why the byte-sorting solution didn’t work.

So I changed the recovery logic to allocate restored chunk buffers with the same 128-byte initial capacity.
This solution worked fine, but snapshots with randomly ordered chunk bytes still caused some memory fragmentation.
Therefore, I additionally merged a patch to write chunk bytes to a snapshot in chronological order.
Finally, I was able to solve this problem completely.
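
A minimal sketch of the first part of the fix, with hypothetical names: restored chunks are rebuilt through the same append-based growth path as live ingestion, so their capacities step through the same small set of size classes instead of arbitrary per-chunk sizes.

package tsdb

// restoreChunkBuf rebuilds a chunk's byte buffer the way the live ingestion
// path builds it: start from a 128-byte capacity and grow via append, so the
// capacity steps through the runtime's fixed size classes rather than being
// pinned to an arbitrary per-chunk size. encodedPoints is a hypothetical
// stand-in for the data points decoded from the snapshot.
func restoreChunkBuf(encodedPoints [][]byte) []byte {
	buf := make([]byte, 0, 128)
	for _, p := range encodedPoints {
		buf = append(buf, p...)
	}
	return buf
}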

go_memstats_heap_inuse_bytes{…} - go_memstats_heap_alloc_bytes{…}

Conclusion

  • Go manages the heap by dividing it into mspans
  • An mspan consists of a certain number of contiguous 8KB pages
  • Each mspan corresponds to a size class, which dictates what size of objects can be allocated in it
  • To avoid memory fragmentation in Go, you need to take care of both size classes and temporal locality.


Hiroki Sakamoto

Software Engineer focused on distributed systems and time-series database https://github.com/taisho6339