Heap Optimizations for Go Systems

Nishant Roy
Published in The Startup · 10 min read · Sep 15, 2020

Introduction

Despite its growing popularity as a systems language, Go programs are susceptible to severe performance regressions at large scale. In systems with high memory usage, garbage collection (GC) can cause performance regressions by cannibalizing resources from the main program. Heavy GC cycles can add hundreds of milliseconds of latency to a request, resulting in degraded user experience.

The goal of this post is to help you understand:

  • How Go GC works at a high level, and why it can impact your system’s performance
  • What causes GC pressure (more resources spent on GC)
  • How to determine whether GC pressure is the cause of your performance problems
  • How to measure and profile your program’s heap usage
  • How to identify which part of the code is the culprit
  • Some steps you can take to lower heap usage and GC pressure

How does garbage collection work in Go?

Go does not require manual memory management i.e., you do not need to manually allocate memory and clear it once you’re done using it. Such functionalities are abstracted away from the user, to minimize errors that could lead to memory leaks. Go has a built-in garbage collector, which reclaims memory once it is no longer in use by the program.

Go uses a concurrent, tricolor mark-and-sweep algorithm for garbage collection. This algorithm allows GC to run concurrently with the mutator (the main program), avoiding long stop-the-world pauses, during which all goroutines are paused while GC runs. Go GC also aims to use no more than 25% of available CPU. Both these properties are highly beneficial, since they leave plenty of resources for the mutator to continue running without a significant impact on performance (throughput, latency, etc.).

The algorithm works by dividing objects on the heap into three sets (colors) during the mark phase:

  • White = collectible, since it is not in use
  • Black = not collectible, since it is definitely in use
  • Grey = not yet determined; its references still need to be scanned

As the number of objects on the heap increases, so does the time spent in the mark phase. Collectible (white) memory is then reclaimed during the sweep phase. Sweeping occurs when a goroutine attempts to allocate new objects in memory.

Why would it impact system performance?

If the rate of memory allocation in the mutator (main program) is very high, Go GC will start to “steal” more goroutines from the mutator to assist with the marking phase. This has two effects: first, it speeds up the GC process by giving it more resources; second, it takes resources away from the mutator, which slows down the rate of memory allocation. This is important to ensure that the rate of memory allocation does not exceed the rate of memory cleanup, which could cause the heap to grow out of control and potentially result in out-of-memory crashes.

When the garbage collector starts to steal resources from the main program, it can significantly impact the program’s performance, since CPU resources are limited. This typically manifests as “tail latency”, i.e., the higher percentiles of latency (p99, p999, etc.) grow much larger than the average latency, which can have an adverse effect on user experience. A user will remember the worst or slowest experiences more than the average request, and this can cause user dissatisfaction.

Therefore, it’s important to understand the details of memory management and how it can impact your system’s performance and your users’ experience. This writeup explains how to diagnose whether GC pressure is the root cause of your system’s performance problems, how to identify which part of the code is responsible, and some steps you can take to address the problem.

What goes on the heap?

Usually, the heap holds objects referenced by pointers: structs, their sub-fields, and so on. Strings and byte slices count as well, even if the code does not explicitly declare them as pointers, since their headers contain pointers to the underlying data.

Go uses a technique called “escape analysis” to determine what it needs to store on the heap vs. the stack. At a high level, if an object is only referenced within the scope of a particular function call, it can be allocated on the stack for that function. The stack frame is reclaimed once the function returns, and the object is gone. If an object is needed outside that function, it must be allocated on the heap so it is accessible later on.
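As a quick illustration (a minimal sketch, not from the original post), returning a value lets it stay on the stack, while returning a pointer forces it to escape:

```go
package main

import "fmt"

// byValue stays on the stack: x is only used within this call, and
// the return copies the value out, so x does not escape.
func byValue() int {
	x := 42
	return x
}

// byPointer escapes to the heap: the returned pointer must remain
// valid after the function returns, so the compiler moves x there.
func byPointer() *int {
	x := 42
	return &x
}

func main() {
	fmt.Println(byValue(), *byPointer())
}
```

You can see the compiler’s escape decisions for yourself by building with go build -gcflags='-m'.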

Here’s a blog post that explains escape analysis in more detail.

How to determine if GC pressure is the cause of your performance problems?

Intuition can help guide you in the right direction. As mentioned above, GC pressure typically results in high tail latency, so if you observe such symptoms, then GC pressure could be the root cause, especially if you know that your program has high memory usage.

To confirm your hypothesis, you can leverage the Go runtime environment variable GODEBUG. By setting GODEBUG=gctrace=1 when running your program, you can force your program to output debug logs for each GC cycle, detailing the time spent on the various GC phases.
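For example (the binary name here is a placeholder for your own program), the trace lines are written to stderr:

```shell
# One "gc N ..." line is emitted per collection cycle, with wall-clock
# and CPU time for each GC phase; redirect stderr to capture them.
GODEBUG=gctrace=1 ./yourserver 2> gc.log
```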

If you find that your system’s performance metrics (such as spikes in latency) align with the times of the GC cycles, it’s highly likely that the GC cycles are the cause of your performance regressions.

This blog post from Ardan Labs explains how to use gctrace, and how to read the output.

How to measure your program’s heap usage?

Go comes with multiple built-in tools to help diagnose your program’s heap usage. I will primarily focus on two:

Memstats

The runtime.MemStats struct exposes statistics about the program’s memory usage, GC, etc. We can use it to monitor the total number of objects on the heap, which serves as our indicator of success, i.e., once we start making heap optimizations, we expect this metric to drop.

From the source code, we see that the HeapObjects field provides us with the relevant data.

// HeapObjects is the number of allocated heap objects.
//
// Like HeapAlloc, this increases as objects are allocated and
// decreases as the heap is swept and unreachable objects are
// freed.
HeapObjects uint64

This blog post provides an example of how to access Memstats data and print it out.
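As a minimal sketch of how that might look (not the linked post’s exact code):

```go
package main

import (
	"fmt"
	"runtime"
)

// heapObjects reads the runtime's memory statistics and returns the
// current count of allocated heap objects.
func heapObjects() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m) // briefly stops the world; call sparingly
	return m.HeapObjects
}

func main() {
	fmt.Println("HeapObjects:", heapObjects())
}
```

In a long-running service, you would typically read this on a timer and export it to your metrics system rather than printing it.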

pprof

The pprof package can be used to generate a heap profile of your program, and identify the stacktrace for object allocation to see which sections of your code are allocating a large number of heap objects. The pprof CLI also has a method to break down a particular function line-by-line, to identify exactly which line of code is the culprit, and focus our efforts accordingly.

The package documentation describes how to expose profiling data as an HTTP endpoint from your program, and examples of commands to run in your terminal to access the profile. Specifically, for heap profiles:

go tool pprof [options] http://localhost:6060/debug/pprof/heap

// Available options
-inuse_space    Display in-use memory size
-inuse_objects  Display in-use object counts
-alloc_space    Display allocated memory size
-alloc_objects  Display allocated object counts

This downloads the profile data to your machine as a .pb.gz file and drops you into an interactive command line for exploring the data. Running the “help” command lists all the available options. My personal preference for visualizing profile data is to run the following command in a new terminal, which opens the profile in an interactive web browser:

go tool pprof -http=localhost:<port> /path/to/profile.pb.gz

In the top left corner, select “View” > “Flame Graph” and then “Sample” > “inuse_objects” to see a flamegraph of the number of objects allocated. This provides a quick and easy way to visualize which function calls are allocating a large number of objects.

Example flamegraph

Once you have identified some functions that are contributing a large number of objects, you can go back to the CLI that we entered when we first pulled the profile data. If you lost that window, you can pull it up again by running:

go tool pprof -inuse_objects /path/to/profile.pb.gz

In here, we can use the list command to see how many objects are allocated by each line of a method. Run list <YourMethodName> to print out the data in your terminal:

This produces a line-by-line output, like the following. In this case, we can see that lines 233 and 237, which create new objects, account for a large number of heap allocations, and that line 241, which adds the objects to a map, does as well.

(pprof) list createCatalogMap
Total: 132263423
ROUTINE ======================== <CODE_PATH>
105268459 105268459 (flat, cum) 79.59% of Total
. 63815675 233: product := BuildProduct(productID, productPrice, productSellerID)
. . 234: if productPrice < minProductPrice {
. . 235: minProductPrice = productPrice
. . 236: }
. 20726392 237: catalogListing := catalogs.CreateListing(product, contextFeatures)
. . 238:
. . 239: // Create listing key in string format by concatenating base64 encodings of the productID, sellerID, and catalog version
. . 240: catalogListingKey := catalogs.CreateListingKey(productID, sellerID, catalogListing.GetVersion())
20726392 20726392 241: catalogMap[catalogListingKey] = catalogListing
. . 242: return catalogMap

Here’s a blog post with further examples of how to use the pprof web UI.

What are some ways to reduce the resources spent on GC?

As previously discussed, one of the primary factors making garbage collection expensive is the number of objects on the heap. By optimizing our code to reduce the number of long-lived heap objects, we can minimize the resources spent on GC and improve our system performance.

Here are some suggestions on how to do so:

Reduce the number of long-lived objects

Rather than keeping objects alive on the heap, they can be created as values, on demand. For instance, if we need some data for each item in a user request, rather than precomputing it and storing it in a long-lived map, we can compute it on a per-request basis to reduce the number of objects on the heap.
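A hypothetical sketch of the trade-off (the names and fields are illustrative, not from the original system):

```go
package main

import "fmt"

type ProductInfo struct {
	DiscountedPrice float64
}

// Long-lived: every entry is a heap object (plus a pointer to chase)
// that the GC must track for the lifetime of the program.
var precomputed = map[int64]*ProductInfo{}

// Per-request: the value is computed when needed and returned by
// value, so it dies young and adds no long-term GC work.
func computeInfo(basePrice, discount float64) ProductInfo {
	return ProductInfo{DiscountedPrice: basePrice * (1 - discount)}
}

func main() {
	info := computeInfo(100, 0.25)
	fmt.Println(info.DiscountedPrice)
}
```

Whether this trade is worth it depends on how expensive the computation is relative to the GC cost of keeping the map alive.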

Remove pointers within pointers

If we have a reference to an object, and that object itself contains further pointers, each pointed-to value is an individual object on the heap, even though they are nested. By changing these nested values to non-pointers, we reduce the number of objects the GC must scan.
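For example (hypothetical types), flattening nested pointers turns several traceable heap objects into one:

```go
package main

import "fmt"

// If a value of this type lives on the heap, the two pointed-to
// values are additional heap objects the GC must trace separately.
type ListingPointers struct {
	Price *float64
	Qty   *int
}

// The same data stored inline: a single object, nothing to chase.
type ListingFlat struct {
	Price float64
	Qty   int
}

func main() {
	l := ListingFlat{Price: 9.99, Qty: 3}
	fmt.Println(l.Price, l.Qty)
}
```

The pointer form is still the right choice when you genuinely need to distinguish “unset” from a zero value, or share one value across structs.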

Avoid unnecessary string/byte array allocations

Since strings and byte slices are treated as pointers under the hood, each one is an object on the heap. If possible, try to represent such data as non-pointer values: integers or floats, time.Time for dates, etc.

Looking at the createCatalogMap example from above, if we replace our map key which was previously a string with a struct containing the IDs instead, we see the number of heap objects drop by ~26 million (20%).

(pprof) list createCatalogMap
Total: 106261986
ROUTINE ======================== <CODE_PATH>
34768 84576835 (flat, cum) 79.59% of Total
. 63815675 233: product := BuildProduct(productID, productPrice, productSellerID)
. . 234: if productPrice < minProductPrice {
. . 235: minProductPrice = productPrice
. . 236: }
. 20726392 237: catalogListing := catalogs.CreateListing(product, contextFeatures)
. . 238:
. . 239: structKey := CatalogKeyStruct{
. . 240: ProductID: productID,
. . 241: SellerID: productSellerID,
. . 242: CatalogVersion: catalogListing.GetVersion(),
. . 243: }
34768 34768 244: catalogMap[structKey] = catalogListing
. . 245: return catalogMap
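The change above can be sketched as follows (the field types are assumptions, since the original code is not shown). Structs whose fields are all comparable can be used directly as map keys, avoiding the string concatenation and encoding allocations entirely:

```go
package main

import "fmt"

// CatalogKeyStruct is a hypothetical reconstruction of the struct key
// described above; the field names and types are assumptions.
type CatalogKeyStruct struct {
	ProductID      int64
	SellerID      int64
	CatalogVersion int32
}

func main() {
	catalogMap := map[CatalogKeyStruct]string{}

	// No string formatting or base64 encoding needed: the struct value
	// itself is the key, and building it allocates nothing extra.
	key := CatalogKeyStruct{ProductID: 1, SellerID: 2, CatalogVersion: 3}
	catalogMap[key] = "listing"

	fmt.Println(catalogMap[key])
}
```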

Object pooling

If your program tends to create a large number of short-lived objects in bursts, you may benefit from object pools, which can be used to allocate and free memory blocks manually. This can reduce the number of GC cycles needed, since the pool retains these objects for a longer scope, so we don’t need to keep allocating and cleaning up these objects.

Note: Object pools can cause memory leaks if not used properly, so this is only recommended if you know what you’re doing.
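Go’s standard library provides sync.Pool for exactly this pattern; a minimal sketch reusing byte buffers:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers. The GC may still reclaim idle
// pooled objects between cycles, but bursts of allocations are largely
// served from the pool instead of fresh heap memory.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // always reset before returning to the pool
		bufPool.Put(buf)
	}()
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String()
}

func main() {
	fmt.Println(render("world"))
}
```

The usual leak with pools is forgetting to Reset before Put, so a pooled object keeps growing and pins memory; the deferred cleanup above guards against that.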

Clean up unused fields

Basic types in Go have default values (i.e., bool defaults to false, int defaults to 0, etc.), so even unused fields of these types still consume memory. An unused field is one that is no longer read anywhere (for online request serving, offline logging, or anything else), so removing it should be a no-op.

By removing these fields, we lower the program’s memory usage, and make the code more readable and easy to understand.

Migrate data off the heap

If we remove data from the heap, we drastically reduce the amount of work that the garbage collector needs to do. One option here is to migrate the data to an external source (for example, in a microservice architecture, we may have a performant key-value store that can be leveraged for such a use case). It’s important to consider the additional overhead of making a request to fetch the data from this external source.

Another option is to leverage an open-source Go package to store data off the heap, but still within our system’s memory. Here is one such package.

Note: Using off-heap storage can cause problems if not used properly, so this is only recommended if you know what you’re doing.

Rearrange your structs for lower memory usage

The Go compiler does not reorder struct fields to minimize padding, so by rearranging the fields of your struct yourself, you can lower memory usage. For example, consider the following two structs: GoodObject uses 16 bytes while BadObject uses 24 bytes, purely because of field alignment. In GoodObject, the two bools are packed into the same 8-byte word, whereas in BadObject the first word holds just one bool plus padding, the second word holds the int64, and the third word again holds one bool plus padding.

type BadObject struct {
	A bool
	B int64
	C bool
}

type GoodObject struct {
	A bool
	C bool
	B int64
}

Conclusion

The Go garbage collector is highly optimized for most use cases, and developers mostly do not need to worry about the details of its implementation and performance. However, for some memory-heavy use cases, the garbage collector can significantly impact program performance.

Hopefully this blog post helps you understand how to leverage the tools provided by Go to diagnose performance regressions caused by heavy memory usage, and gives you some ideas for optimizing your system to minimize the impact of garbage collection.
