Memory management in Go

Ali Can
10 min read · Mar 6, 2022


How memory works in Go and some tips to get the best out of your memory.

Memory Management is the process of managing computer memory, moving processes between main memory and disk to boost the overall performance of the system. The essential requirement of memory management is providing ways to dynamically allocate portions of memory to programs at their request, and free it for reuse when no longer needed.

Programs manage their memory by dividing it into different parts that perform specific tasks. Stack and heap are two of these parts which manage the program’s unused memory and allocate it for different kinds of data. When the program no longer needs the memory, it may deallocate it. Deallocated memory is returned to its source and is made available for reuse.

Understanding the classical memory model

memory layout of a running process

A running program maintains its data in one of these logical regions of memory. The OS allocates the memory for global and static variables when it loads the program into memory and doesn’t deallocate it until the program terminates. These values don’t change unless the program explicitly changes them. The other two regions, the stack and the heap, are more dynamic: there, memory is allocated and deallocated by the program as needed. The two regions differ in how they behave.

Text

A text segment is one of the sections of a program in an object file or in memory, and contains executable instructions. The text segment may be placed below the heap and stack in order to prevent heap and stack overflows from overwriting it.

Initialized data and uninitialized data segment

A data segment is a portion of the virtual address space of a program which contains the global and static variables that are initialized by the programmer. The uninitialized data segment (often called BSS) holds global and static variables that are not explicitly initialized; the OS zero-fills this region when it loads the program.

Stack

The stack is used for static memory allocation and, just like the data structure of the same name, it follows a last-in, first-out (LIFO) approach. Typically, function parameters and local variables are allocated on the stack. Nothing physically moves when data is pushed onto or popped off a stack in memory; only the values stored in the memory managed by the stack change. This makes storing and retrieving data from the stack very fast, since no lookup is required: we simply store and retrieve data from the topmost block. Any data stored on the stack has to be finite and static, meaning its size is known at compile time. Memory management of the stack is simple and straightforward. Because the size of the stack is limited, we can encounter stack overflow errors here.

Heap

The name heap has nothing to do with the heap data structure. The heap is used for dynamic memory allocation. Whereas the stack only allows allocation and deallocation at the top, programs can allocate or deallocate memory anywhere in the heap. The program must return memory to the stack in the opposite order of its allocation, but it can return memory to the heap in any order. This makes the heap more flexible than the stack. Pointers, arrays, and big data structures are usually stored on the heap.

Data stored on the heap must form a contiguous block large enough to satisfy the request with a single chunk of memory. This property increases the complexity of the heap. First, the code that carries out the allocation must scan the heap until it finds a contiguous block of memory large enough to satisfy the request. Second, when memory is returned to the heap, adjacent freed blocks must be coalesced to better accommodate future requests for large blocks of memory. This added complexity makes managing memory on the heap slower than on the stack.

The heap allocation scheme does not provide automatic deallocation. We need a garbage collector to remove unused objects so that memory is used efficiently.

Stack vs Heap

  • Heap is slower compared to stack because the process of looking up data is more involved.
  • Heap can store more data than the stack.
  • Heap stores data with dynamic size; stack stores data with static size.
  • Heap is shared among threads of an application.
  • Heap is trickier to manage because of its dynamic nature.
  • When we talk about memory management we are mostly talking about managing the heap memory.
  • Heap memory allocation isn’t as safe as stack memory allocation because the data stored in this space is accessible or visible to all threads.

Importance of memory allocation

RAM is finite. If a program keeps consuming memory without freeing it, it will run out of memory and crash. Programs cannot simply use RAM as they like; doing so would starve other programs and processes of memory. Because of this, most programming languages (including Go) provide ways to do automatic memory management.

Memory model, the Go way

Go supports automatic memory management, such as automatic memory allocation and automatic garbage collection, which avoids many hidden bugs.

Go memory model

In Go, each thread has its own stack. When we start a thread, we allocate a block of memory to be used as that thread’s stack. The problem comes when this block of memory is not enough. We could increase the default stack size, but then the size would increase for every thread, which is highly inefficient if we want to use lots of threads.

Another option is to decide stack size for each thread individually. Again this will be inefficient in a setup where we have lots of threads since we need to figure out how much memory we should allocate for each stack.

What the Go creators came up with is that, instead of giving each goroutine a fixed amount of stack memory, the Go runtime gives goroutines the stack space they need on demand. This way we don’t need to think about stack size when creating a goroutine.

A goroutine starts with a 2 KB stack, which can grow and shrink as needed. Before executing a function, Go checks whether enough stack space is available for it; if not, a call is made to morestack, which (in modern Go runtimes) allocates a larger stack, copies the contents of the old stack over, and only then executes the function. When stack usage drops again, the runtime can shrink the stack and release the unneeded memory.
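A quick sketch of this in action (the function name and per-frame padding here are illustrative): deep recursion that would blow through a fixed 2 KB stack runs fine, because the runtime grows the goroutine’s stack on demand.

```go
package main

import "fmt"

// depth recurses n times, consuming some stack in each frame.
// On a fixed 2 KB stack this would overflow almost immediately;
// the Go runtime grows the stack on demand instead.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	var pad [128]byte // take up extra stack space in each frame
	_ = pad
	return 1 + depth(n-1)
}

func main() {
	fmt.Println(depth(100_000)) // roughly 13 MB of stack, grown on demand
}
```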

The stack also has an upper limit (1 GB by default on 64-bit systems). If this limit is reached, our application will panic and abort.

Go allocates memory in two places: a global heap for dynamic allocations and a local stack for each goroutine. One major difference Go has compared to many garbage collected languages is that many objects are allocated directly on the program stack. Go prefers allocation on the stack. Stack allocation is cheaper because it only requires two CPU instructions: one to push onto the stack for allocation, and another to release from the stack.

Unfortunately not all data can use memory allocated on the stack. Stack allocation requires that the lifetime and memory footprint of a variable can be determined at compile time. If it can’t be determined, a dynamic allocation onto the heap occurs at runtime.


The Go compiler uses a process called escape analysis to find objects whose lifetime is known at compile time and allocates them on the stack rather than in garbage-collected heap memory. The basic idea is to do some of the work of garbage collection at compile time. The compiler tracks the scope of variables across regions of code and uses this data to determine which variables pass a set of checks proving their lifetime is entirely knowable at compile time. If a variable passes these checks, its value can be allocated on the stack. If not, it is said to escape, and must be heap allocated.

Whether memory is allocated on the stack or escapes to the heap is entirely dependent on how you use the memory, not on how you declare the variable.

The rules for escape analysis aren’t part of the Go language specification. For Go programmers, the most straightforward way to learn about these rules is experimentation. The compiler will output the results of the escape analysis by building with go build -gcflags '-m'.
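As a small experiment (function names here are made up for illustration), building the following file with go build -gcflags '-m' reports something like “moved to heap: x” for the function that returns a pointer to a local variable, and nothing of the sort for the one that returns a plain value:

```go
// escape.go — inspect with: go build -gcflags '-m' escape.go
package main

import "fmt"

// The returned value does not outlive the call,
// so x can stay on the stack.
func stackAlloc() int {
	x := 42
	return x
}

// Returning a pointer to a local variable forces x to outlive
// the call; the compiler reports that x escapes to the heap.
func heapAlloc() *int {
	x := 42
	return &x
}

func main() {
	fmt.Println(stackAlloc(), *heapAlloc())
}
```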

Takeaways

  • One heap, and one section for constants and instructions
  • One stack per goroutine
  • Elastic / Dynamic stacks that borrow memory from the heap.
  • Because the heap is limited by the system, in effect, the number of stacks is also limited by the system.
  • Because these stacks don’t have predefined sizes, they make judicious use of memory and allow tens or hundreds of thousands of goroutines to be started even on a modestly provisioned machine.

Garbage Collection

Garbage collection is a form of automatic memory management. The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced.

Go’s garbage collector is a non-generational, concurrent, tri-color mark-and-sweep garbage collector.

A generational garbage collector focuses on recently allocated objects because it assumes that short lived objects, like temporary variables, are reclaimed most often.

Because the Go compiler prefers allocating objects on the stack, short-lived objects often live on the stack instead of the heap; this removes much of the need for a generational GC.

Go’s garbage collection works in two phases: the mark phase and the sweep phase. The GC uses the tri-color algorithm to analyze the use of memory blocks. It first marks objects that are still referenced as “alive”, and in the next phase (sweep) it frees the memory of objects that are not alive.

You don’t have to collect your garbage but you can reduce your garbage

One of the primary factors that makes garbage collection expensive is the number of objects on the heap. By optimizing our code to reduce the number of long-lived objects on the heap, we can minimize the resources spent on GC and improve system performance.

Restructure your structs

When reading data, a modern CPU’s internal registers hold and process data in units of the word size, which is usually 32 or 64 bits.

When our data doesn’t fit word-size boundaries exactly, padding is added to properly align fields in memory, so that the next field starts at a suitably aligned offset.

Modern CPU hardware performs reads and writes to memory most efficiently when the data is naturally aligned. The Go compiler therefore gives each struct a required alignment, equal to the alignment of its largest field, so that fields stored side by side remain accessible at aligned offsets.

When a struct is created in Go, a contiguous block of memory is allocated for it. The Go compiler does not reorder struct fields for you, so by rearranging the fields of your struct yourself, you can lower your memory usage by reducing padding.

For example, if we consider the following Dog struct,

Let’s calculate the size of the Dog struct.

Age is an 8-bit integer.

Hunger is a 64-bit integer.

Happy is a boolean, which is 8 bits.

We expect the size of the Dog struct to be 80 bits. In reality it is 192 bits.

Age is 8 bits, but since the next field is 64 bits and there is no room for it in the same word, padding is needed: 8 bits + 56 bits of padding.

Hunger takes a full word: 64 bits + 0 bits of padding.

Happy is 8 bits, but it also needs padding to complete the word: 8 bits + 56 bits of padding.

We have wasted 112 bits.

To optimize the structure, we can reorder the fields that take up less than one machine word. Arranging the fields in descending order by size lets adjacent small fields share a single machine word.

The trick is to arrange your fields in descending order according to their size.

Now the size of this struct is 64+8+8=80 bits. With padding, it is 128 bits.

We saved 64 bits.

Reduce the number of long-living objects

Rather than keeping objects alive on the heap, we can create them as values on demand. For instance, if we need some data for each item in a user request, rather than precomputing it and storing it in a long-lived map, we can compute it on a per-request basis to reduce the number of objects on the heap.
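A sketch of the two approaches (the names and data here are invented for illustration):

```go
package main

import (
	"fmt"
	"strings"
)

// Long-lived approach: every entry lives on the heap for the life
// of the process, and every GC cycle has to scan the whole map.
var displayNames = map[string]string{
	"u1": "Ada Lovelace", // precomputed for every known user
}

// Per-request approach: the result is short-lived, and if it does
// not escape, it may never touch the heap at all.
func displayName(first, last string) string {
	return strings.TrimSpace(first + " " + last)
}

func main() {
	fmt.Println(displayName("Ada", "Lovelace"))
}
```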

Remove pointers within pointers

If we have a reference to an object, and the object itself contains further pointers, these are all considered individual objects on the heap even though they may be nested. By changing these nested values to be non-pointers, we can reduce the number of objects to be scanned.
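For instance (type names here are hypothetical), a pointer-heavy layout and its flattened equivalent:

```go
package main

import "fmt"

// Pointer-heavy layout: for each ConfigPtr the GC must trace up to
// three separate heap objects (ConfigPtr, *TLSPtr, *string).
type TLSPtr struct {
	CertPath *string
}
type ConfigPtr struct {
	Name string
	TLS  *TLSPtr
}

// Value layout: the same data in a single object, with nothing
// nested for the garbage collector to chase.
type TLS struct {
	CertPath string
}
type Config struct {
	Name string
	TLS  TLS
}

func main() {
	c := Config{Name: "server", TLS: TLS{CertPath: "/etc/certs/tls.pem"}}
	fmt.Println(c.TLS.CertPath)
}
```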

Avoid unnecessary string/byte array allocations

Since strings and byte slices are, under the hood, pointers to heap-allocated backing arrays, each one is an object on the heap. If possible, represent such data as non-pointer values instead, such as integers or floats (for example, an int64 timestamp instead of a formatted time string).
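One way this might look in practice (type and field names are invented for the example): parse string IDs and timestamps once at the boundary, then keep only pointer-free integers on the heap.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// String form: both fields point at heap-allocated byte arrays
// that the GC must track.
type EventStr struct {
	ID        string // e.g. "12345"
	Timestamp string // e.g. "2022-03-06T10:00:00Z"
}

// Numeric form: plain integers, nothing for the GC to scan.
type Event struct {
	ID        uint64 // parsed once at the boundary
	Timestamp int64  // Unix nanoseconds
}

func parse(e EventStr) (Event, error) {
	id, err := strconv.ParseUint(e.ID, 10, 64)
	if err != nil {
		return Event{}, err
	}
	t, err := time.Parse(time.RFC3339, e.Timestamp)
	if err != nil {
		return Event{}, err
	}
	return Event{ID: id, Timestamp: t.UnixNano()}, nil
}

func main() {
	ev, err := parse(EventStr{ID: "12345", Timestamp: "2022-03-06T10:00:00Z"})
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.ID)
}
```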

Conclusion

We don’t have to chase every optimization that exists. Software is all about tradeoffs: whenever we fix one thing, we make something else harder, whether that is readability, development time, or something else. Everything depends on your needs. If you need the optimization, go for it. If you don’t, keep it simple, stupid.
