# Go: How Does the Goroutine Stack Size Evolve?

Jun 1, 2019 · 8 min read

Go provides light and smart goroutine management. Light because a goroutine stack starts at only 2 KB, and smart because that stack can grow and shrink automatically according to our needs.

Regarding the size of the stack, we can find it in `runtime/stack.go`:

```go
// The minimum size of stack used by Go code
_StackMin = 2048
```

Note that it has changed over time:

• Go 1.2: the goroutine stack was increased from 4 KB to 8 KB.
• Go 1.4: the goroutine stack was decreased from 8 KB to 2 KB.

The stack size changed because the stack allocation strategy changed. We will come back to this topic later in this article.

This default stack size is sometimes not enough to run our program. This is when Go automatically adjusts the size of the stack.

# Dynamic stack size

Go can grow the stack automatically, but it can also determine when a function will not need the stack to change. Let’s take an example and analyze how it works:

```go
package main

import "strconv"

func main() {
   a := 1
   b := 2
   r := max(a, b)
   println(`max: ` + strconv.Itoa(r))
}

// max returns the larger of a and b.
func max(a int, b int) int {
   if a >= b {
      return a
   }
   return b
}
```

This first example just returns the larger of two integers. In order to know how Go manages the allocation of the goroutine’s stack, we can look at the Go assembly output with the command `go build -gcflags -S main.go`. The output (I kept only the lines related to the stack allocation) gives us some interesting lines that show what Go is doing:

```
"".main STEXT size=186 args=0x0 locals=0x70
   0x0000 00000 (/go/src/main.go:5)    TEXT   "".main(SB), ABIInternal, $112-0
   [...]
   0x00b0 00176 (/go/src/main.go:5)    CALL   runtime.morestack_noctxt(SB)
[...]
   0x0000 00000 (/go/src/main.go:13)   TEXT   "".max(SB), NOSPLIT|ABIInternal, $0-24
```

Two elements here relate to stack management:
- `CALL runtime.morestack_noctxt`: this function grows the stack if more space is needed.
- `NOSPLIT`: this flag means that the stack overflow check is not needed. It is similar to the compiler directive `//go:nosplit`.

If we look at the function `runtime.morestack_noctxt`, we see that it ends up calling the function `newstack` from `runtime/stack.go`:

```go
func newstack() {
   [...]
   // Allocate a bigger segment and move the stack.
   oldsize := gp.stack.hi - gp.stack.lo
   newsize := oldsize * 2
   if newsize > maxstacksize {
      print("runtime: goroutine stack exceeds ", maxstacksize, "-byte limit\n")
      throw("stack overflow")
   }

   // The goroutine must be executing in order to call newstack,
   // so it must be Grunning (or Gscanrunning).
   casgstatus(gp, _Grunning, _Gcopystack)

   // The concurrent GC will not scan the stack while we are doing the copy since
   // the gp is in a Gcopystack status.
   copystack(gp, newsize, true)
   if stackDebug >= 1 {
      print("stack grow done\n")
   }
   casgstatus(gp, _Gcopystack, _Grunning)
}
```

The size of the current stack is first calculated from the boundaries `gp.stack.hi` and `gp.stack.lo`, which hold the addresses of the two ends of the stack:

```go
type stack struct {
   lo uintptr
   hi uintptr
}
```

Then the current size is multiplied by 2 and checked against the maximum allowed size, which depends on the architecture:

```go
// Max stack size is 1 GB on 64-bit, 250 MB on 32-bit.
// Using decimal instead of binary GB and MB because
// they look nicer in the stack overflow failure message.
if sys.PtrSize == 8 {
   maxstacksize = 1000000000
} else {
   maxstacksize = 250000000
}
```
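As a rough illustration of the doubling rule, the sketch below replays the `newsize := oldsize * 2` step starting from 2048 bytes and stops before the 64-bit limit quoted above would be exceeded. The helper `doublingsUntilLimit` and its constants are hypothetical names of ours, not runtime code; it only models the arithmetic of the excerpt shown here.

```go
package main

import "fmt"

const (
	minStack     = 2048       // _StackMin from runtime/stack.go
	maxStackSize = 1000000000 // 64-bit limit quoted above
)

// doublingsUntilLimit is a hypothetical helper (not part of the runtime).
// It doubles size while the doubled value still fits under limit, and
// returns the number of doublings and the last size that was allowed.
func doublingsUntilLimit(size, limit uint64) (steps int, last uint64) {
	for size*2 <= limit {
		size *= 2
		steps++
	}
	return steps, size
}

func main() {
	steps, last := doublingsUntilLimit(minStack, maxStackSize)
	fmt.Println(steps, last) // prints: 18 536870912
}
```

Under this model, a stack that only ever doubles from 2048 bytes tops out at 512 MiB on 64-bit, because the next doubling would exceed the decimal 1 GB limit.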

Now that we know the behavior, we can write a simple example to verify all of that. In order to debug, we will set the constant `stackDebug`, which we have seen in the `newstack` function, to 1 and run:

```go
package main

func main() {
   var x [10]int
   a(x)
}

//go:noinline
func a(x [10]int) {
   println(`func a`)
   var y [100]int
   b(y)
}

//go:noinline
func b(x [100]int) {
   println(`func b`)
   var y [1000]int
   c(y)
}

//go:noinline
func c(x [1000]int) {
   println(`func c`)
}
```

The `//go:noinline` directive prevents the compiler from inlining these functions into the main function. If the compiler inlined them, we would not see the dynamic growth of the stack in each function prologue.

Here is part of the debug output we got:

```
runtime: newstack sp=0xc00002e6d8 stack=[0xc00002e000, 0xc00002e800]
stack grow done
func a
runtime: newstack sp=0xc000076888 stack=[0xc000076000, 0xc000077000]
stack grow done
runtime: newstack sp=0xc00003f888 stack=[0xc00003e000, 0xc000040000]
stack grow done
runtime: newstack sp=0xc000081888 stack=[0xc00007e000, 0xc000082000]
stack grow done
func b
runtime: newstack sp=0xc0000859f8 stack=[0xc000082000, 0xc00008a000]
func c
```

We can see that the stack has grown 4 times. Indeed, the function prologue grows the stack as much as necessary to fit its needs. As we have seen in the code, the stack size is defined by the boundaries of the stack, so we can calculate the stack size in each case (the trace line `newstack stack=[...]` provides the current stack boundaries):

```
runtime: newstack sp=0xc00002e6d8 stack=[0xc00002e000, 0xc00002e800]
0xc00002e800 - 0xc00002e000 = 2048

runtime: newstack sp=0xc000076888 stack=[0xc000076000, 0xc000077000]
0xc000077000 - 0xc000076000 = 4096

runtime: newstack sp=0xc00003f888 stack=[0xc00003e000, 0xc000040000]
0xc000040000 - 0xc00003e000 = 8192

runtime: newstack sp=0xc000081888 stack=[0xc00007e000, 0xc000082000]
0xc000082000 - 0xc00007e000 = 16384

runtime: newstack sp=0xc0000859f8 stack=[0xc000082000, 0xc00008a000]
0xc00008a000 - 0xc000082000 = 32768
```
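The subtraction above can also be scripted. The following is a minimal sketch, assuming the trace lines keep the `stack=[lo, hi]` format shown in this article; the helper `stackSize` and the regular expression are ours and are not part of any Go tooling.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// stackBounds matches the "stack=[lo, hi]" part of a newstack trace line.
var stackBounds = regexp.MustCompile(`stack=\[(0x[0-9a-f]+), (0x[0-9a-f]+)\]`)

// stackSize is a hypothetical helper: it returns hi - lo for one trace line.
func stackSize(line string) (uint64, error) {
	m := stackBounds.FindStringSubmatch(line)
	if m == nil {
		return 0, fmt.Errorf("no stack bounds in %q", line)
	}
	lo, err := strconv.ParseUint(m[1], 0, 64) // base 0 accepts the 0x prefix
	if err != nil {
		return 0, err
	}
	hi, err := strconv.ParseUint(m[2], 0, 64)
	if err != nil {
		return 0, err
	}
	return hi - lo, nil
}

func main() {
	lines := []string{
		"runtime: newstack sp=0xc00002e6d8 stack=[0xc00002e000, 0xc00002e800]",
		"runtime: newstack sp=0xc0000859f8 stack=[0xc000082000, 0xc00008a000]",
	}
	for _, l := range lines {
		size, err := stackSize(l)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(size) // prints 2048, then 32768
	}
}
```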

This look at the internals showed us that the stack of a goroutine starts at 2 KB and grows as much as necessary in the function prologue, which is added at compile time, until there is enough memory or the stack limit is reached.
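To see this growth at work without rebuilding the runtime, here is a small sketch: a deliberately recursive function, run on a fresh goroutine, that needs far more than the initial 2 KB. The function names `sumTo` and `sumInGoroutine` and the depth of 100000 are arbitrary choices of ours for illustration.

```go
package main

import "fmt"

// sumTo adds 1..n recursively. The addition happens after the recursive
// call returns, so every level of recursion needs its own stack frame.
//
//go:noinline
func sumTo(n int64) int64 {
	if n == 0 {
		return 0
	}
	return n + sumTo(n-1)
}

// sumInGoroutine runs sumTo on a new goroutine, whose stack starts small
// and has to grow several times to hold n nested frames.
func sumInGoroutine(n int64) int64 {
	result := make(chan int64)
	go func() { result <- sumTo(n) }()
	return <-result
}

func main() {
	fmt.Println(sumInGoroutine(100000)) // prints: 5000050000
}
```

The program completes normally because each prologue check hands over to the runtime when the frame does not fit, and the runtime moves the goroutine to a larger stack.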

# Stack allocation management

The dynamic allocation system is not the only thing that can impact our applications. The way the stack is allocated can have a great impact as well. Let’s try to understand how it is managed from the full trace of the first two stack growths:

```
runtime: newstack sp=0xc00002e6d8 stack=[0xc00002e000, 0xc00002e800]
copystack gp=0xc000000300 [0xc00002e000 0xc00002e6e0 0xc00002e800] -> [0xc000076000 0xc000076ee0 0xc000077000]/4096
stackfree 0xc00002e000 2048
stack grow done
runtime: newstack sp=0xc000076888 stack=[0xc000076000, 0xc000077000]
copystack gp=0xc000000300 [0xc000076000 0xc000076890 0xc000077000] -> [0xc00003e000 0xc00003f890 0xc000040000]/8192
stackfree 0xc000076000 4096
stack grow done
```

The first line shows the address of the current stack, `stack=[0xc00002e000, 0xc00002e800]`, which is then copied to a new one twice as big, `copystack [0xc00002e000 [...] 0xc00002e800] -> [0xc000076000 [...] 0xc000077000]`, 4096 bytes long as we have seen previously. The previous stack is then freed: `stackfree 0xc00002e000`. Here is a diagram that could help to visualize what is happening:

The function `copystack` copies the entire stack and adjusts all addresses so that they point into the new stack. We can verify that easily with a small modification of our code:

```go
func main() {
   var x [10]int
   println(&x)
   a(x)
   println(&x)
}
```

It now prints the address of the value:

```
0xc00002e738
[...]
0xc000089f38
```

The address `0xc00002e738` lies within the first stack we saw, `stack=[0xc00002e000, 0xc00002e800]`, while `0xc000089f38` lies within the last stack boundaries, `stack=[0xc000082000, 0xc00008a000]`, that we have in the debug trace. This confirms that all values have been moved from stack to stack.
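One more check we can make on these numbers: since the stack grows downward from its high address, a value that is copied to the new stack should keep the same distance from the top. The sketch below, using a hypothetical helper `offsetFromTop`, applies that to the two addresses of `x` and to the stack pointers printed by the first `copystack` line.

```go
package main

import "fmt"

// offsetFromTop returns how far addr sits below the high end of a stack.
func offsetFromTop(hi, addr uint64) uint64 {
	return hi - addr
}

func main() {
	// &x before any growth, inside stack=[0xc00002e000, 0xc00002e800].
	before := offsetFromTop(0xc00002e800, 0xc00002e738)
	// &x after the growths, inside stack=[0xc000082000, 0xc00008a000].
	after := offsetFromTop(0xc00008a000, 0xc000089f38)
	fmt.Println(before, after) // prints: 200 200

	// Stack pointers from the first copystack line: [lo sp hi] -> [lo sp hi].
	oldUsed := offsetFromTop(0xc00002e800, 0xc00002e6e0)
	newUsed := offsetFromTop(0xc000077000, 0xc000076ee0)
	fmt.Println(oldUsed, newUsed) // prints: 288 288
}
```

In both cases the offset from the top is unchanged, which is consistent with the used part of the old stack being copied as one block to the top of the new one.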

Also, it is interesting to note that the stack will shrink, if needed, when garbage collection is triggered.
In our example, after the function call returns, there are no valid frames on the stack other than that of main, so the system will be able to shrink it when the garbage collector runs. To see that, we can just force the garbage collector to run:

```go
// Requires: import "runtime"
func main() {
   var x [10]int
   println(&x)
   a(x)
   runtime.GC() // force a collection so the stack can be shrunk
   println(&x)
}
```

The debug trace now displays the shrink of the stack:

```
func c
shrinking stack 32768->16384
copystack gp=0xc000000300 [0xc000082000 0xc000089e60 0xc00008a000] -> [0xc00007e000 0xc000081e60 0xc000082000]/16384
```

As we can see, the stack size has been divided by 2 and the stack re-uses a previous stack address, `stack=[0xc00007e000, 0xc000082000]`. Here again, we can see in `shrinkstack()` in `runtime/stack.go` that the shrink always divides the current size by 2:

```go
oldsize := gp.stack.hi - gp.stack.lo
newsize := oldsize / 2
```
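The halving can be replayed on the values of the trace. The sketch below is a simplified model with hypothetical names (`shrinkTarget`, `fixedStack`). Beyond the halving itself, it assumes two guards that this article does not show and that are recalled from the runtime source: the stack is never shrunk below the minimum size, and it is only shrunk when less than a quarter of it is in use. The real check also adds a small safety margin to the used size, which is omitted here.

```go
package main

import "fmt"

// fixedStack is assumed to equal the 2048-byte minimum seen earlier.
const fixedStack = 2048

// shrinkTarget is a simplified, hypothetical model of the shrink decision.
// It returns the resulting stack size and whether a shrink would happen.
func shrinkTarget(lo, sp, hi uint64) (uint64, bool) {
	oldsize := hi - lo
	newsize := oldsize / 2
	if newsize < fixedStack {
		return oldsize, false // assumption: never go below the minimum
	}
	used := hi - sp // the stack grows down from hi
	if used >= oldsize/4 {
		return oldsize, false // assumption: too much of the stack is in use
	}
	return newsize, true
}

func main() {
	// [lo sp hi] of the old stack, taken from the copystack line above.
	size, ok := shrinkTarget(0xc000082000, 0xc000089e60, 0xc00008a000)
	fmt.Println(size, ok) // prints: 16384 true
}
```

With only 416 bytes in use out of 32768, the model agrees with the `shrinking stack 32768->16384` line of the trace.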

# Contiguous stack VS segmented stack

The strategy of copying the stack into a bigger space is called a contiguous stack, as opposed to a segmented stack. Go moved to contiguous stacks in Go 1.3. In order to see the difference, we will run the same example with Go 1.2. Here again, we will need to update the constant `stackDebug` to display the trace. Since the runtime was written in C in this version, we will have to recompile Go from source. Here is the result:

```
func a
runtime: newstack framesize=0x3e90 argsize=0x320 sp=0x7f8875953848 stack=[0x7f8875952000, 0x7f8875953fa0]
   -> new stack [0xc21001d000, 0xc210021950]
func b
func c
runtime: oldstack gobuf={pc:0x400cff sp:0x7f8875953858 lr:0x0} cret=0x1 argsize=0x320
```

The current stack `stack=[0x7f8875952000, 0x7f8875953fa0]` is 8 KB in length (8096 bytes between the printed boundaries plus the size of the top of the stack) and the new stack created is 18864 bytes (18768 bytes plus the size of the top of the stack). The amount of memory to allocate is computed as follows:

```c
// allocate new segment.
framesize += argsize;
framesize += StackExtra;   // room for more functions, Stktop.
if(framesize < StackMin)
   framesize = StackMin;
framesize += StackSystem;
```

For the constants, `StackExtra` is set to 2048, `StackMin` is set to 8192, and `StackSystem` ranges from 0 to more than 512 depending on the platform.
So, our new stack is composed of: 16016 (frame size) + 800 (arguments) + 2048 (StackExtra) + 0 (StackSystem) = 18864 bytes.
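For readers who want to replay this arithmetic, here is a Go transcription of the C snippet, fed with the hexadecimal values of the trace. The function `segmentSize` and the lower-case constants are our own naming, and `stackSystem` is assumed to be 0 as in the breakdown above.

```go
package main

import "fmt"

const (
	stackExtra  = 2048 // StackExtra in the Go 1.2 runtime
	stackMin    = 8192 // StackMin in the Go 1.2 runtime
	stackSystem = 0    // StackSystem, assumed to be 0 on this platform
)

// segmentSize mirrors the C snippet above, for illustration only.
func segmentSize(framesize, argsize uint64) uint64 {
	framesize += argsize
	framesize += stackExtra // room for more functions, Stktop
	if framesize < stackMin {
		framesize = stackMin
	}
	framesize += stackSystem
	return framesize
}

func main() {
	total := segmentSize(0x3e90, 0x320)         // framesize and argsize from the trace
	span := uint64(0xc210021950 - 0xc21001d000) // printed boundaries of the new stack
	fmt.Println(total, span, total-span)        // prints: 18864 18768 96
}
```

The 96 bytes of difference are what the text above calls the size of the top of the stack.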

Once all the functions have returned, the new stack is freed (log `runtime: oldstack`). This behavior was one of the reasons that pushed the Go team to move to a contiguous stack:

Current split stack mechanism has a “hot split” problem — if the stack is almost full, a call will force a new stack chunk to be allocated. When that call returns, the new stack chunk is freed. If the same call happens repeatedly in a tight loop, the overhead of the alloc/free causes significant overhead

Go had to increase the minimum stack size to 8 KB in 1.2 for this reason, and was later able to reduce it back to 2 KB after the implementation of the contiguous stack.

Here is an update of our previous graph with the segmented stack:

# Conclusion

Stack management in Go is efficient and quite easy to understand. Go is not the only language that chose not to use segmented stacks: Rust also decided against this solution for the same reasons.

If you want to go deeper into the stack details, I also suggest you read the blog post by Dave Cheney that talks about the redzone, along with the post from Bill Kennedy that explains the frames in the stack.

## A Journey With Go

A Journey With Go Language Programming

Written by

## Vincent Blanchon

French Gopher in Dubai
