Goroutine leak

Concurrency in Go takes the form of goroutines (independently executing activities) and channels (used for communication between them). When working with goroutines, the programmer needs to take care that they don’t leak. A goroutine leaks when it either blocks forever on I/O, such as a channel operation, or falls into an infinite loop. Even a blocked goroutine consumes resources, so the program may use more memory than it needs or eventually run out of memory and crash. Let’s look at a couple of examples of how this can happen. Then we’ll focus on how to detect whether a program is affected.

Sending to a channel without receiver

Suppose a program sends a request to several backends for redundancy. The first response received is used and later ones are discarded. The code below simulates sending requests to downstream servers by waiting for a random number of milliseconds:

package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"time"
)

func query() int {
	n := rand.Intn(100)
	time.Sleep(time.Duration(n) * time.Millisecond)
	return n
}

func queryAll() int {
	ch := make(chan int)
	go func() { ch <- query() }()
	go func() { ch <- query() }()
	go func() { ch <- query() }()
	return <-ch
}

func main() {
	for i := 0; i < 4; i++ {
		queryAll()
		fmt.Printf("#goroutines: %d\n", runtime.NumGoroutine())
	}
}

#goroutines: 3
#goroutines: 5
#goroutines: 7
#goroutines: 9

After every call to queryAll the number of goroutines grows by two. The issue is that once the first response has been received, the two “slower” goroutines block forever trying to send on a channel with no receiver on the other side.

A possible fix is to use a buffered channel if the number of backend servers is known upfront. Otherwise we could use another goroutine that keeps receiving from the channel as long as at least one goroutine is still working. Another option is to cancel the remaining requests using context.

Receiving from channel without sender

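This scenario is similar to sending to a channel without any receiver: a goroutine blocks forever on a receive because nothing ever sends on the channel or closes it. The “Leaking goroutine” post contains one example.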

nil channels

Sending to a nil channel blocks forever:

package main

func main() {
	var ch chan struct{}
	ch <- struct{}{}
}

Since main is the only goroutine here, the runtime detects a deadlock:

fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan send (nil chan)]:
main.main()
...

The same happens when receiving from a nil channel:

var ch chan struct{}
<-ch

This can happen when a channel that hasn’t been initialized is passed around:

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	var ch chan int
	if false {
		ch = make(chan int, 1)
		ch <- 1
	}
	go func(ch chan int) {
		<-ch
	}(ch)
	c := time.Tick(1 * time.Second)
	for range c {
		fmt.Printf("#goroutines: %d\n", runtime.NumGoroutine())
	}
}

In this example there is an obvious culprit, if false {, but in bigger programs it’s easier to forget the initialization, and then the zero value of the channel (nil) will be used.

Infinite loops

Goroutine leaks aren’t caused only by incorrect use of channels. A goroutine may also block on an I/O operation, such as sending a request to an API server without a timeout. Another possibility is that the program simply falls into an infinite loop.

Analysis

runtime.NumGoroutine

The simplest way is to watch the value returned by runtime.NumGoroutine. If it keeps growing while the load stays steady, goroutines are probably leaking.

net/http/pprof

import (
	"log"
	"net/http"
	_ "net/http/pprof"
)
...
log.Println(http.ListenAndServe("localhost:6060", nil))

At http://localhost:6060/debug/pprof/goroutine?debug=1 there will be a list of goroutines with their stack traces.

runtime/pprof

To print stack traces of existing goroutines to stdout:

import (
	"os"
	"runtime/pprof"
)
...
pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)

gops

> go get -u github.com/google/gops

To integrate with your program:

import (
	"log"
	"time"

	"github.com/google/gops/agent"
)
...
if err := agent.Start(); err != nil {
	log.Fatal(err)
}
time.Sleep(time.Hour) // keep the process alive so gops can inspect it

> ./bin/gops
12365 gops (/Users/mlowicki/projects/golang/spec/bin/gops)
12336* lab (/Users/mlowicki/projects/golang/spec/bin/lab)
> ./bin/gops vitals -p=12336
goroutines: 14
OS threads: 9
GOMAXPROCS: 4
num CPU: 4

leaktest

It’s one of the approaches to automatically detecting leaks in tests. It takes stack traces of active goroutines with runtime.Stack at the beginning and at the end of a test. If a new goroutine is still present after the test is done, it’s reported as a leak.


It’s important to analyze how goroutines are managed even in programs that already work, to avoid leaks which could lead to running out of memory. Such problems usually surface only after the code has been running in production for days, so they can cause real damage.
