Bisecting Go performance degradation

7 min readApr 18, 2019

As you may know from my previous posts I work on language-agnostic real-time messaging server called Centrifugo which is now built on top of library for Go language called Centrifuge.

Recently while testing how Centrifuge library behaves on Go 1.12 I found a massive regression in one performance benchmark — the one that measured message write performance. The regression was almost 2x times.

This is a write up on how I bisected a commit in Go language source code repo that caused such a massive regression. I believe that many developers can come across a similar problem with their code and projects so the flow described here can be useful for future explorers.

Charles Darwin — just a nice first picture for a post and a hint for an impatient reader.

Actually what that benchmark does is calling many write syscalls. Let’s write a simplified version of it which is not Centrifuge specific — the benchmark just writes data to /dev/null:

func BenchmarkWrite(b *testing.B) {
    f, err := os.Create("/dev/null")
    if err != nil {
        panic(err)
    }
    defer f.Close()    buf := make([]byte, 512)    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        f.Write(buf)
    }
    b.StopTimer()
}

We don’t use io.Discard here because we need syscall.

Let’s first install different Go versions. The one where benchmark behaves well:

go get golang.org/dl/go1.11
go1.11 download

And the one where benchmark behaves bad:

go get golang.org/dl/go1.12
go1.12 download

Here we used a cool feature of Go to download binary distributions of language at specific versions — and we now have go1.11 and go1.12 executables ready to use in our terminal.

Let’s run our simple benchmark using Go 1.11 first:

$ go1.11 test -test.run=XXX -bench=BenchmarkWrite -benchmem -memprofile mem.old -cpuprofile cpu.old | tee old.benchgoos: darwin
goarch: amd64
pkg: github.com/centrifugal/centrifugeBenchmarkWrite-8  1000000   1167 ns/op   0 B/op   0 allocs/opPASS
ok   github.com/centrifugal/centrifuge 1.201s

And then using Go 1.12:

$ go1.12 test -test.run=XXX -bench=BenchmarkWrite -benchmem -memprofile mem.new -cpuprofile cpu.new | tee new.benchgoos: darwin
goarch: amd64
pkg: github.com/centrifugal/centrifugeBenchmarkWrite-8  1000000   2184 ns/op   0 B/op   0 allocs/opPASS
ok   github.com/centrifugal/centrifuge 2.226s

Now with a help of benchcmp tool let’s look at performance difference:

$ benchcmp old.bench new.benchbenchmark            old ns/op     new ns/op     delta
BenchmarkWrite-8     1167          2184          +87.15%benchmark            old allocs    new allocs    delta
BenchmarkWrite-8     0             0             +0.00%benchmark            old bytes     new bytes     delta
BenchmarkWrite-8     0             0             +0.00%

So we really see almost 2 times performance degradation between two Go versions.

As you may noticed we saved memory and CPU profiles while running our benchmarks. As we don’t allocate at all here we will only look at CPU profile. Here is one for Go 1.11:

$ go tool pprof cpu.old
> web

Pretty straightforward right? Now let’s look at one for Go 1.12:

$ go tool pprof cpu.new
> web

Wow! Here we see that a lot of additional runtime staff happens in new version of language.

At this moment I realised that something is most probably wrong with Go 1.12 but before raising an issue in Go issue tracker I decided to find a commit where this regression was introduced.

What we want here is run our custom benchmark which is located inside our project against various Go compiler throughout Go repository commit history. We know that regression happened somewhere between Go 1.11 and Go 1.12.

Looking at number of different commits in Go git history:

$ git rev-list --no-walk go1.11..go1.12 | wc -l
1684

– we see 1684 commits! It’s pretty hard to find that commit manually. Also building a Go compiler toolchain from certain commit takes a significant amount of time so we need a more clever way than just sequentially running our bench with Go built from each commit in history.

Turns out that the way exists. You most probably know about bisecting technique. Maybe you already used this to find bug in you Git repository using git bisect command. Here we will use a tool called benchmany to find out where a regression had happened.

First let’s clone Go language repo:

git clone git@github.com:golang/go.git ~/go.dev

As soon as repo cloned we can change directory to our project’s dir (where our target benchmark located). We will run all the following commands from our project’s directory (in my case this is $GOPATH/src/github.com/centrifugal/centrifuge — just use yours).

We have to create a folder for benchmany logs:

mkdir regression_search

Of course you can provide your own name for that dir. And then we can launch benchmany tool:

benchmany -d regression_search -order metric -benchflags '-test.run=XXX -test.bench=BenchmarkWrite' -buildcmd='/Users/ayuemelin/go.dev/bin/go test -c' -C /Users/ayuemelin/go.dev/ go1.11..go1.12

We point benchmany to regression_search directory we just created — it will be used for logs.

The -order argument takes three possible values: seq, spread and metric. The metric value says benchmany to select the commit half way between the pair of commits with the biggest difference in the metric. The metric is ns/opby default which is what we actually need. As said in benchmany docs:

This is like “git bisect”, but for performance.

We also set custom dir where Git operations will be run using -C option and point it to Go source code repo — in this case benchmany is smart enough to recognise that we run benchmark over Go source code repository and will automatically call make.bash script on every iteration. As a result on every bisect iteration we have Go compiler toolchain built from specific commit at /Users/ayuemelin/go.dev/bin/go (just adapt for your username).

The buildcmd option allows to define custom test command so we change it to use our custom Go compiler built from certain commit: /Users/ayuemelin/go.dev/bin/go test -c.

As soon as you run command above you should see sth like this in your terminal:

commit 5e9ad4a, iteration 1/5: building...
commit 5e9ad4a, iteration 1/5: running...
commit 5e9ad4a, iteration 2/5: running...
commit 5e9ad4a, iteration 3/5: running...
commit 5e9ad4a, iteration 4/5: running...
commit 5e9ad4a, iteration 5/5: running...
commit ef21689, iteration 1/5: building...
45/360 runs, 63 unstarted+0 partial+9 done+0 failed commits, ETA 1h28m36s

benchmany tool selects commit, builds Go toolchain using that commit, and then runs benchmark several times. Then selects next commit…

As this tool runs you can look inside regression_search folder and watch how benchmark information appears in bench.log file. You may also notice how it step by step approaches a place where regression happened.

In my case I had to stop benchmany at some point (after about 15 build iterations), take 2 commits which were much closer in time than original range go1.11..go1.12 (but still one commit had a bad performance result and another one had a good performance result) and run benchmany again with a smaller commit scope.

And finally I did one more benchmany run with -order seq option with commit range happened in several hours interval to sequentially go through them and find out the one that caused regression.

Of course the first time I did this it took me several hours of work because I had to do everything from scratch. Now I suppose I can repeat the same in much more reasonable time. And hopefully you will too.

So here is a link to commit that caused such a big regression:

syscall: implement syscalls on Darwin using libSystem · golang/go@a3b0144

There are still some references to the bare Syscall functions in the stdlib. I will root those out in a following CL…

github.com

And here is an issue it relates:

cmd/link: use libsystem_kernel.dylib or libSystem.dylib for syscalls on macOS · Issue #17490 ·…

As I understand it, Go currently has its own syscall wrappers for Darwin. This is explicitly against what Apple…

github.com

It turns out that a way Go used syscalls on Darwin before did not follow a Apple recommendations which state:

Apple does not support statically linked binaries on Mac OS X. A statically linked binary assumes binary compatibility at the kernel system call interface, and we do not make any guarantees on that front. Rather, we strive to ensure binary compatibility in each dynamically linked system library and framework.

Now Go is linking against libSystem.dylib which includes things like libc and libpthread for example and exists on all macOS and iOS boxes.

The sad news is that there is nothing to do actually with this regression. It’s unfortunate but at least Darwin is mostly used for development in Go these days (until some rare scenarios when your main target is Darwin).

The good news for me (and most of Gophers I suppose) is that this does not affect Linux which is the main target system for Centrifugo.

Takeaways:

You need to run your benchmarks on a main target system
You can run benchmark in your project automatically against different Go revisions using benchmany tool
You can bisect simply enough which commit caused regression in your Go code using benchmany -order metric
Sometimes a quick Twitter search can resolve the root cause of problem :)

Bisecting Go performance degradation

syscall: implement syscalls on Darwin using libSystem · golang/go@a3b0144

There are still some references to the bare Syscall functions in the stdlib. I will root those out in a following CL…

cmd/link: use libsystem_kernel.dylib or libSystem.dylib for syscalls on macOS · Issue #17490 ·…

As I understand it, Go currently has its own syscall wrappers for Darwin. This is explicitly against what Apple…

Written by Alexander Emelin