Benchmarking with details

Egon Elbre
4 min read · Apr 3, 2019


I’ve been optimizing Go code for a while and trying to improve my benchmarking game.

Let’s take a look at a simple example:

func BenchmarkReport(b *testing.B) {
    runtime.GC()
    for i := 0; i < b.N; i++ {
        r := fmt.Sprintf("hello, world %d", 123)
        // Keep the result reachable so the compiler cannot
        // optimize the Sprintf call away.
        runtime.KeepAlive(r)
    }
}

Running go test -bench . will give us the result:

BenchmarkReport-32      20000000               107 ns/op

This might be enough to get a rough estimate of where you stand in terms of performance, but optimization work requires more detailed output. Condensing everything into a single number is bound to lose information.
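To see what a single number can hide, here is a small standalone sketch (plain Go, no hrtime) that compares the average against percentiles for a synthetic set of measurements where one iteration in a hundred is pathologically slow:

package main

import (
    "fmt"
    "sort"
    "time"
)

func main() {
    // 99 fast iterations and a single slow outlier.
    durations := make([]time.Duration, 0, 100)
    for i := 0; i < 99; i++ {
        durations = append(durations, 100*time.Nanosecond)
    }
    durations = append(durations, 100*time.Microsecond)
    sort.Slice(durations, func(i, j int) bool {
        return durations[i] < durations[j]
    })

    var total time.Duration
    for _, d := range durations {
        total += d
    }
    fmt.Println("avg:", total/time.Duration(len(durations))) // 1.099µs, skewed by the outlier
    fmt.Println("p50:", durations[len(durations)/2])         // 100ns, the typical iteration
    fmt.Println("max:", durations[len(durations)-1])         // 100µs, the outlier
}

The average is an order of magnitude above what a typical iteration costs, and only by looking at the percentiles can we tell that the slowdown comes from a single outlier.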

github.com/loov/hrtime

Let me introduce you to the hrtime package which I wrote to get a more extensive output from benchmarks.
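If you want to follow along, you can fetch it with go get github.com/loov/hrtime.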

Histogram

The first way to use it is hrtime.NewBenchmark. Here is the first example rewritten as a regular program.

func main() {
    bench := hrtime.NewBenchmark(20000000)
    for bench.Next() {
        r := fmt.Sprintf("hello, world %d", 123)
        runtime.KeepAlive(r)
    }
    fmt.Println(bench.Histogram(10))
}

Running it will print:

avg 372ns;  min 300ns;  p50 400ns;  max 295µs;
p90 400ns; p99 500ns; p999 1.8µs; p9999 4.3µs;
300ns [ 7332554] ███████████████████████
400ns [12535735] ████████████████████████████████████████
600ns [ 18955]
800ns [ 2322]
1µs [ 20413]
1.2µs [ 34854]
1.4µs [ 25096]
1.6µs [ 10009]
1.8µs [ 4688]
2µs+[ 15374]

As we can see, the p99 is 500ns, which means that 1% of all our measurements are above 500ns. We can try to optimize this by allocating fewer strings:

func main() {
    bench := hrtime.NewBenchmark(20000000)
    var back [1024]byte
    for bench.Next() {
        buffer := back[:0]
        buffer = append(buffer, []byte("hello, world ")...)
        buffer = strconv.AppendInt(buffer, 123, 10)
        runtime.KeepAlive(buffer)
    }
    fmt.Println(bench.Histogram(10))
}

And the result:

avg 267ns;  min 200ns;  p50 300ns;  max 216µs;
p90 300ns; p99 300ns; p999 1.1µs; p9999 3.6µs;
200ns [ 7211285] ██████████████████████▌
300ns [12658260] ████████████████████████████████████████
400ns [ 81076]
500ns [ 3226]
600ns [ 343]
700ns [ 136]
800ns [ 729]
900ns [ 8108]
1µs [ 15436]
1.1µs+[ 21401]

We can now see that the p99 has gone from 500ns to 300ns.
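The improvement comes from reusing the backing array: append and strconv.AppendInt write into the preallocated buffer instead of allocating a new string on every iteration.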

If you have a keen eye, you may have noticed that the Go benchmark gave an average of 107ns/op, whereas hrtime gave us 372ns/op. This is the unfortunate side-effect of trying to measure more: measurement always has an overhead, and the final results include it.

Note: depending on the operating system, the overhead can be significantly smaller. hrtime also supports making repeated calls inside a single iteration and then dividing the measurements with https://godoc.org/github.com/loov/hrtime#Histogram.Divide.
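As a rough sketch of that approach: run the operation several times inside each bench.Next() iteration so the fixed measurement overhead is amortized over the whole batch, and then divide the histogram by the batch size. The exact Divide signature below is my assumption based on the godoc link above, so check the documentation before relying on it.

package main

import (
    "fmt"
    "runtime"

    "github.com/loov/hrtime"
)

func main() {
    const callsPerIteration = 100
    bench := hrtime.NewBenchmark(100000)
    for bench.Next() {
        // Measure a batch of calls per iteration so the timing
        // overhead is paid once per batch instead of once per call.
        for i := 0; i < callsPerIteration; i++ {
            r := fmt.Sprintf("hello, world %d", 123)
            runtime.KeepAlive(r)
        }
    }
    hist := bench.Histogram(10)
    // Assumption: Divide scales the recorded durations down by
    // the number of calls per iteration.
    hist.Divide(callsPerIteration)
    fmt.Println(hist)
}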

Stopwatch

Sometimes you also want to measure concurrent operations. For this there is Stopwatch: https://godoc.org/github.com/loov/hrtime#Stopwatch

Let’s say you want to measure how long a send takes on a highly contended channel. This is, of course, a contrived example, but it roughly shows how we can start a measurement in one goroutine, stop it in another, and finally print everything out.

func main() {
    const numberOfExperiments = 1000
    bench := hrtime.NewStopwatch(numberOfExperiments)
    ch := make(chan int32, 10)
    wait := make(chan struct{})
    // start senders
    for i := 0; i < numberOfExperiments; i++ {
        go func() {
            <-wait
            ch <- bench.Start()
        }()
    }
    // start one receiver
    go func() {
        for lap := range ch {
            bench.Stop(lap)
        }
    }()
    // wait for all goroutines to be created
    time.Sleep(time.Second)
    // release all goroutines at the same time
    close(wait)
    // wait for all measurements to be completed
    bench.Wait()
    fmt.Println(bench.Histogram(10))
}
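Note how the measurement travels between goroutines: Start returns a lap index, which we send over the channel so the receiver can pass it to Stop, and Wait blocks until every started lap has been stopped.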

hrtesting

Of course, writing a separate binary for every benchmark isn’t that convenient. For that there’s github.com/loov/hrtime/hrtesting, which provides wrappers for testing.B.

func BenchmarkReport(b *testing.B) {
    bench := hrtesting.NewBenchmark(b)
    defer bench.Report()
    for bench.Next() {
        r := fmt.Sprintf("hello, world %d", 123)
        runtime.KeepAlive(r)
    }
}

It will print out the 50th, 90th, and 99th percentiles.

BenchmarkReport-32               3000000               427 ns/op
--- BENCH: BenchmarkReport-32
benchmark_old.go:11: 24.5µs₅₀ 24.5µs₉₀ 24.5µs₉₉ N=1
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 12.8µs₉₉ N=100
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 500ns₉₉ N=10000
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 600ns₉₉ N=1000000
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 500ns₉₉ N=3000000

Unfortunately, with Go 1.12 it will print the output for all runs of the benchmark instead of just the last one. However, with Go 1.13, the output will be much nicer:

BenchmarkReport-32   3174566  379 ns/op  400 ns/p50  400 ns/p90 ...

And comparing results with benchstat will work as well.
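For example, you can redirect the output of go test -bench . -count 10 into old.txt before a change and new.txt after it, then run benchstat old.txt new.txt to see whether the averages and percentiles moved beyond the noise.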

hrplot

Leaving the best for last: github.com/loov/hrtime/hrplot. Using my experimental plotting package, I added a convenient way to plot benchmark results.

func BenchmarkReport(b *testing.B) {
    bench := hrtesting.NewBenchmark(b)
    defer bench.Report()
    defer hrplot.All("all.svg", bench)

    runtime.GC()
    for bench.Next() {
        r := fmt.Sprintf("hello, world %d", 123)
        runtime.KeepAlive(r)
    }
}

It will create an SVG file called all.svg containing three plots: a line plot showing how much time each iteration took, a density plot showing the distribution of the timing measurements, and a percentiles plot.

Conclusion

Performance optimization is fun, but having tools to help you out makes it even more enjoyable.

Go try out github.com/loov/hrtime and let me know what you think.
