Benchmarking with details

Egon Elbre
4 min read · Apr 3, 2019


I’ve been optimizing Go code for a while and trying to improve my benchmarking game.

Let’s take a look at a simple example:

func BenchmarkReport(b *testing.B) {
    runtime.GC()
    for i := 0; i < b.N; i++ {
        r := fmt.Sprintf("hello, world %d", 123)
        // Keep the result reachable so the compiler cannot
        // optimize the Sprintf call away.
        runtime.KeepAlive(r)
    }
}

Running go test -bench . will give us the result:

BenchmarkReport-32      20000000               107 ns/op

This might be enough to get a rough estimate of where you stand in terms of performance, but optimization work requires more detailed output. Condensing everything into a single number is bound to lose information.
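To see what a single number can hide, here is a small standalone sketch (plain Go, no hrtime) that compares the average against percentiles for a synthetic set of measurements where one iteration in a hundred is pathologically slow:

package main

import (
    "fmt"
    "sort"
    "time"
)

func main() {
    // 99 fast iterations and a single slow outlier.
    durations := make([]time.Duration, 0, 100)
    for i := 0; i < 99; i++ {
        durations = append(durations, 100*time.Nanosecond)
    }
    durations = append(durations, 100*time.Microsecond)
    sort.Slice(durations, func(i, j int) bool {
        return durations[i] < durations[j]
    })

    var total time.Duration
    for _, d := range durations {
        total += d
    }
    fmt.Println("avg:", total/time.Duration(len(durations))) // 1.099µs, skewed by the outlier
    fmt.Println("p50:", durations[len(durations)/2])         // 100ns, the typical iteration
    fmt.Println("max:", durations[len(durations)-1])         // 100µs, the outlier
}

The average is an order of magnitude above what a typical iteration costs, and only by looking at the percentiles can we tell that the slowdown comes from a single outlier.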

github.com/loov/hrtime

Let me introduce you to the hrtime package which I wrote to get a more extensive output from benchmarks.
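If you want to follow along, you can fetch it with go get github.com/loov/hrtime.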

Histogram

The first way to use it is hrtime.NewBenchmark. Here is the first example rewritten as a regular program.

func main() {
    bench := hrtime.NewBenchmark(20000000)
    for bench.Next() {
        r := fmt.Sprintf("hello, world %d", 123)
        runtime.KeepAlive(r)
    }
    fmt.Println(bench.Histogram(10))
}

Running it will print:

avg 372ns;  min 300ns;  p50 400ns;  max 295µs;
p90 400ns; p99 500ns; p999 1.8µs; p9999 4.3µs;
300ns [ 7332554] ███████████████████████
400ns [12535735] ████████████████████████████████████████
600ns [ 18955]
800ns [ 2322]
1µs [ 20413]
1.2µs [ 34854]
1.4µs [ 25096]
1.6µs [ 10009]
1.8µs [ 4688]
2µs+[ 15374]

As we can see, the p99 is 500ns, which means that 1% of all our measurements are above 500ns. We can try to optimize this by allocating fewer strings:

func main() {
    bench := hrtime.NewBenchmark(20000000)
    var back [1024]byte
    for bench.Next() {
        buffer := back[:0]
        buffer = append(buffer, []byte("hello, world ")...)
        buffer = strconv.AppendInt(buffer, 123, 10)
        runtime.KeepAlive(buffer)
    }
    fmt.Println(bench.Histogram(10))
}

And the result:

avg 267ns;  min 200ns;  p50 300ns;  max 216µs;
p90 300ns; p99 300ns; p999 1.1µs; p9999 3.6µs;
200ns [ 7211285] ██████████████████████▌
300ns [12658260] ████████████████████████████████████████
400ns [ 81076]
500ns [ 3226]
600ns [ 343]
700ns [ 136]
800ns [ 729]
900ns [ 8108]
1µs [ 15436]
1.1µs+[ 21401]

We can now see that the p99 has gone from 500ns to 300ns.
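The improvement comes from reusing the backing array: append and strconv.AppendInt write into the preallocated buffer instead of allocating a new string on every iteration.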

If you have a keen eye, you may have noticed that the Go benchmark gave an average of 107ns/op, whereas hrtime gave us 372ns/op. This is the unfortunate side-effect of trying to measure more: measurement always has an overhead, and the final results include it.

Note: depending on the operating system, the overhead can be significantly smaller. hrtime also supports making repeated calls inside a single iteration and then dividing the measurements with https://godoc.org/github.com/loov/hrtime#Histogram.Divide.
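As a rough sketch of that approach: run the operation several times inside each bench.Next() iteration so the fixed measurement overhead is amortized over the whole batch, and then divide the histogram by the batch size. The exact Divide signature below is my assumption based on the godoc link above, so check the documentation before relying on it.

package main

import (
    "fmt"
    "runtime"

    "github.com/loov/hrtime"
)

func main() {
    const callsPerIteration = 100
    bench := hrtime.NewBenchmark(100000)
    for bench.Next() {
        // Measure a batch of calls per iteration so the timing
        // overhead is paid once per batch instead of once per call.
        for i := 0; i < callsPerIteration; i++ {
            r := fmt.Sprintf("hello, world %d", 123)
            runtime.KeepAlive(r)
        }
    }
    hist := bench.Histogram(10)
    // Assumption: Divide scales the recorded durations down by
    // the number of calls per iteration.
    hist.Divide(callsPerIteration)
    fmt.Println(hist)
}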

Stopwatch

Sometimes you also want to measure concurrent operations. For this there is Stopwatch: https://godoc.org/github.com/loov/hrtime#Stopwatch

Let’s say you want to measure how long a send takes on a highly contended channel. This is, of course, a contrived example, but it roughly shows how we can start a measurement in one goroutine, stop it in another, and finally print everything out.

func main() {
    const numberOfExperiments = 1000
    bench := hrtime.NewStopwatch(numberOfExperiments)
    ch := make(chan int32, 10)
    wait := make(chan struct{})
    // start senders
    for i := 0; i < numberOfExperiments; i++ {
        go func() {
            <-wait
            ch <- bench.Start()
        }()
    }
    // start one receiver
    go func() {
        for lap := range ch {
            bench.Stop(lap)
        }
    }()
    // wait for all goroutines to be created
    time.Sleep(time.Second)
    // release all goroutines at the same time
    close(wait)
    // wait for all measurements to be completed
    bench.Wait()
    fmt.Println(bench.Histogram(10))
}
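Note how the measurement travels between goroutines: Start returns a lap index, which we send over the channel so the receiver can pass it to Stop, and Wait blocks until every started lap has been stopped.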

hrtesting

Of course, writing a separate binary for every benchmark isn’t that convenient. For that there’s github.com/loov/hrtime/hrtesting, which provides wrappers for testing.B.

func BenchmarkReport(b *testing.B) {
    bench := hrtesting.NewBenchmark(b)
    defer bench.Report()
    for bench.Next() {
        r := fmt.Sprintf("hello, world %d", 123)
        runtime.KeepAlive(r)
    }
}

It will print out the 50th, 90th, and 99th percentiles.

BenchmarkReport-32               3000000               427 ns/op
--- BENCH: BenchmarkReport-32
benchmark_old.go:11: 24.5µs₅₀ 24.5µs₉₀ 24.5µs₉₉ N=1
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 12.8µs₉₉ N=100
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 500ns₉₉ N=10000
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 600ns₉₉ N=1000000
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 500ns₉₉ N=3000000

Unfortunately, with Go 1.12 it will print the output for all runs of the benchmark instead of just the last one. However, with Go 1.13, the output will be much nicer:

BenchmarkReport-32   3174566  379 ns/op  400 ns/p50  400 ns/p90 ...

And comparing results with benchstat will work as well.
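For example, you can redirect the output of go test -bench . -count 10 into old.txt before a change and new.txt after it, then run benchstat old.txt new.txt to see whether the averages and percentiles moved beyond the noise.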

hrplot

Leaving the best for last: github.com/loov/hrtime/hrplot. Using my experimental plotting package, I added a convenient way to plot benchmark results.

func BenchmarkReport(b *testing.B) {
    bench := hrtesting.NewBenchmark(b)
    defer bench.Report()
    defer hrplot.All("all.svg", bench)

    runtime.GC()
    for bench.Next() {
        r := fmt.Sprintf("hello, world %d", 123)
        runtime.KeepAlive(r)
    }
}

It will create an SVG file called all.svg containing three plots: a line plot showing how much time each iteration took, a density plot showing the distribution of the timing measurements, and a percentiles plot.

Conclusion

Performance optimization is fun, but having tools to help you out makes it even more enjoyable.

Go try out github.com/loov/hrtime and let me know what you think.
