Benchmarking with details

Egon Elbre
Apr 3, 2019

I’ve been optimizing Go code for a while and trying to improve my benchmarking game.

Let’s take a look at a simple example:

func BenchmarkReport(b *testing.B) {
	runtime.GC()
	for i := 0; i < b.N; i++ {
		r := fmt.Sprintf("hello, world %d", 123)
		runtime.KeepAlive(r)
	}
}

Running go test -bench . will give us the result:

BenchmarkReport-32      20000000               107 ns/op

This might be enough to get a rough estimate of where you stand in terms of performance, but for optimization a more detailed output is needed. Condensing everything into a single number is bound to be simplistic.

github.com/loov/hrtime

Let me introduce the hrtime package, which I wrote to get more extensive output from benchmarks.

Histogram

The first approach is to use hrtime.NewBenchmark, rewriting the first example as a regular program:

func main() {
	bench := hrtime.NewBenchmark(20000000)
	for bench.Next() {
		r := fmt.Sprintf("hello, world %d", 123)
		runtime.KeepAlive(r)
	}
	fmt.Println(bench.Histogram(10))
}

This will print:

avg 372ns;  min 300ns;  p50 400ns;  max 295µs;
p90 400ns; p99 500ns; p999 1.8µs; p9999 4.3µs;
 300ns [ 7332554] ███████████████████████
 400ns [12535735] ████████████████████████████████████████
 600ns [   18955]
 800ns [    2322]
   1µs [   20413]
 1.2µs [   34854]
 1.4µs [   25096]
 1.6µs [   10009]
 1.8µs [    4688]
  2µs+ [   15374]

As we can see, the p99 is 500ns, which means that 1% of all our measurements are above 500ns. We can try to improve on this by allocating fewer strings:

func main() {
	bench := hrtime.NewBenchmark(20000000)
	var back [1024]byte
	for bench.Next() {
		buffer := back[:0]
		buffer = append(buffer, []byte("hello, world ")...)
		buffer = strconv.AppendInt(buffer, 123, 10)
		runtime.KeepAlive(buffer)
	}
	fmt.Println(bench.Histogram(10))
}

And the result:

avg 267ns;  min 200ns;  p50 300ns;  max 216µs;
p90 300ns; p99 300ns; p999 1.1µs; p9999 3.6µs;
 200ns [ 7211285] ██████████████████████▌
 300ns [12658260] ████████████████████████████████████████
 400ns [   81076]
 500ns [    3226]
 600ns [     343]
 700ns [     136]
 800ns [     729]
 900ns [    8108]
   1µs [   15436]
1.1µs+ [   21401]

We can now see that the p99 has gone from 500ns to 300ns.

If you have a keen eye, you may have noticed that the Go benchmark gave an average of 107ns/op, whereas hrtime gave us 372ns/op. This is the unfortunate side-effect of trying to measure more: measuring itself has an overhead, and the final results include it.
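
To get a feel for how large that overhead is on a given machine, one option is to benchmark an empty loop, so that the histogram shows roughly the cost of the measurement itself. This isn’t from the package documentation, just a quick sketch using the same API as above:

package main

import (
	"fmt"

	"github.com/loov/hrtime"
)

func main() {
	// The loop body is empty, so each recorded duration is
	// approximately the overhead of taking the timestamps.
	bench := hrtime.NewBenchmark(1000000)
	for bench.Next() {
	}
	fmt.Println(bench.Histogram(10))
}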

Note: depending on the operating system, the overhead can be significantly smaller. hrtime also supports amortizing it by timing several calls per iteration and dividing the result with Histogram.Divide (https://godoc.org/github.com/loov/hrtime#Histogram.Divide).
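
A minimal sketch of that pattern, assuming Divide(n) scales the recorded measurements in place by the batch size (check the godoc link above for the exact semantics):

package main

import (
	"fmt"
	"runtime"

	"github.com/loov/hrtime"
)

func main() {
	const innerCalls = 10
	bench := hrtime.NewBenchmark(2000000)
	for bench.Next() {
		// Run the operation several times per measurement,
		// so the timing overhead is paid once per batch.
		for k := 0; k < innerCalls; k++ {
			r := fmt.Sprintf("hello, world %d", 123)
			runtime.KeepAlive(r)
		}
	}
	hist := bench.Histogram(10)
	hist.Divide(innerCalls) // scale each measurement down by the batch size
	fmt.Println(hist)
}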

Stopwatch

Sometimes you also want to measure concurrent operations. For this there is Stopwatch (https://godoc.org/github.com/loov/hrtime#Stopwatch).

Let’s say you want to measure how long a send takes on a highly contended channel. This is, of course, a contrived example, but it roughly shows the idea: we start the measurement in one goroutine, stop it in another, and finally print it all out.

func main() {
	const numberOfExperiments = 1000
	bench := hrtime.NewStopwatch(numberOfExperiments)
	ch := make(chan int32, 10)
	wait := make(chan struct{})
	// start senders
	for i := 0; i < numberOfExperiments; i++ {
		go func() {
			<-wait
			ch <- bench.Start()
		}()
	}
	// start one receiver
	go func() {
		for lap := range ch {
			bench.Stop(lap)
		}
	}()
	// wait for all goroutines to be created
	time.Sleep(time.Second)
	// release all goroutines at the same time
	close(wait)
	// wait for all measurements to be completed
	bench.Wait()
	fmt.Println(bench.Histogram(10))
}

hrtesting

Of course, writing a separate binary for every benchmark isn’t that convenient. For that there’s github.com/loov/hrtime/hrtesting, which provides wrappers for testing.B.

func BenchmarkReport(b *testing.B) {
	bench := hrtesting.NewBenchmark(b)
	defer bench.Report()
	for bench.Next() {
		r := fmt.Sprintf("hello, world %d", 123)
		runtime.KeepAlive(r)
	}
}

It will print out the 50th, 90th, and 99th percentiles.

BenchmarkReport-32               3000000               427 ns/op
--- BENCH: BenchmarkReport-32
benchmark_old.go:11: 24.5µs₅₀ 24.5µs₉₀ 24.5µs₉₉ N=1
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 12.8µs₉₉ N=100
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 500ns₉₉ N=10000
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 600ns₉₉ N=1000000
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 500ns₉₉ N=3000000

Unfortunately, with Go 1.12 it will print the output for every run of the benchmark instead of just the last one. With Go 1.13, however, the output will be much nicer:

BenchmarkReport-32   3174566  379 ns/op  400 ns/p50  400 ns/p90 ...

And comparing results with benchstat will work as well.
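
For example, a typical benchstat workflow, using only standard go test flags, would look roughly like this:

go test -bench . -count 10 > old.txt
# ...apply the optimization...
go test -bench . -count 10 > new.txt
benchstat old.txt new.txt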

hrplot

Leaving the best for last: github.com/loov/hrtime/hrplot. Using my experimental plotting package, I added a convenient way to plot benchmark results.

func BenchmarkReport(b *testing.B) {
	bench := hrtesting.NewBenchmark(b)
	defer bench.Report()
	defer hrplot.All("all.svg", bench)

	runtime.GC()
	for bench.Next() {
		r := fmt.Sprintf("hello, world %d", 123)
		runtime.KeepAlive(r)
	}
}

This will create an SVG file called all.svg. It contains a line plot showing how much time each iteration took, a density plot showing the distribution of the timing measurements, and finally a percentiles plot.


Conclusion

Performance optimization is fun, but having tools to help you out makes it even more enjoyable.

Go try out github.com/loov/hrtime and let me know what you think.
