Benchmarking with details
I’ve been optimizing Go code for a while and trying to improve my benchmarking game.
Let’s take a look at a simple example:
func BenchmarkReport(b *testing.B) {
runtime.GC()
for i := 0; i < b.N; i++ {
r := fmt.Sprintf("hello, world %d", 123)
runtime.KeepAlive(r)
}
}
Running go test -bench .
will give us the result:
BenchmarkReport-32 20000000 107 ns/op
This might be enough for getting a rough estimate where you stand in terms of performance, but a more detailed output is required for optimization. Condensing everything into a single number is bound to be simplistic.
Let me introduce you to the hrtime package which I wrote to get a more extensive output from benchmarks.
Histogram
The first approach to using it is to use hrtime.NewBenchmark
. Rewriting the first example as a regular program.
func main() {
bench := hrtime.NewBenchmark(20000000)
for bench.Next() {
r := fmt.Sprintf("hello, world %d", 123)
runtime.KeepAlive(r)
}
fmt.Println(bench.Histogram(10))
}
Which will print:
avg 372ns; min 300ns; p50 400ns; max 295µs;
p90 400ns; p99 500ns; p999 1.8µs; p9999 4.3µs;
300ns [ 7332554] ███████████████████████
400ns [12535735] ████████████████████████████████████████
600ns [ 18955]
800ns [ 2322]
1µs [ 20413]
1.2µs [ 34854]
1.4µs [ 25096]
1.6µs [ 10009]
1.8µs [ 4688]
2µs+[ 15374]
As we can see that P99 is 500ns, which means that 1% of all our measurements are above 500ns. We can try to optimize this by allocating fewer strings:
func main() {
bench := hrtime.NewBenchmark(20000000)
var back [1024]byte
for bench.Next() {
buffer := back[:0]
buffer = append(buffer, []byte("hello, world ")...)
buffer = strconv.AppendInt(buffer, 123, 10)
runtime.KeepAlive(buffer)
}
fmt.Println(bench.Histogram(10))
}
And the result:
avg 267ns; min 200ns; p50 300ns; max 216µs;
p90 300ns; p99 300ns; p999 1.1µs; p9999 3.6µs;
200ns [ 7211285] ██████████████████████▌
300ns [12658260] ████████████████████████████████████████
400ns [ 81076]
500ns [ 3226]
600ns [ 343]
700ns [ 136]
800ns [ 729]
900ns [ 8108]
1µs [ 15436]
1.1µs+[ 21401]
We can now see that the 99% has gone from 500ns to 300ns.
If you have a keen eye, you may have noticed that Go benchmark gave an average 107ns/op
however hrtime
gave us 372ns/op
. This is the unfortunate side-effect in trying to measure more — it always has an overhead. The final results include this overhead.
Note: depending on the operating-system the overhead can be significantly less and hrtime
does support repeated calls inside with https://godoc.org/github.com/loov/hrtime#Histogram.Divide.
Stopwatch
Sometimes you also want to measure concurrent operations. For this there is Stopwatch
. https://godoc.org/github.com/loov/hrtime#Stopwatch
Let’s say you want to measure how long does a send take on a highly contended channel. This, of course, is a contrived example, but roughly shows the idea of how we can start the measurement from one goroutine, stop it in another and finally print it all out.
func main() {
const numberOfExperiments = 1000
bench := hrtime.NewStopwatch(numberOfExperiments)ch := make(chan int32, 10)
wait := make(chan struct{})// start senders
for i := 0; i < numberOfExperiments; i++ {
go func() {
<-wait
ch <- bench.Start()
}()
}// start one receiver
go func() {
for lap := range ch {
bench.Stop(lap)
}
}()// wait for all goroutines to be created
time.Sleep(time.Second)
// release all goroutines at the same time
close(wait)// wait for all measurements to be completed
bench.Wait()
fmt.Println(bench.Histogram(10))
}
hrtesting
Of course, writing a separate binary for all of the tests isn’t that convenient. For that there’s github.com/loov/hrtime/hrtesting
, which provides wrappers for testing.B
.
func BenchmarkReport(b *testing.B) {
bench := hrtesting.NewBenchmark(b)
defer bench.Report()for bench.Next() {
r := fmt.Sprintf("hello, world %d", 123)
runtime.KeepAlive(r)
}
}
It will print out 50%, 90%, and 99% percentiles.
BenchmarkReport-32 3000000 427 ns/op
--- BENCH: BenchmarkReport-32
benchmark_old.go:11: 24.5µs₅₀ 24.5µs₉₀ 24.5µs₉₉ N=1
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 12.8µs₉₉ N=100
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 500ns₉₉ N=10000
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 600ns₉₉ N=1000000
benchmark_old.go:11: 400ns₅₀ 500ns₉₀ 500ns₉₉ N=3000000
Unfortunately, with Go 1.12 it will print all of the runs of Benchmark instead of just the last. However, with Go 1.13, the output will be much nicer:
BenchmarkReport-32 3174566 379 ns/op 400 ns/p50 400 ns/p90 ...
And comparing results with benchstat
will work as well.
hrplot
Leaving the best for last github.com/loov/hrtime/hrplot
. Using my experimental plotting package, I decided to add a convenient way to plot the benchmark results.
func BenchmarkReport(b *testing.B) {
bench := hrtesting.NewBenchmark(b)
defer bench.Report()
defer hrplot.All("all.svg", bench)
runtime.GC()
for bench.Next() {
r := fmt.Sprintf("hello, world %d", 123)
runtime.KeepAlive(r)
}
}
Will create an SVG file called all.svg
. It contains a line plot, which shows how much time each iteration took. Second is a density plot, which shows the distribution of timing measurements. And finally a percentiles plot.
Conclusion
Performance optimization is fun, but having tools to help you out makes it even more enjoyable.
Go try out github.com/loov/hrtime and let me know what you think.