Benchmarking Is Hard — JMH Helps

Michael Hunger
Published in 97 Things
Sep 9, 2019

Benchmarking on the JVM, especially microbenchmarking, is hard. It’s not enough to wrap a nanosecond timer around a call or a loop and be done. You have to take into account warm-up, HotSpot compilation, code optimizations like inlining and dead-code elimination, multithreading, consistency of measurement, and more.
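
For contrast, here is the kind of hand-rolled timing that falls into exactly those traps (illustrative only, not from the article):

import java.util.ArrayList;
import java.util.List;

public class NaiveTiming {
    public static void main(String[] args) {
        long start = System.nanoTime();
        List<Boolean> list = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            list.add(Boolean.TRUE);
        }
        long elapsed = System.nanoTime() - start;
        // One cold, unwarmed run: the JIT may not have compiled the loop yet,
        // and because 'list' is never used afterwards, the work could be
        // eliminated as dead code entirely.
        System.out.println(elapsed + " ns");
    }
}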

Fortunately, Aleksey Shipilëv, the author of many great JVM tools, contributed JMH, the Java Microbenchmark Harness, to the OpenJDK. It consists of a small library and a build system plugin. The library provides annotations and utilities to declare your benchmarks as annotated Java classes and methods, including a Blackhole class that consumes generated values to prevent dead-code elimination. The library also handles state correctly in the presence of multithreading.
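
The Blackhole is handed to a benchmark method as a parameter; a small sketch (class, field, and method names are made up for illustration):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread) // each benchmark thread gets its own copy of this state
public class BlackholeSketch {

    int seed = 42;

    @Benchmark
    public void compute(Blackhole hole) {
        // Consuming the result keeps the JIT from treating the
        // computation as dead code and optimizing it away.
        hole.consume(seed * 31);
    }
}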

The build system plugin generates a JAR with the relevant infrastructure code for running and measuring the tests correctly. That includes dedicated warm-up phases, proper multithreading, running multiple forks and averaging across them, and much more.
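
Many of those settings can also be declared on the benchmark class itself through annotations. A minimal sketch, assuming JMH's standard annotations (the class name is made up; the values mirror the run shown below):

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.Throughput)        // report operations per unit of time
@OutputTimeUnit(TimeUnit.SECONDS)      // i.e. ops/s
@Warmup(iterations = 5, time = 1)      // 5 warm-up iterations of 1 s each
@Measurement(iterations = 5, time = 1) // 5 measured iterations of 1 s each
@Fork(5)                               // 5 forked JVMs; results averaged across them
public class ConfiguredBenchmark {

    @Benchmark
    public int placeholder() {
        return 1 + 1; // placeholder body; JMH consumes the return value
    }
}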

The tool also prints important advice on how to interpret the gathered data and on its limitations. Here is an example that measures the impact of pre-sizing collections:

package com.example;

import java.util.ArrayList;
import java.util.List;

import org.openjdk.jmh.annotations.Benchmark;

public class MyBenchmark {

    static final int COUNT = 10000;

    // Baseline: the list starts at the default capacity and grows while filling.
    @Benchmark
    public List<Boolean> testFillEmptyList() {
        List<Boolean> list = new ArrayList<>();
        for (int i = 0; i < COUNT; i++) {
            list.add(Boolean.TRUE);
        }
        // Returning the list lets JMH consume it, preventing dead-code elimination.
        return list;
    }

    // Pre-sized: the list is allocated with its final capacity up front.
    @Benchmark
    public List<Boolean> testFillAllocatedList() {
        List<Boolean> list = new ArrayList<>(COUNT);
        for (int i = 0; i < COUNT; i++) {
            list.add(Boolean.TRUE);
        }
        return list;
    }
}
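
As a side note, the fixed COUNT constant could instead become a JMH parameter, so the same benchmark runs across several sizes in one go. A sketch using @Param (class name and sizes are arbitrary):

import java.util.ArrayList;
import java.util.List;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class ParameterizedBenchmark {

    @Param({"100", "10000", "1000000"}) // JMH runs the benchmark once per size
    int count;

    @Benchmark
    public List<Boolean> fillAllocatedList() {
        List<Boolean> list = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            list.add(Boolean.TRUE);
        }
        return list;
    }
}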

To generate the project and run it, you can use the JMH Maven archetype:

mvn archetype:generate \
-DarchetypeGroupId=org.openjdk.jmh \
-DarchetypeArtifactId=jmh-java-benchmark-archetype \
-DinteractiveMode=false -DgroupId=com.example \
-DartifactId=coll-test -Dversion=1.0
cd coll-test
# add com/example/MyBenchmark.java
mvn clean install
java -jar target/benchmarks.jar -w 1 -r 1
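
The -w 1 and -r 1 flags shorten the warm-up and measurement time to one second per iteration. The run then prints something like:

...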
# JMH version: 1.21
...
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.example.MyBenchmark.testFillEmptyList
...Result "com.example.MyBenchmark.testFillEmptyList":
30966.686 ±(99.9%) 2636.125 ops/s [Average]
(min, avg, max) = (18885.422, 30966.686, 35612.643), stdev = 3519.152
CI (99.9%): [28330.561, 33602.811] (assumes normal distribution)
# Run complete. Total time: 00:01:45REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
MyBenchmark.testFillAllocatedList thrpt 25 56786.708 ± 1609.633 ops/s
MyBenchmark.testFillEmptyList thrpt 25 30966.686 ± 2636.125 ops/s

So we see that filling the pre-sized collection is almost twice as fast as filling the default one, because its backing array never has to be resized while elements are added.
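
That matches what ArrayList does under the hood: in OpenJDK it grows its backing array by about 50% whenever it fills up, copying all existing elements each time, so reaching 10,000 elements from the default capacity of 10 takes roughly 18 reallocations, all of which the pre-sized list avoids.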

JMH is a powerful tool in your toolbox for writing correct microbenchmarks. Results gathered in the same environment are comparable with one another, and such comparisons should be the main way of interpreting them. Because the results are stable and repeatable, the benchmarks also make a good basis for profiling. Aleksey has much more to say about the topic if you’re interested.
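
For example, running the generated JAR with JMH’s built-in GC profiler adds allocation statistics next to each score, which would directly confirm the resizing explanation above:

java -jar target/benchmarks.jar -prof gc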
