What is the “cost” of doing instrumentation?
For the past few months, I’ve been working on a set of benchmarks for OpenTracing as part of my Outreachy internship. Performance tests are tricky, but I believe that most tests are representing the usage of the libraries in the real world. The results I’ll be presenting here came from performance tests still in progress. The code is available in OpenTracing GitHub, and I’m open to hearing your suggestions on how to improve it.
Doing instrumentation allows us to understand the behavior of our applications, detect abnormalities, and proactively take actions before the final user of our applications gets affected.
How we can achieve that?
In order to have the data of our transactions, we have to add traces to our application, by writing code, or using specific libraries in a “black box” way. If you need to familiarize with these concepts I recommend this article.
In a joint effort of some vendors, and with the support of CNCF, it was defined a standard specification, Opentracing, that allows writing our traces in a vendor-neutral way. This means that I can add instrumentation to my application according to the standardized API, and now choose a specific tracer, but if in the future I decided to change for another tracer implementation for some reason I don’t have to rewrite the code.
This post presents the benchmarks results of OpenTracing API for Java to understand how much is the performance overhead using instrumentation.
OpenTracing API for Java is the base for all instrumentation libraries which use it under the hood.
How the tests are built?
We used Java Microbenchmark Harness (JMH) to build the tests.
JMH is a toolkit created by OpenJDK. To build your first test, you can use an archetype provided by JMH:
mvn archetype:generate \
-DinteractiveMode=false \
-DarchetypeGroupId=org.openjdk.jmh \
-DarchetypeArtifactId=jmh-java-benchmark-archetype \
-DgroupId=org.sample \
-DartifactId=jmh-examples \
-Dversion=1.0
This command creates a base project that includes the required libraries and building plugins required for JMH.
These are fragments of an example of a benchmark application, each test we want to perform have to be annotated with @Benchmark and we can annotate additionally indicating which mode we want to use with @BenchmarkMode. In this example, we have a benchmark method by each tracer implementation we are trying to measure.
Sometimes you may want to initialize some variables that your benchmark code needs, but which you don’t want to be part of the code your benchmark measures. Such variables are called “state” variables. In this example, in the StateVariables the spring context is initialized and we set the variable that contains the controller we need to use in each test. Also in each iteration, we shut down the spring context.
In order to inject the right tracer for each test, we use spring profiles, that allow us to do that putting system properties before initialized the application.
With this project configuration (Main.java) the results are written in the directory results using JSON format.
To visualize graphically the results, we use JMH Visualizer.
Type of tests
For each scenario, we’ve built two types of tests: measuring the Throughput and the Sample Time.
SampleTime metrics
- X-axis: represents each execution result.
- Y-axis: represents how long time it takes for the benchmark method to execute.
Throughput metrics
- X-axis: represents each execution result.
- Y-axis: represents the number of operations per second (the number of times per second the benchmark method could be executed).
Results
Simple java
These tests measure the cost of a simple string concatenation using two constant strings “Hello” and “world” with an iteration number i. The instrumentation code is added manually. The tests compare a not-instrumented scenario with tests using instrumentation with the different tracers create a span, set a tag with the name of the tracers and log the resultant message of the concatenation.
Let’s analyze the numbers: the values for No-instrumentation and NoopTracer are very close and equals in some cases. MockTracer is not a good comparison, because it’s not intended to be used in a productive environment, keeping the values of span in RAM and these affect the results. So, for this and the other scenarios let’s compare the No-instrumentation with Jaeger tracer.
| No-Inst | Jaeger | % Jaeger | Sample Time degradation |
|----------|----------|----------|-------------------------|
| 1.14E-07 | 6.29E-07 | 554% | 454% |
| 1.27E-07 | 7.27E-07 | 573% | 473% |
| 1.33E-07 | 7.01E-07 | 526% | 426% |
| 1.29E-07 | 7.18E-07 | 558% | 458% |
| 1.51E-07 | 7.56E-07 | 500% | 400% |
According to these numbers, we can see that Sample Time increases ~440% in a simple java scenario.
| No-Inst | Jaeger | % Jaeger | Throughput degradation |
|------------|-----------|----------|------------------------|
| 17,491,646 | 1,820,168 | 10.41% | 89.59% |
| 17,330,040 | 1,784,527 | 10.30% | 89.70% |
| 17,185,759 | 1,732,435 | 10.08% | 89.92% |
| 17,350,257 | 1,741,779 | 10.04% | 89.96% |
| 17,222,531 | 1,695,528 | 9.845% | 90.16% |
We can see the Throughput decreases by around 90% in a simple java scenario.
Spring Boot
These tests use an example spring boot application containing a basic billing process. The instrumentation code is added manually in each Traced* spring service. The application is initialized for each test iteration, and using profiles, the right tracer is injected. The different tests measure the process of issuing an invoice in a not-instrumented scenario and instrumented with different tracers. The tests are performing the operation calling to the Spring services directly.
| No-Inst | Jaeger | % Jaeger | Sample Time degradation |
|-------------|-------------|----------|-------------------------|
| 1.16836E-06 | 3.47787E-06 | 298% | 198% |
| 1.16306E-06 | 3.60596E-06 | 310% | 210% |
| 1.18259E-06 | 3.51904E-06 | 298% | 198% |
| 1.17731E-06 | 3.5273E-06 | 300% | 200% |
| 1.21728E-06 | 3.544E-06 | 291% | 191% |
The Sample Time increases ~200% in a Spring Boot scenario.
| No-Inst | Jaeger | % Jaeger | Throughput degradation |
|---------|---------|----------|------------------------|
| 954,744 | 311,765 | 32.65% | 67.35% |
| 927,196 | 300,623 | 32.42% | 67.58% |
| 931,335 | 307,140 | 32.98% | 67.02% |
| 951,730 | 301,834 | 31.71% | 68.29% |
| 919,047 | 303,105 | 32.98% | 67.02% |
The Throughput decreases ~67% in Spring Boot scenario.
Spring Cloud
These tests use the petclinic sample spring-based application. The instrumentation is made automatically using the library opentracing-spring-cloud-starter. The application is initialized for each test iteration and using profiles, the right tracer is injected in. The different tests measure the process of finding a pet owner by id in a not-instrumented scenario and instrumented with different tracers. The tests are performing the operation calling to the Spring services directly.
| No-Inst | Jaeger | % Jaeger | Sample Time degradation |
|-------------|-------------|----------|-------------------------|
| 0.000139312 | 0.000150995 | 108% | 8% |
| 0.00013936 | 0.000148031 | 106% | 6% |
| 0.000142766 | 0.000146859 | 103% | 3% |
| 0.000146021 | 0.000150718 | 103% | 3% |
| 0.000146468 | 0.000148991 | 102% | 2% |
The Sample Time increases ~4% in a Spring Cloud scenario.
| No-Inst | Jaeger | % Jaeger | Throughput degradation |
|---------|--------|----------|------------------------|
| 7,160 | 6,691 | 93.45% | 6.55% |
| 6,990 | 6,523 | 93.32% | 6.68% |
| 7,012 | 6,794 | 96.89% | 3.11% |
| 7,112 | 6,235 | 87.67% | 12.33% |
| 7,097 | 6,257 | 88.16% | 11.84% |
The Throughput decreases ~8% in Spring Boot scenario.
JDBC
These tests use a sample spring boot application containing a basic course management process. The instrumentation is made automatically using the library opentracing-jdbc. The application is initialized for each test iteration, and using profiles, the right tracer is injected. The different tests measure the process of getting the list of all courses in a not-instrumented scenario and instrumented with different tracers. The tests are performing the operation calling to the Spring services directly.
| No-Inst | Jaeger | % Jaeger | Sample Time degradation |
|-------------|-------------|----------|-------------------------|
| 1.32359E-05 | 1.52948E-05 | 116% | 16% |
| 1.31049E-05 | 1.44857E-05 | 111% | 11% |
| 1.27248E-05 | 1.44507E-05 | 114% | 14% |
| 1.38321E-05 | 1.47132E-05 | 106% | 6% |
| 1.31739E-05 | 1.53496E-05 | 117% | 17% |
The Sample Time increases ~13% in a JDBC scenario.
| No-Inst | Jaeger | % Jaeger | Throughput degradation |
|---------|--------|----------|------------------------|
| 74,880 | 71,348 | 95.28% | 4.72% |
| 75,847 | 70,432 | 92.86% | 7.14% |
| 78,737 | 66,674 | 84.68% | 15.32% |
| 76,764 | 72,769 | 94.80% | 5.20% |
| 76,987 | 70,942 | 92.15% | 7.85% |
The Throughput decreases ~8% in JDBC scenario.
Servlet Filter
These tests use a servlet example application to process a GET request and return an HTML page with information of the request a random number assigned. Using Undertow Deployment API the servlet is deployed, and a new instance off Undertow server is launched. The instrumentation is made automatically using the library opentracing-web-servlet-filter. Using system properties, in each test, the tracer implementation is specified. The tracer is registered in GlobalTracer
on startup using an application listener. The tests consist of making a simple request to get the hello world page in a not-instrumented scenario and instrumented with different tracers.
| No-Inst | Jaeger | % Jaeger | Sample Time degradation |
|-------------|-------------|----------|-------------------------|
| 0.000145355 | 0.000153801 | 106% | 6% |
| 0.000152293 | 0.000147829 | 97% | -3% |
| 0.000145998 | 0.000147199 | 101% | 1% |
| 0.000151192 | 0.000145962 | 97% | -3% |
| 0.000147946 | 0.000149599 | 101% | 1% |
| No-Inst | Jaeger | % Jaeger | Throughput degradation |
|---------|--------|----------|------------------------|
| 6,741 | 6,822 | 101.20% | -1.20% |
| 6,827 | 6,811 | 99.77% | 0.23% |
| 6,824 | 6,912 | 101.29% | -1.29% |
| 6,747 | 6,757 | 100.15% | -0.15% |
| 6,876 | 6,641 | 96.58% | 3.42% |
The metrics show no evidence of overhead, as the deltas of Throughput and Sample Time are not representative (0% and 0.2%respectively).
JAX-RS
These tests use a sample jax-rs with spring boot application containing a basic course management process. The instrumentation is made automatically using the library opentracing-jaxrs2-discovery. This application exposes a JAX-RS endpoint for the CourseResource
.The application is initialized for each test iteration, and using profiles, the right tracer is injected. The tracer is registered in GlobalTracer
at startup using an application listener. The tests consist of client rest call to get the list of all courses in a not-instrumented scenario and instrumented with different tracers.
| No-Inst | Jaeger | % Jaeger | Sample Time degradation |
|-------------|-------------|----------|-------------------------|
| 0.000204294 | 0.000217782 | 107% | 7% |
| 0.000210961 | 0.000217579 | 103% | 3% |
| 0.000211709 | 0.000209426 | 99% | -1% |
| 0.000208181 | 0.000215236 | 103% | 3% |
| 0.0002043 | 0.000209104 | 102% | 2% |
| No-Inst | Jaeger | % Jaeger | Throughput degradation |
|---------|--------|----------|------------------------|
| 4,629 | 4,607 | 99.52% | 0.48% |
| 4,532 | 4,870 | 107.46% | -7.46% |
| 5,034 | 4,727 | 93.90% | 6.10% |
| 4,569 | 4,931 | 107.92% | -7.92% |
| 4,594 | 5,059 | 110.12% | -10.12% |
The metrics show no evidence of overhead, as the deltas of Throughput and Sample Time are not representative (3% and -3.7%respectively).
Conclusions
- In simple java scenarios, the Throughput decreases ~90% and the Sample Time increases ~440%. This case does not represent a real use case, but it provides a good baseline of the theoretical limits of the tools involved. The big overhead that we see here when using a real tracer is indeed expected and a sign that the benchmark is sane.
- In scenarios that include calls through a framework (Spring boot, Spring Cloud, JDBC), on average, the Throughput decreases 12% and the Sample Time Increases 14%.
- In the scenarios with client calls through HTTP (Servlet Filter, JAX-RS), the metrics show no evidence of overhead, as the deltas of Throughput and Sample Time are not representative.
- Based on these results, we don’t see that the use of instrumentation presents a considerable impact on distributed applications,
considering it proportional to the cost of frameworks and networking required. The overhead comes from the actual tracer and is closely related to how they work and how they are configured. The next logical step is to create benchmarks for concrete tracers, such as Jaeger’s, to understand how much of the overhead can be controlled via configuration options such as sampling.