App instrumentation generally involves significant manual effort, with application code invoking logging/metrics/tracing SDKs when something interesting happens. This is useful, but not without its challenges. For one, it’s a lot of work. It also leads to a lot of code cruft. The most consequential challenge, however, is that it mostly results in an inconsistent treatment of observability data (e.g., free-form log messages, metrics data embedded in log messages, unconventional metric and dimension names). There’s little leverage, and it’s hard to do anything systematic with the data.
In this post we’ll explore a recent and developing offering for automated instrumentation: the OpenTelemetry for Java Instrumentation project. I’ll show you how to use it to add automated tracing to your microservice-based Java application. I’m using opentelemetry-java-instrumentation v0.7.0.
The sample app
For your convenience, I’ve created a sample Spring Boot app called otel-demo. It’s a toy travel application that doesn’t let you do anything other than look at a list of flights from Seattle to Las Vegas. But it turns out that this is good enough to show OpenTelemetry (OTel) auto-instrumentation in action, so we may as well keep it simple.
Here’s the underlying app architecture:
(Note that the endpoints in the diagram are the endpoints on the Docker Compose network. See the sample app’s README.md for the host-level port mappings.)
Notice that the app itself doesn’t have any observability stuff at all. It doesn’t log anything (besides what Spring Boot itself logs), it doesn’t write metrics, no traces, nothing. I didn’t even activate Spring Boot Actuator. The UI gets flights from an API, which in turn gets flights from two flight providers.
Maybe you noticed that the microservice startup scripts reference Jaeger and some kind of OTel Java agent. If so, pretend you didn’t see that yet. We’ll get to it in the next section.
At this point, go ahead and fire up the app:
$ docker-compose up
And then point your browser at http://localhost:8080. You should see the amazing UI from the screenshot above (I designed it myself). The containers can take a little while to spin up, so if you get an error, wait a few seconds and retry.
Adding automated tracing
OK, now we get to the startup scripts that I just mentioned.
To add automated tracing to the app, we attach an instrumentation agent to the JVM using the -javaagent option, and set some system properties that tell the agent where to send trace spans. Here I’m sending the spans directly to Jaeger, a popular tracing system. But there are other options too, such as sending them to Zipkin (another popular tracing system) or to an OpenTelemetry (OTLP) Collector. See opentelemetry-java-instrumentation for more information.
I downloaded the agent JAR from the opentelemetry-java-instrumentation tags page. Again I’m using v0.7.0 in the sample app. You’ll probably want to use the latest and greatest.
Here are the JAVA_OPTS for the UI microservice:
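As a sketch of what this looks like (the agent JAR path, the endpoint, and the exact property names here are assumptions on my part; the agent’s property naming has changed across versions, so check the docs for the version you’re running):

```shell
# Attach the OTel Java agent and point it at the Jaeger collector.
# The JAR path, endpoint, and otel.* property names below are
# illustrative assumptions; verify them against the docs for your
# agent version.
JAVA_OPTS="-javaagent:/app/opentelemetry-javaagent-all.jar \
  -Dotel.exporter=jaeger \
  -Dotel.exporter.jaeger.endpoint=jaeger:14250 \
  -Dotel.exporter.jaeger.service.name=otel-demo-ui"
```

The service name is how Jaeger tells the microservices apart, so each microservice’s startup script should set a distinct value.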
What’s happening behind the scenes? The agent knows about a fairly wide range of popular Java libraries and frameworks. During class loading, it injects tracing instrumentation into targeted locations via bytecode manipulation. The app is now endowed with automatic tracing.
(Note: I was hoping the same approach would work for metrics, but not yet. The opentelemetry-java-instrumentation team is waiting for the OTel spec to stabilize with regard to metrics semantics before pursuing that.)
Let’s see the result
The Docker Compose file includes a Jaeger container, which you can access at http://localhost:16686. I’d recommend trying that yourself, but to save you the trouble, here are some screenshots too.
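For reference, a Jaeger all-in-one service in Docker Compose typically looks something like this (a sketch, not the demo’s actual compose file, which may differ):

```yaml
# Hypothetical sketch of a Jaeger all-in-one service for Docker Compose.
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"   # Jaeger UI
      - "14250:14250"   # collector gRPC span ingestion
```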
First, here’s a trace for an earlier version of the app. (I’ll explain why we’re looking at an earlier version in a minute.)
You can see that auto-instrumentation knew to instrument Spring Web MVC, the Spring RestTemplate calls (backed by Apache HttpClient if memory serves) to the REST APIs, and the Spring Data repositories. I didn’t have to provide any of that myself.
Because the startup scripts for each microservice specified the service name, Jaeger is able to distinguish them and render the associated spans in different colors.
We can drill down on any of the spans. Here’s an HTTP request:
Though you couldn’t easily see it in the screenshots above, it turns out that opentelemetry-java-instrumentation knows about Hibernate too (yay), so we can even see SQL queries:
A feature that I love about tracing systems (not just Jaeger, but Zipkin and Expedia’s Haystack too) is that they automatically build service dependency graphs based on span data. This makes it easier for human operators to diagnose incidents, and also opens the door for some extremely interesting algorithmic and graph-based machine learning possibilities. Anyway, here’s Jaeger’s automatically-generated graph visualization:
This is of course a tiny system, but automatic architecture extraction becomes at least potentially more useful as the number of collaborating services grows. (No doubt there’s room for improving the visualizations themselves: see for example the articles by Cindy Sridharan and William Louth. Over time I’d expect to see both improved UX and algorithmic/ML-based processing.)
Before I close out this section, take another look at the first trace screenshot I posted above. You’ll see that the API calls providers 1 and 2 in series, which extends the call duration. After seeing the trace, I changed the API to call the providers fork/join style instead. You can see the impact on the trace (parallel calls) and on the duration (37ms, versus 54ms previously) here:
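The serial-to-parallel change can be sketched with CompletableFuture (a hypothetical illustration; ParallelFlights, provider1, and provider2 are stand-ins for the demo’s actual RestTemplate calls to the provider services):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ParallelFlights {
    // Hypothetical stand-ins for the two provider calls; the real app
    // makes REST calls to the flight provider services instead.
    static List<String> provider1() { return List.of("SEA->LAS 08:00"); }
    static List<String> provider2() { return List.of("SEA->LAS 12:30"); }

    static List<String> getFlights() {
        // Fork: kick off both provider calls concurrently.
        CompletableFuture<List<String>> f1 =
                CompletableFuture.supplyAsync(ParallelFlights::provider1);
        CompletableFuture<List<String>> f2 =
                CompletableFuture.supplyAsync(ParallelFlights::provider2);

        // Join: wait for both results and merge them.
        List<String> flights = new ArrayList<>(f1.join());
        flights.addAll(f2.join());
        return flights;
    }

    public static void main(String[] args) {
        System.out.println(getFlights());
    }
}
```

With the auto-instrumentation attached, the two provider spans then overlap in time rather than running back-to-back, which is exactly what the second trace screenshot shows.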
In this post we looked at using the OpenTelemetry for Java Instrumentation agent to automate one aspect of an observability effort: tracing instrumentation. As the OTel spec matures, we can expect similar auto-instrumentation to emerge for metrics as well. While none of this completely replaces manual instrumentation, it reduces effort, reduces code cruft, and harmonizes observability data, enabling more systematic processing (e.g., metrics aggregation based on conventional dimensions).