Auto-instrumentation with OpenTelemetry

Willie Wheeler
Aug 25, 2020 · 6 min read
Photo by Ferd Brundick

App instrumentation generally involves significant manual effort, with application code invoking logging/metrics/tracing SDKs when something interesting happens. This is useful, but not without its challenges. For one, it’s a lot of work. It also leads to a lot of code cruft. The most consequential challenge, however, is that it mostly results in an inconsistent treatment of observability data (e.g., free-form log messages, metrics data embedded in log messages, unconventional metric and dimension names). There’s little leverage, and it’s hard to do anything systematic with the data.
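To make the cruft concrete, here's a minimal sketch of what hand-rolled instrumentation around a single lookup tends to look like. Everything in it is hypothetical (the class, the method names, the flight codes, the metric name); it isn't taken from the sample app. It only illustrates the free-form messages and log-embedded metrics described above.

```java
import java.util.List;
import java.util.logging.Logger;

// Hypothetical example of manual instrumentation; not code from otel-demo.
class ManualInstrumentationExample {
    private static final Logger LOG =
            Logger.getLogger(ManualInstrumentationExample.class.getName());

    // Stand-in for the real business logic. The flight codes are made up.
    static List<String> findFlights(String from, String to) {
        return List.of(from + "-" + to + "-001", from + "-" + to + "-002");
    }

    // The same lookup, wrapped in hand-written timing and logging.
    static List<String> findFlightsInstrumented(String from, String to) {
        long start = System.nanoTime();
        // Free-form log message: easy to write, hard to process systematically.
        LOG.info("Looking up flights from " + from + " to " + to);
        try {
            List<String> flights = findFlights(from, to);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // A metric embedded in a log line, with an ad hoc name and format.
            LOG.info("flightLookupTimeMillis=" + elapsedMs + " count=" + flights.size());
            return flights;
        } catch (RuntimeException e) {
            LOG.warning("Flight lookup failed: " + e.getMessage());
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(findFlightsInstrumented("SEA", "LAS"));
    }
}
```

Only one line of findFlightsInstrumented is business logic. The rest is instrumentation, and a second developer would likely pick different message formats and metric names.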

While manual instrumentation isn’t going anywhere, we can automate more than we typically do. Prometheus, for instance, boasts an impressive arsenal of automated metrics exporters.

In this post we’ll explore a recent and developing offering for automated instrumentation: the OpenTelemetry for Java Instrumentation project. I’ll show you how to use it to add automated tracing to your microservice-based Java application. I’m using opentelemetry-java-instrumentation v0.7.0.

The sample app

For your convenience, I’ve created a sample Spring Boot app called otel-demo. It’s a toy travel application that doesn’t allow you to do anything other than look at a list of flights from Seattle to Las Vegas. But it turns out that this is good enough to show OpenTelemetry (OTel) auto-instrumentation in action, so we may as well keep it simple.

The Travel Deals UI

Here’s the underlying app architecture:

Travel Deals application architecture

(Note that the endpoints in the diagram are the endpoints on the Docker Compose network. See the sample app’s README.md for the host-level port mappings.)

Notice that the app itself doesn’t have any observability stuff at all. It doesn’t log anything (besides what Spring Boot itself logs), it doesn’t write metrics, no traces, nothing. I didn’t even activate Spring Boot Actuator. The UI gets flights from an API, which in turn gets flights from two flight providers.

Maybe you noticed that the microservice startup scripts reference Jaeger and some kind of OTel Java agent. If so, pretend you didn’t see that yet. We’ll get to it in the next section.

At this point, go ahead and fire up the app:

$ docker-compose up

And then point your browser at http://localhost:8080. You should see the amazing UI from the screenshot above (designed it myself). You may have to wait for all the containers to spin up, so wait a few seconds and retry if you get an error.

Adding automated tracing

OK, now we get to the startup scripts that I just mentioned.

To add automated tracing to the app, we attach an instrumentation agent to the JVM using the -javaagent option, and set some system properties that tell the agent where to send trace spans. Here I’m sending the spans directly to Jaeger, a popular tracing system. But there are other possibilities too, such as sending to Zipkin (another popular tracing system) or to an OpenTelemetry Collector over OTLP. See opentelemetry-java-instrumentation for more information.

I downloaded the agent JAR from the opentelemetry-java-instrumentation tags page. Again I’m using v0.7.0 in the sample app. You’ll probably want to use the latest and greatest.

Here are the JAVA_OPTS for the UI microservice:

JAVA_OPTS="${JAVA_OPTS} \
-Xms${JAVA_XMS} \
-Xmx${JAVA_XMX} \
-Dapplication.name=${APP_NAME} \
-Dapplication.home=${APP_HOME} \
-Dotel.exporter=jaeger \
-Dotel.jaeger.endpoint=jaeger:14250 \
-Dotel.jaeger.service.name=otel-ui \
-javaagent:${APP_HOME}/opentelemetry-javaagent-all.jar"

What’s happening behind the scenes? The agent is knowledgeable about a fairly wide range of popular Java libraries and frameworks. During classload, the agent adds tracing instrumentation to targeted locations via bytecode injection. The app is now endowed with automatic tracing.
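For a feel of the mechanism, here's a toy agent built on the JDK's standard java.lang.instrument API. This is a sketch under stated assumptions, not the OpenTelemetry agent's code: the class name ToyAgent and the list of target package prefixes are made up for illustration, and the transformer only records which classes it would instrument instead of rewriting any bytecode.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical toy agent; the real OTel agent is far more involved.
class ToyAgent {
    // Assumed target prefixes, in the JVM's slash-separated internal form.
    static final List<String> TARGET_PREFIXES =
            List.of("org/springframework/web/", "org/hibernate/");

    // Names of the classes this toy agent would have instrumented.
    static final List<String> MATCHED = new CopyOnWriteArrayList<>();

    static class MatchingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null
                    && TARGET_PREFIXES.stream().anyMatch(className::startsWith)) {
                // A real agent would return rewritten bytecode at this point.
                MATCHED.add(className);
            }
            return null; // null tells the JVM to leave the class unchanged
        }
    }

    // The JVM calls premain before the app's main method when the JAR is
    // passed via -javaagent and its manifest names this class as Premain-Class.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new MatchingTransformer());
    }

    // Exercises the transformer directly, without attaching an agent.
    public static void main(String[] args) {
        MatchingTransformer t = new MatchingTransformer();
        t.transform(null, "org/hibernate/Session", null, null, new byte[0]);
        t.transform(null, "java/lang/String", null, null, new byte[0]);
        System.out.println("Would instrument: " + MATCHED);
    }
}
```

The JVM offers each class to the transformer as it loads, which is how an agent can add behavior to libraries without the app's source changing at all.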

(Note: I was hoping that the same thing would work for metrics, but not yet. The opentelemetry-java-instrumentation team is waiting for the OTel spec to stabilize with regard to metrics semantics before they pursue that.)

Let’s see the result

The Docker Compose setup includes a Jaeger container, which you can access at http://localhost:16686. I’d recommend doing that, but to save you the trouble, here are some screenshots too.

First, here’s a trace for an earlier version of the app. (I’ll explain why we’re looking at an earlier version in a minute.)

A trace for an earlier version of the app.

You can see that auto-instrumentation knew to instrument Spring Web MVC, the Spring RestTemplate calls (backed by Apache HttpClient if memory serves) to the REST APIs, and the Spring Data repositories. I didn’t have to provide any of that myself.

Because the startup scripts for each microservice specified the service name, Jaeger is able to distinguish them and render the associated spans in different colors.

We can drill down on any of the spans. Here’s an HTTP request:

Tags for an HTTP GET request span.

Though you couldn’t easily see it in the screenshots above, it turns out that opentelemetry-java-instrumentation knows about Hibernate too (yay), so we can even see SQL queries:

Tags for a SQL query span.

A feature that I love about tracing systems (not just Jaeger, but Zipkin and Expedia’s Haystack too) is that they automatically build service dependency graphs based on span data. This makes it easier for human operators to diagnose incidents, and also opens the door for some extremely interesting algorithmic and graph-based machine learning possibilities. Anyway, here’s Jaeger’s automatically-generated graph visualization:

An automatically-extracted service dependency graph.

This is of course a tiny system, but automatic architecture extraction becomes at least potentially more useful as the number of collaborating services grows. (No doubt there’s room for improving the visualizations themselves: see for example the articles by Cindy Sridharan and William Louth. Over time I’d expect to see both improved UX and algorithmic/ML-based processing.)
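The core idea behind extracting a dependency graph from span data is simple enough to sketch. The snippet below is not Jaeger's implementation. It assumes a stripped-down span with only a span ID, a parent span ID, and a service name, and apart from otel-ui the service names are guesses at what the sample app's other services might be called. A service-to-service edge exists wherever a span's parent belongs to a different service.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; real tracing systems record far more per span.
class DependencyGraphSketch {
    static final class Span {
        final String spanId;
        final String parentId; // null for a root span
        final String service;

        Span(String spanId, String parentId, String service) {
            this.spanId = spanId;
            this.parentId = parentId;
            this.service = service;
        }
    }

    // Returns edges of the form "caller -> callee", sorted alphabetically.
    static Set<String> serviceEdges(List<Span> spans) {
        Map<String, String> serviceBySpanId = new HashMap<>();
        for (Span s : spans) {
            serviceBySpanId.put(s.spanId, s.service);
        }
        Set<String> edges = new TreeSet<>();
        for (Span s : spans) {
            String parentService = serviceBySpanId.get(s.parentId);
            // Skip root spans and parent/child pairs within the same service.
            if (parentService != null && !parentService.equals(s.service)) {
                edges.add(parentService + " -> " + s.service);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Span> trace = List.of(
                new Span("1", null, "otel-ui"),
                new Span("2", "1", "otel-ui"),        // outbound HTTP call
                new Span("3", "2", "otel-api"),       // assumed service name
                new Span("4", "3", "otel-provider1"), // assumed service name
                new Span("5", "3", "otel-provider2"));// assumed service name
        serviceEdges(trace).forEach(System.out::println);
    }
}
```

Aggregating these edges over many traces, with call counts attached, gives roughly the kind of graph shown in the screenshot.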

Before I close out this section, take another look at the first trace screenshot I posted above. You’ll see that the API calls providers 1 and 2 in series, which extends the call duration. After seeing the trace, I changed the API to call the two providers in parallel, fork/join style. You can see the impact on the trace (parallel calls) and the duration (37ms vs 54ms previously) here:

The current version of the app, which parallelizes the calls to the providers.
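One common way to get this kind of fork/join behavior in plain Java is CompletableFuture. The sketch below is an assumption about the general shape of such a change, not necessarily how otel-demo does it. The provider suppliers, their names, and their delays are made up, and the sleeps only stand in for network latency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch of serial vs. fork/join calls to two providers.
class ForkJoinSketch {
    // Serial: the second call starts only after the first returns,
    // so total latency is roughly the sum of the two.
    static List<String> fetchInSeries(Supplier<List<String>> p1,
                                      Supplier<List<String>> p2) {
        List<String> all = new ArrayList<>(p1.get());
        all.addAll(p2.get());
        return all;
    }

    // Fork/join: start both calls, then wait for both and merge the results,
    // so total latency is roughly that of the slower call.
    static List<String> fetchInParallel(Supplier<List<String>> p1,
                                        Supplier<List<String>> p2) {
        CompletableFuture<List<String>> f1 = CompletableFuture.supplyAsync(p1);
        CompletableFuture<List<String>> f2 = CompletableFuture.supplyAsync(p2);
        return f1.thenCombine(f2, (a, b) -> {
            List<String> all = new ArrayList<>(a);
            all.addAll(b);
            return all;
        }).join();
    }

    // Stand-in for a remote flight provider with simulated latency.
    static Supplier<List<String>> slowProvider(String name, long delayMs) {
        return () -> {
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return List.of(name + "-flight");
        };
    }

    public static void main(String[] args) {
        Supplier<List<String>> p1 = slowProvider("provider1", 200);
        Supplier<List<String>> p2 = slowProvider("provider2", 200);

        long t0 = System.nanoTime();
        fetchInSeries(p1, p2);
        long t1 = System.nanoTime();
        fetchInParallel(p1, p2);
        long t2 = System.nanoTime();

        System.out.println("series ms:   " + (t1 - t0) / 1_000_000);
        System.out.println("parallel ms: " + (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same merged list in the same order. Only the wall-clock time differs, which is the difference the two traces make visible.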

Conclusion

In this post we looked at using the OpenTelemetry for Java Instrumentation agent to automate one aspect of an observability effort, which in this case is tracing instrumentation. As the OTel spec matures we can expect to see similar OTel auto-instrumentation emerge for metrics as well. While none of this completely replaces manual instrumentation, it does serve to reduce effort, to reduce code cruft, and to harmonize observability data so that we can process it more systematically (e.g., metrics aggregation based on conventional dimensions).

wwblog

Willie Wheeler’s personal blog

Willie Wheeler

Written by

Interested in applying machine learning and data science to problems in operations. For my stats course and tutorials, see https://learnstats.io.
