Distributed Tracing in 10 Minutes
With the intrinsic concurrency and asynchrony of modern software applications, distributed tracing has become table stakes for effective monitoring. That said, instrumenting a system for tracing has, at least historically, been a labor-intensive, complicated task. Tracing gives you visibility into an application as it grows to 10+ processes, sees increased concurrency, or develops non-trivial interactions between mobile/web clients and servers. But setting up instrumentation and deciding which tracer to use can add up to a large project. The OpenTracing standard changes that, making it possible to instrument applications for distributed tracing with minimal effort. As I demonstrate below, you can now set up tracing in less than 10 minutes with OpenTracing.
Imagine a simple website. Whenever a user goes to your home page, the web server makes two HTTP calls, and each of those calls branches out and makes a call to the database. This is fairly straightforward and debugging any slow requests wouldn’t be too difficult. If you are serious about latency, you might assign each request a unique ID and propagate it downstream through HTTP headers. If a request takes a long time, you can then grep over the log files for that request ID to figure out what was going on. Now imagine your website starts becoming popular and your application is spread across multiple machines and services. As the number of machines and services grows, logs provide less and less visibility. Determining causality gets tricky pretty quickly. This is when you realize workflow tracing would be more than worth the investment.
As I mentioned, OpenTracing steps in to make it very easy for you to trace because it standardizes instrumentation. That means you can instrument first and defer most implementation decisions, such as which tracer to use, until later.
You can follow my entire process below, from building the web app to seeing the traces in Appdash, the open source distributed tracing system I chose. Alternatively, you can skip ahead and see the finished result with Appdash. To do that, run:
docker run --rm -ti -p 8080:8080 -p 8700:8700 bg451/opentracing-example
This will spin up the test server and a local Appdash instance. The source code can be found here.
For those who want to see the full story, this blog post walks through the whole exercise: building the web app, instrumenting it with OpenTracing, binding it to a tracer (Appdash), and finally seeing the traces.
Building the web app
To start off, write a few simple endpoints.
Then put them all together into a working server.
Throw everything into a main.go file and run go run main.go.
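The endpoints and server described above could look roughly like the sketch below. The handler names homeHandler and serviceHandler come from this post; asyncHandler, dbHandler, the URL paths, and the baseURL variable are assumptions I made to get something small that runs. To keep the example self-checking, it starts the app on a throwaway local port with httptest instead of listening on a fixed port.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// baseURL is where the handlers reach each other. It is an assumption of this
// sketch; a long-running main.go would use something like http://localhost:8080.
var baseURL string

// call issues a GET to another endpoint of this same app and returns the body.
func call(path string) string {
	resp, err := http.Get(baseURL + path)
	if err != nil {
		return "error: " + err.Error()
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return string(body)
}

// homeHandler makes two downstream HTTP calls, as in the scenario above.
func homeHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "home -> [%s] [%s]", call("/service"), call("/async"))
}

// serviceHandler stands in for a backend service that needs the database.
func serviceHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "service(%s)", call("/db"))
}

// asyncHandler is a second, hypothetical backend that also hits the database.
func asyncHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "async(%s)", call("/db"))
}

// dbHandler is where a real database query would go.
func dbHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "db")
}

// newMux wires every endpoint into one router.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/home", homeHandler)
	mux.HandleFunc("/service", serviceHandler)
	mux.HandleFunc("/async", asyncHandler)
	mux.HandleFunc("/db", dbHandler)
	return mux
}

// fetchHome starts the app on a throwaway local port and requests /home.
func fetchHome() string {
	srv := httptest.NewServer(newMux())
	defer srv.Close()
	baseURL = srv.URL
	return call("/home")
}

func main() {
	// In a long-running main.go this would instead be:
	//   baseURL = "http://localhost:8080"
	//   log.Fatal(http.ListenAndServe(":8080", newMux()))
	fmt.Println(fetchHome()) // prints: home -> [service(db)] [async(db)]
}
```

One request to /home fans out into four more requests, which is exactly the shape that becomes hard to follow in logs once the handlers live in separate processes.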
Instrument the app
Now that you have a working web server, you can start instrumenting it. Start at the top level and work your way down. You can start a span and finish it like so:
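In the OpenTracing Go library, the calls involved are opentracing.StartSpan and the span's Finish method. The sketch below is not that library. It is a stdlib-only stand-in, with a hypothetical Span type of my own, that mimics the same start/finish shape so you can run it without any dependencies and see what a span measures.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Span is a hypothetical stand-in for a tracer's span type. It only records
// an operation name and how long the operation took.
type Span struct {
	Operation string
	Duration  time.Duration
	Finished  bool
	start     time.Time
}

// StartSpan mirrors the shape of opentracing.StartSpan: name the operation
// and start the clock.
func StartSpan(operation string) *Span {
	return &Span{Operation: operation, start: time.Now()}
}

// Finish stops the clock. A real tracer would also report the span here.
func (s *Span) Finish() {
	s.Duration = time.Since(s.start)
	s.Finished = true
}

// recorded keeps finished spans so the example can inspect them.
var recorded []*Span

func homeHandler(w http.ResponseWriter, r *http.Request) {
	span := StartSpan("/home")
	defer func() {
		span.Finish()
		recorded = append(recorded, span)
	}()

	time.Sleep(2 * time.Millisecond) // placeholder for real work
	w.Write([]byte("Request done!"))
}

// traceHome runs homeHandler once in-process and returns the span it produced.
func traceHome() *Span {
	recorded = nil
	homeHandler(httptest.NewRecorder(), httptest.NewRequest("GET", "/home", nil))
	return recorded[0]
}

func main() {
	span := traceHome()
	fmt.Printf("%s finished=%v took=%v\n", span.Operation, span.Finished, span.Duration)
}
```

Deferring Finish right after starting the span is a deliberate choice here: the span closes on every return path, including early error returns.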
This span records how long it takes homeHandler to complete, but that’s just the tip of the iceberg in terms of information that you can record. OpenTracing enables you to attach tags and logs to an individual span. For instance, inside homeHandler you can tag the span to indicate whether it encountered an error.
You can record other things as well, including important events, the user ID, and the browser type.
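To make the difference between tags and logs concrete, here is another stdlib-only stand-in. A tag describes the span as a whole (the "error" tag is the conventional way to flag a failed span in OpenTracing), while a log is a timestamped event inside it. The Span type, the "user.id" and "browser" keys, and the handleHome, failingLoad and okLoad helpers are all hypothetical names for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// LogEntry is a timestamped event inside a span.
type LogEntry struct {
	At    time.Time
	Event string
}

// Span is a hypothetical stand-in that only stores tags and logs.
type Span struct {
	Operation string
	Tags      map[string]interface{}
	Logs      []LogEntry
}

func StartSpan(operation string) *Span {
	return &Span{Operation: operation, Tags: map[string]interface{}{}}
}

// SetTag attaches a key/value pair that describes the span as a whole.
func (s *Span) SetTag(key string, value interface{}) *Span {
	s.Tags[key] = value
	return s
}

// LogEvent records something that happened at a point in time during the span.
func (s *Span) LogEvent(event string) {
	s.Logs = append(s.Logs, LogEntry{At: time.Now(), Event: event})
}

// handleHome is the body of homeHandler reduced to what matters here.
// userID, browser and load are hypothetical inputs.
func handleHome(userID, browser string, load func() error) *Span {
	span := StartSpan("/home")
	span.SetTag("user.id", userID).SetTag("browser", browser)

	span.LogEvent("loading page data")
	if err := load(); err != nil {
		span.SetTag("error", true)
		span.LogEvent("load failed: " + err.Error())
	}
	return span
}

// failingLoad and okLoad simulate the work homeHandler depends on.
func failingLoad() error { return errors.New("db timeout") }
func okLoad() error      { return nil }

func main() {
	span := handleHome("42", "firefox", failingLoad)
	fmt.Println(span.Tags["error"], len(span.Logs)) // prints: true 2
}
```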
However, that’s only for one function. To build true end-to-end traces, you’ll want to include spans for the client side of the HTTP calls. In our example, you need to start propagating span contexts downstream to the other endpoints now, and those endpoints need to be able to join traces. This is where the Inject/Extract part of the API comes into play. homeHandler creates a “root” span since it’s the first thing to get called. You will start there and work your way down.
Underneath, the tracer implementation injects metadata about the current span and its trace into the request’s headers, where anything downstream can read it. Go ahead and extract that data in serviceHandler.
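In the OpenTracing Go API this is done through the tracer's Inject and Extract methods with an HTTP headers carrier. The sketch below shows the same round trip with nothing but the standard library, so the mechanics are visible. The header names, the SpanContext fields, and the ID scheme are assumptions of mine; every real tracer chooses its own wire format.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Header names are assumptions of this sketch; every tracer picks its own.
const (
	traceIDHeader = "X-Trace-Id"
	spanIDHeader  = "X-Span-Id"
)

// SpanContext is the part of a span that has to cross process boundaries.
type SpanContext struct {
	TraceID string
	SpanID  string
}

// Span is a hypothetical stand-in that only tracks identity and parentage.
type Span struct {
	Operation string
	Context   SpanContext
	ParentID  string
}

var nextID int // simple counter; not safe for concurrent use

func newID() string {
	nextID++
	return fmt.Sprintf("%04x", nextID)
}

// StartRootSpan begins a new trace.
func StartRootSpan(op string) *Span {
	return &Span{Operation: op, Context: SpanContext{TraceID: newID(), SpanID: newID()}}
}

// StartChildSpan continues the parent's trace under a new span ID.
func StartChildSpan(op string, parent SpanContext) *Span {
	return &Span{
		Operation: op,
		Context:   SpanContext{TraceID: parent.TraceID, SpanID: newID()},
		ParentID:  parent.SpanID,
	}
}

// Inject writes the span context into outgoing request headers.
func Inject(sc SpanContext, h http.Header) {
	h.Set(traceIDHeader, sc.TraceID)
	h.Set(spanIDHeader, sc.SpanID)
}

// Extract reads a span context from incoming headers; ok is false when the
// caller did not send one.
func Extract(h http.Header) (sc SpanContext, ok bool) {
	sc = SpanContext{TraceID: h.Get(traceIDHeader), SpanID: h.Get(spanIDHeader)}
	return sc, sc.TraceID != "" && sc.SpanID != ""
}

// serviceSpan is the last span created by serviceHandler, kept for inspection.
var serviceSpan *Span

func serviceHandler(w http.ResponseWriter, r *http.Request) {
	if parent, ok := Extract(r.Header); ok {
		serviceSpan = StartChildSpan("/service", parent) // join the caller's trace
	} else {
		serviceSpan = StartRootSpan("/service") // nobody upstream is tracing
	}
	w.Write([]byte("ok"))
}

// callService plays the client side inside homeHandler. It invokes the handler
// in-process; a real client would send the request with an http.Client.
func callService(root *Span) *Span {
	req := httptest.NewRequest("GET", "/service", nil)
	Inject(root.Context, req.Header)
	serviceHandler(httptest.NewRecorder(), req)
	return serviceSpan
}

func main() {
	root := StartRootSpan("/home")
	child := callService(root)
	// prints: true true
	fmt.Println(child.Context.TraceID == root.Context.TraceID, child.ParentID == root.Context.SpanID)
}
```

The child keeps the trace ID and records the caller's span ID as its parent. Those two fields are what let a tracing backend reassemble the request tree across processes.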
And that’s it! If you repeat the steps above for things you want to trace, you should have a fully instrumented system fairly quickly. To decide what needs to be traced, you should look at your requests’ critical paths.
Connect the tracer
One of the great things about OpenTracing is that once your system is instrumented, adding a tracer is really straightforward! In this example, I’ve used Appdash, an open source tracing system. There’s a small chunk of code needed to start the Appdash instance, and all of it goes inside your main function. You won’t need to touch any of your instrumentation code at all.
This will create an embedded Appdash instance and serve traces locally.
Should you want to change your tracer implementation, it is an O(1) change because of OpenTracing. All you need to do is update your main function; the rest of your instrumentation stays the same. For example, if you decide to use Zipkin later on, the only code you would need to change is the tracer setup in your main function.
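The actual Appdash or Zipkin setup depends on each project's client library, so I won't guess at it here. What the sketch below shows instead is why the swap stays confined to main: instrumentation code only talks to a global tracer behind an interface, which is the role opentracing.SetGlobalTracer plays in the Go library. The Tracer interface and the appdashLike and zipkinLike types are hypothetical stand-ins that do not talk to either system.

```go
package main

import "fmt"

// Tracer is a hypothetical, minimal version of the interface that
// instrumentation code depends on.
type Tracer interface {
	StartSpan(operation string) string
}

// appdashLike and zipkinLike are stand-ins named after real backends.
// They only label the span so the swap is visible in the output.
type appdashLike struct{}

func (appdashLike) StartSpan(op string) string {
	return "appdash-like:" + op
}

type zipkinLike struct{}

func (zipkinLike) StartSpan(op string) string {
	return "zipkin-like:" + op
}

// globalTracer is the only tracer the instrumentation ever sees.
var globalTracer Tracer = appdashLike{}

// SetGlobalTracer swaps the backend for the whole program.
func SetGlobalTracer(t Tracer) {
	globalTracer = t
}

// instrumentedHome is instrumentation code: it never names a concrete backend.
func instrumentedHome() string {
	return globalTracer.StartSpan("/home")
}

func main() {
	SetGlobalTracer(appdashLike{})
	fmt.Println(instrumentedHome()) // prints: appdash-like:/home

	SetGlobalTracer(zipkinLike{}) // the only line that changes
	fmt.Println(instrumentedHome()) // prints: zipkin-like:/home
}
```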
Having made it this far, you can see that instrumenting your code for tracing is much easier with OpenTracing. I recommend this as a best practice whenever starting out on an app. If you set up tracing while your application is still small, trace data can guide your development strategy as you grow. Having visibility into your processes as they start to mature and increase in complexity will help you build a sustainable product.