Building an open source tracer

An interview with Dominik Honnef

Distributed tracing has been around since the 1970s but has only recently become important to the wider developer population. By illuminating the path a request takes through a system, tracing brings much-needed visibility to a highly complex microservices world. Here we interview Dominik Honnef, an open source contributor blazing new trails in the field.

Dominik is a contributor to OpenTracing, a vendor-neutral, context-propagation API. OpenTracing draws on the deep experience of contributors such as Ben Sigelman, Yuri Shkuro, Adrian Cole, and Dan Kuebrich, all of whom are domain experts in distributed tracing. Building on the principles of Dapper, Google’s production distributed tracing system, OpenTracing provides an API that makes it easier to instrument an application for tracing and then hook in a tracer such as Zipkin, Appdash, or LightStep. Dominik is building an open source tracer to go with it.
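To make the "instrument once, hook in any tracer" idea concrete, here is a minimal sketch in Go. It is not the OpenTracing API: the `Tracer` and `Span` interfaces, the two toy tracers, and `handleRequest` are all hypothetical names invented for illustration. The point is only that instrumented code depends on an interface, so the concrete tracer can be swapped without touching the application.

```go
package main

import "fmt"

// Span is a hypothetical stand-in for one unit of traced work.
// It is an illustration only, not the OpenTracing Span type.
type Span interface {
	Finish()
}

// Tracer is the only tracing type that instrumented code depends on.
type Tracer interface {
	StartSpan(operation string) Span
}

// recordingTracer is a toy tracer that remembers finished operations in memory.
type recordingTracer struct {
	finished []string
}

type recordingSpan struct {
	tracer    *recordingTracer
	operation string
}

func (t *recordingTracer) StartSpan(operation string) Span {
	return &recordingSpan{tracer: t, operation: operation}
}

// Finish records the operation name on the owning tracer.
func (s *recordingSpan) Finish() {
	s.tracer.finished = append(s.tracer.finished, s.operation)
}

// noopTracer is a second toy tracer that discards everything.
type noopTracer struct{}

type noopSpan struct{}

func (noopTracer) StartSpan(string) Span { return noopSpan{} }

func (noopSpan) Finish() {}

// handleRequest is the instrumented application code. It knows only the
// Tracer interface, never a concrete tracing system.
func handleRequest(t Tracer) {
	span := t.StartSpan("handle-request")
	defer span.Finish()
	// Application work would happen here.
}

func main() {
	rec := &recordingTracer{}
	handleRequest(rec)          // traced by the recording tracer
	handleRequest(noopTracer{}) // same code, different tracer, nothing recorded
	fmt.Println(rec.finished)
}
```

Swapping `rec` for `noopTracer{}` changes where the data goes without changing `handleRequest`, which is the property a shared, vendor-neutral API is meant to provide.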

In this interview, Dominik highlights how little literature is available for digging deeper into the subject. This is a lamentable state of affairs because tracing puts great power in the hands of developers. With a fully instrumented system, you know exactly what is going on in the application layer, where your performance bottlenecks are, and how you can debug effectively across the stack. In conjunction with the OpenTracing manifesto, this blog series hopes to shed light on the subject.

And without further ado, here’s our interview with Dominik.

OT: Can you tell us what Tracer is?

DH: Tracer is a distributed tracing system in the style of Dapper and Zipkin. It is written in Go and its aim is to be easy to install and run, without the need for a JVM or other complex software.

OT: Why are you building it? There already seem to be options available such as Zipkin.

DH: The primary goal was to have something that is easy to deploy, especially for people who are used to Go’s static binaries. Zipkin was thus eliminated for its reliance on the JVM. Appdash looked promising, but didn’t yet seem as production ready as I had hoped. And LightStep seems to address a crowd that wants a full-fledged, hosted solution, whereas Tracer is open source and self-hosted.

OT: Why did you decide to make Tracer OpenTracing compatible?

DH: I learned about OpenTracing early in my research on distributed tracing, and a friend also pointed me in its direction. After reading the specification, I decided that it matched all of my goals for the API, and that it made more sense to support an emerging standard than to come up with my own API.

I have since started engaging in the OpenTracing project, providing my opinions on some aspects of the specification and on planned changes. I find that it’s a very open project that welcomes input from other people. This openness makes it very easy to work with the standard.

OT: What are some gotchas you wish you knew before you started?

DH: I think the main “gotcha” is the lack of technical content on distributed tracing. There is of course the Dapper paper, and 1–2 valuable talks on Zipkin, but overall there is very little information on implementation details or possible difficulties. This could make one believe that tracing is a topic that is vastly more difficult than it actually is. Coming up with a basic implementation takes days, not months, and there’s really no excuse for not having tracing be a core part of your architecture.

A “gotcha” with OpenTracing currently is that it is an evolving standard and not at all finalized yet. There will be upcoming changes that affect the API in a somewhat large way. To anyone writing tracing systems that implement the OpenTracing spec I would recommend actively engaging in the project, to provide feedback and to watch the landscape of tracing evolve.

OT: When will Tracer be ready and can others contribute?

DH: Tracer is currently in an alpha state. Its core components are there, and one could probably use it already in small scale systems. In fact, we’re really looking for people to test it. Testers should be okay with encountering and reporting bugs, with stress-testing it and reporting performance bottlenecks, and with asking for missing features.

Aside from testing the existing code, we would greatly appreciate contributions in the form of additional storage backends and transports. The current version uses PostgreSQL and gRPC, but we can very much imagine further implementations that use message queues, Elasticsearch or Cassandra, and so on.

We want Tracer to be easy to deploy and not depend on the JVM or similar runtimes, but at the same time we very much welcome a wide choice in backends, matching the different needs of small and large scale systems. PostgreSQL and gRPC will quickly get you started in small to medium size systems, but alternatives would be needed for systems of Netflix or Google scale. Our plan is to have a production ready release in the next 1–2 months and to incrementally improve it from there.
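As an editorial aside, the pluggable-backend idea described above can be sketched with a small Go interface. This is not Tracer's actual code: `Storage`, `SpanRecord`, `ErrNotFound`, and the in-memory backend are hypothetical names, and the trace, span, and parent ID fields simply follow the general Dapper-style model. A PostgreSQL, Elasticsearch, or Cassandra backend would be another type satisfying the same interface.

```go
package main

import (
	"errors"
	"fmt"
)

// SpanRecord is a hypothetical stored span with Dapper-style identifiers.
type SpanRecord struct {
	TraceID   uint64
	SpanID    uint64
	ParentID  uint64 // 0 marks the root span in this sketch
	Operation string
}

// Storage is a hypothetical interface that every backend would satisfy.
type Storage interface {
	Store(s SpanRecord) error
	TraceByID(id uint64) ([]SpanRecord, error)
}

// ErrNotFound is returned when no spans exist for a trace ID.
var ErrNotFound = errors.New("trace not found")

// memoryStorage is a toy backend that keeps spans in a map.
// It is not safe for concurrent use.
type memoryStorage struct {
	traces map[uint64][]SpanRecord
}

func newMemoryStorage() *memoryStorage {
	return &memoryStorage{traces: make(map[uint64][]SpanRecord)}
}

// Store appends the span to the list kept for its trace.
func (m *memoryStorage) Store(s SpanRecord) error {
	m.traces[s.TraceID] = append(m.traces[s.TraceID], s)
	return nil
}

// TraceByID returns all spans stored for the given trace.
func (m *memoryStorage) TraceByID(id uint64) ([]SpanRecord, error) {
	spans, ok := m.traces[id]
	if !ok {
		return nil, ErrNotFound
	}
	return spans, nil
}

func main() {
	// The rest of the program sees only the Storage interface.
	var st Storage = newMemoryStorage()

	records := []SpanRecord{
		{TraceID: 1, SpanID: 1, Operation: "frontend"},
		{TraceID: 1, SpanID: 2, ParentID: 1, Operation: "database"},
	}
	for _, r := range records {
		if err := st.Store(r); err != nil {
			fmt.Println("store failed:", err)
			return
		}
	}

	spans, err := st.TraceByID(1)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(len(spans), "spans in trace 1")
}
```

Keeping the interface this small is a design choice for the sketch: the fewer methods a backend must implement, the easier it is to add a new one.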

OT: How do you see the tracing landscape evolving in the next few years?

DH: I think OpenTracing will help make tracing more accessible to the open source community, and to smaller companies who might’ve been afraid of making tracing a core part of their infrastructure.

Google has long benefited from tracing, but they had the advantage of having all parts of their stack instrumented. Open source software, on the other hand, is rarely instrumented, largely because there was no common API. With OpenTracing, projects can rely on a shared API without having to pick and support one specific tracing system.

I think that once the modern stack is fully instrumented we will be able to gain far deeper insights into performance bottlenecks and the flow of data.
