Jaeger and OpenTelemetry
Recently, OpenTelemetry was announced as a new CNCF sandbox project resulting from a merger of OpenTracing and OpenCensus (see the links at the end of this post). Several people have already asked me what OpenTelemetry means for the Jaeger project (incubating at CNCF), and whether it is going to replace Jaeger. I will attempt to answer these questions in this post.
TL;DR: OpenTelemetry is great news for the Jaeger project!
I’ve been working on OpenTracing since its inception at a Zipkin workshop back in the Fall of 2015. We had just begun deploying distributed tracing at Uber, and I knew that we needed an open, vendor-neutral API to incorporate into the source code of Uber’s rapidly growing microservices ecosystem. Today, OpenTracing APIs exist in 9+ languages, enjoy wide support from vendors, and have over 100 integrations with various open source projects (https://opentracing.io/registry).
OpenCensus is an open-source reincarnation of Google’s internal Census libraries used for collecting tracing and metrics data. It took a different approach by providing a concrete, opinionated implementation for capturing observability signals. By employing a “batteries included” approach, it had an advantage over OpenTracing for software shipped as binaries, such as database engines and Kubernetes components, because the binaries could link to a known implementation rather than rely on the late binding to concrete tracers that the OpenTracing approach required. On the downside, since OpenCensus APIs were tightly coupled to the implementations, it was difficult and often impossible to bind the instrumentation to different implementations even when users wanted to.
Both projects were aiming to make observability easy for modern applications and expedite wide adoption of distributed tracing by the software industry, yet their competition with each other, which normally would be a very healthy thing, was achieving exactly the opposite. Being faced with two competing standards, the end users were stuck with the “VHS vs. Betamax” choice.
It turns out that the approaches of the two projects were complementary, rather than contradictory. There was no reason why we couldn’t have both the abstract, vendor-neutral API and a well-supported, default implementation. Enter OpenTelemetry!
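To make the "both at once" idea concrete, here is a minimal sketch of an abstract API that ships with a default implementation yet can still be rebound to another one at startup. All names (`Tracer`, `DefaultTracer`, `VendorTracer`, `set_tracer`, `handle_request`) are hypothetical and chosen for illustration; this is not the OpenTelemetry, OpenTracing, or OpenCensus API.

```python
from abc import ABC, abstractmethod


class Tracer(ABC):
    """Hypothetical vendor-neutral API that instrumentation codes against."""

    @abstractmethod
    def start_span(self, name: str) -> str:
        """Start a span and return an identifier for it."""


class DefaultTracer(Tracer):
    """Hypothetical default implementation shipped with the API."""

    def start_span(self, name: str) -> str:
        return f"default:{name}"


class VendorTracer(Tracer):
    """Hypothetical alternative implementation bound by the end user."""

    def start_span(self, name: str) -> str:
        return f"vendor:{name}"


# The default works out of the box, with no late binding required.
_tracer: Tracer = DefaultTracer()


def set_tracer(tracer: Tracer) -> None:
    """Rebind the instrumentation to a different implementation."""
    global _tracer
    _tracer = tracer


def get_tracer() -> Tracer:
    return _tracer


def handle_request() -> str:
    # Instrumented code depends only on the abstract API.
    return get_tracer().start_span("handle_request")
```

In this sketch, a binary that never calls `set_tracer` still produces spans through the default, while a user who prefers another implementation can swap it in without touching `handle_request`.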
The greatest promise of OpenTelemetry is not that it solves new problems that OpenTracing and OpenCensus did not solve. Instead, it is the promise of a single standard instead of two competing standards. Toward that goal, the first GA versions of the OpenTelemetry libraries are intentionally narrowly scoped to:
- Be backwards compatible with OpenTracing and OpenCensus instrumentation via shims.
- Avoid introducing new features not already present in the two original projects.
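As a rough illustration of what a compatibility shim does, the sketch below wraps a new-style tracer behind a legacy-style interface, so existing instrumentation keeps calling the method names it already knows. The classes, method names, and parameters (`NewTracer`, `LegacyTracerShim`, `operation_name`, `tags`, `attributes`) are assumptions made for this example and do not reproduce the real shim libraries.

```python
class NewTracer:
    """Hypothetical stand-in for the new, unified tracing API."""

    def __init__(self):
        self.spans = []

    def start_span(self, name, attributes=None):
        span = {"name": name, "attributes": dict(attributes or {})}
        self.spans.append(span)
        return span


class LegacyTracerShim:
    """Exposes a legacy-style interface and forwards calls to the new API."""

    def __init__(self, new_tracer):
        self._new_tracer = new_tracer

    def start_span(self, operation_name, tags=None):
        # Translate legacy argument names into their new-API equivalents.
        return self._new_tracer.start_span(operation_name, attributes=tags)
```

Existing instrumentation would receive a `LegacyTracerShim` where it used to receive a legacy tracer, and the data would end up in the new implementation.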
Context propagation as the underlying layer
I recently wrote an article about the importance of distributed context propagation for modern distributed systems. We all know that tracing cannot function without it, but tracing is not the only application that can benefit from context propagation. Many popular telemetry APIs (metrics and logging) are not context-aware, which makes some of the use cases described in my article very difficult to support, especially in languages like Go where access to the context must be explicit. To its credit, the OpenCensus project had always intended for the telemetry APIs to be context-aware, whereas in OpenTracing the general-purpose context propagation (aka “baggage”) was built into the tracing API, which made it awkward to use from, say, metrics APIs. One of the great side effects of merging the two projects is the agreement to separate context propagation into an underlying API layer used by the other telemetry APIs to access contextual data. I wrote a design proposal, “Context Propagation Layers”, that goes into more detail.
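The layering can be sketched in a few lines. The example below is an illustration only, built on Python's standard `contextvars` module: a small context layer holds the baggage, and separate tracing and metrics functions both read from it without knowing about each other. The function names (`baggage`, `get_baggage`, `start_span`, `increment`) and the `tenant` key are hypothetical and are not taken from the design proposal.

```python
import contextvars
from contextlib import contextmanager

# --- Context propagation layer (knows nothing about tracing or metrics) ---
_baggage = contextvars.ContextVar("baggage", default={})


@contextmanager
def baggage(**items):
    """Make the given key/value pairs visible to code running inside the block."""
    # Build a new dict rather than mutating the current one.
    token = _baggage.set({**_baggage.get(), **items})
    try:
        yield
    finally:
        _baggage.reset(token)


def get_baggage(key, default=None):
    return _baggage.get().get(key, default)


# --- Tracing API, built on top of the context layer ---
def start_span(name):
    return {"name": name, "tenant": get_baggage("tenant")}


# --- Metrics API, built on the same context layer ---
counters = {}


def increment(metric):
    key = (metric, get_baggage("tenant", "unknown"))
    counters[key] = counters.get(key, 0) + 1
```

Because the baggage lives below both APIs, the metrics call can label a counter by tenant without reaching into the tracing API to find it.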
OpenTelemetry and Jaeger
Unlike some other tracing backends, the Jaeger project was never meant to solve the code instrumentation problem. By shipping tracer libraries compatible with OpenTracing, we were able to leverage the rich ecosystem of existing OpenTracing-compatible instrumentation, and focus our efforts on building the tracing backend, visualization tools, and data mining techniques.
This model still works with the emergence of OpenTelemetry, which is aimed at the instrumentation space. End users will be able to instrument their applications or frameworks with the OpenTelemetry SDK and use Jaeger as the backend for tracing data.
Then there is the question about the future of Jaeger tracers (client libraries), which do occupy the same problem space as OpenTelemetry. In the short term, Jaeger client libraries can be changed to implement the OpenTelemetry API. This may be necessary in order to support the new style of instrumentation while keeping the existing functionality that is specific to Jaeger (such as adaptive sampling).
In the long term, we will seriously consider freezing development of Jaeger client libraries and porting their unique features to OpenTelemetry default implementations, either upstream or as plugins. Developing and maintaining client libraries in multiple languages is a significant investment of project resources, which would be better spent on building new backend features.
What about OpenCensus Agent/Collector?
The “batteries included” approach did not always work well even for OpenCensus libraries, because they still needed to be configured with a specific exporter plugin in order to send data to concrete tracing backends, like Jaeger or Zipkin. To address that issue, the OpenCensus project started development of two backend components called agent and collector, playing nearly identical roles to Jaeger’s agent and collector:
- agent is a sidecar / host agent that receives telemetry from the client library in a standardized format and forwards it to collector;
- collector translates the data into the format understood by a specific tracing backend and sends it there. OpenCensus Collector is also able to perform tail-based sampling.
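The division of labor between the two components can be sketched as follows. This is a toy, in-memory model under stated assumptions: the span fields, the backend format, the `flush` trigger, and the "keep traces that contain an error" policy are all invented for illustration and do not reflect the actual OpenCensus or Jaeger implementations.

```python
def to_backend_format(span):
    """Translate a span from the common format into a hypothetical backend format."""
    return {"trace": span["trace_id"], "operation": span["name"], "error": span["error"]}


def keep_if_error(spans):
    """Example tail-based sampling policy: keep only traces that contain an error."""
    return any(span["error"] for span in spans)


class Collector:
    """Buffers spans per trace, samples whole traces, and writes to the backend."""

    def __init__(self, backend, keep_trace):
        self.backend = backend        # a list standing in for backend storage
        self.keep_trace = keep_trace  # sampling decision over a complete trace
        self.pending = {}             # trace_id -> spans received so far

    def receive(self, span):
        self.pending.setdefault(span["trace_id"], []).append(span)

    def flush(self, trace_id):
        # Tail-based sampling: decide only after the trace's spans have arrived.
        spans = self.pending.pop(trace_id, [])
        if spans and self.keep_trace(spans):
            self.backend.extend(to_backend_format(span) for span in spans)


class Agent:
    """Sidecar / host agent: accepts spans from client libraries and forwards them."""

    def __init__(self, collector):
        self.collector = collector

    def report(self, span):
        self.collector.receive(span)
```

The point of the sketch is that the client library only needs to know how to reach the agent, while the knowledge of the backend format and the sampling policy lives in the collector.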
These two components have a much larger overlap with the functionality of the respective Jaeger backend components. However, they are still limited to the problem space of data gathering, rather than trace storage or post-processing. It means that in the future we might also strongly consider deprecating Jaeger agent and collector components and instead deploying the respective OpenTelemetry components. The main open question is whether OpenTelemetry components will be able to support additional features provided by the Jaeger components, such as adaptive sampling.
As I stated at the beginning, the OpenTelemetry project is good news for the Jaeger project, as the two are very much complementary in terms of the problem domains each is trying to address. In the areas where there is an overlap, namely client libraries, agent, and collector, we are planning to collaborate with OpenTelemetry and ideally deprecate the respective Jaeger components so that we don’t have to waste time maintaining redundant software.
- OpenTracing and OpenCensus are merging (OpenTracing blog)
- A Brief History of OpenTelemetry (So Far) (CNCF blog)
- OpenTelemetry: Panel Discussion and Q&A (KubeCon EU video)
- OpenTelemetry: Backwards Compatibility with OpenTracing and OpenCensus (KubeCon EU video)