Towards Jaeger v2 💥💥💥 Moar OpenTelemetry!

Published in JaegerTracing, Jul 25, 2024

Jaeger, the popular open-source distributed tracing system, is getting a major upgrade with the upcoming release of Jaeger v2. Jaeger v2 is a new architecture for the Jaeger backend components that uses the OpenTelemetry Collector framework as its base and extends it with Jaeger’s unique features. It promises significant improvements and changes, making Jaeger more flexible, extensible, and even better aligned with the OpenTelemetry project.

In this blog post, we’ll dive into the details of Jaeger v2, exploring its design, features, and benefits. We’ll also discuss the roadmap for development and what users can expect from this exciting new release.

Why OpenTelemetry Collector

Jaeger and OpenTelemetry Collector solve different problems. Jaeger is a complete tracing platform that includes storage and the UI. OpenTelemetry Collector is usually an intermediate component in collection pipelines, used to receive, process, transform, and export different telemetry types. The two systems do overlap in places: for example, jaeger-agent and jaeger-collector play roles similar to what OpenTelemetry Collector can do, but only for traces.

Historically, Jaeger and OpenTelemetry Collector have each reused the other’s code. The Collector’s receivers for legacy Jaeger formats are implemented by importing Jaeger packages, and Jaeger reuses the Collector’s OTLP receivers and OTLP-to-Jaeger data model converters. Because of this synergy, it’s been our goal for a while to bring the two projects closer.

OpenTelemetry Collector has a very flexible and extensible design, which makes it easy to extend with additional components needed for Jaeger use cases.

Third Time’s a Charm

This is actually our third attempt to use the OpenTelemetry Collector framework as the basis for Jaeger v2. In the first attempt, we tried to keep the Jaeger v2 configuration compatible with v1 by reusing the same CLI flags, which was ultimately a mistake because it was too difficult to maintain. In the second attempt, we tried to use the OpenTelemetry Collector Builder (known as ocb) to compose the Jaeger v2 binary in a different repository, which also proved difficult to maintain.

This third attempt is much further along due to several decisions (see the RFC doc for more details):

We decided to break CLI configuration compatibility and embrace the OpenTelemetry Collector file configuration mechanism.
We build the Jaeger v2 binary by directly importing OpenTelemetry Collector code as a library, which makes development much easier than with ocb (although we do plan to support ocb in the future as an extension mechanism).
We’ve implemented adapter layers that allow us to reuse the existing Jaeger v1 code directly in Jaeger v2, which means we can continue evolving a single code base and do the upgrades in-place, instead of working on an incompatible fork for many months.
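To make the adapter idea concrete, here is a minimal Go sketch. It assumes a hypothetical v1-style writer that stores one span per call and a v2-style pipeline that delivers batches. The names SpanV1, SpanWriterV1, and BatchAdapter are illustrative only, not Jaeger’s actual types, and the real adapters also translate between OTLP and Jaeger’s data model, which is omitted here.

```go
package main

// Requires Go 1.20+ for errors.Join.
import (
	"errors"
	"fmt"
)

// SpanV1 is a hypothetical stand-in for Jaeger v1's internal span model.
type SpanV1 struct {
	TraceID       string
	OperationName string
}

// SpanWriterV1 mimics a v1-style storage writer that handles one span per call.
type SpanWriterV1 interface {
	WriteSpan(span SpanV1) error
}

// BatchAdapter (hypothetical) lets a batch-oriented v2 pipeline reuse an unchanged v1 writer.
type BatchAdapter struct {
	V1 SpanWriterV1
}

// WriteBatch fans a batch out to the v1 writer and joins any errors it returns.
func (a BatchAdapter) WriteBatch(batch []SpanV1) error {
	var errs []error
	for _, s := range batch {
		if err := a.V1.WriteSpan(s); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

// memoryWriterV1 is a toy v1 writer that records operation names in memory.
type memoryWriterV1 struct {
	written []string
}

func (m *memoryWriterV1) WriteSpan(span SpanV1) error {
	if span.TraceID == "" {
		return fmt.Errorf("span %q has no trace ID", span.OperationName)
	}
	m.written = append(m.written, span.OperationName)
	return nil
}

func main() {
	v1 := &memoryWriterV1{}
	adapter := BatchAdapter{V1: v1}
	err := adapter.WriteBatch([]SpanV1{
		{TraceID: "t1", OperationName: "GET /users"},
		{TraceID: "t1", OperationName: "SELECT users"},
	})
	fmt.Println(v1.written, err) // [GET /users SELECT users] <nil>
}
```

The v1 writer stays untouched; only the thin adapter is new code, which is what allows a single code base to evolve in place.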

Features and Benefits

By aligning Jaeger v2 architecture with the OpenTelemetry Collector, we can deliver several exciting features and benefits for users, including:

Native OpenTelemetry processing: Jaeger v2 will natively support the OTLP data format, eliminating the translation step from OTLP to Jaeger’s internal data format and improving performance.
Batched data processing: the OpenTelemetry Collector pipelines operate on batches of data, which can be especially important when sending data to storage backends like ClickHouse that are much more performant with batch inserts. Jaeger v2 will be able to utilize this batch-based pipeline design, in contrast to v1’s own internal pipeline, which was designed around individual spans.
Familiar developer experience: Jaeger v2 will follow the same configuration and deployment model as the OpenTelemetry Collector, providing a more consistent developer experience.
Access to OpenTelemetry Collector features: Jaeger v2 will inherit all the core features of the Collector, including auth, certificate reloading, internal monitoring, health checks, zPages, etc.
Access to OpenTelemetry Collector ecosystem: Jaeger v2 will be able to use a multitude of components available for OpenTelemetry Collector, such as the span-to-metrics connector, tail-based sampling processor, telemetry rewriting processors, PII filtering, etc. For example, we are able to reuse the Kafka exporter and receiver to replicate Jaeger v1’s collector/ingester deployment model without maintaining any extra code.

The result of v2 is less code to maintain in the Jaeger project and an assured alignment with OpenTelemetry, which is already the standard way to instrument applications and collect telemetry.

The other major benefit to users is that Jaeger is future-proofed as OpenTelemetry evolves, helping ensure Jaeger remains the first choice of tracing system for open source users. This is also likely to improve collaboration between the two projects and help both evolve.

Design and Architecture

Overall, the Jaeger v2 architecture is very similar to a standard OpenTelemetry Collector, which has pipelines for receiving and processing telemetry (a pipeline encapsulates receivers, processors, and exporters) and extensions that perform functions not directly related to processing telemetry. Jaeger v2 makes a few specific design decisions in how to use the Collector framework.

[Figure: High-level architecture of the Jaeger v2 binary]

Query Extension 🔎

Querying for traces and presenting them in the UI is an example of functionality “not directly related to telemetry processing”, so naturally the equivalent of v1 jaeger-query is implemented as a Collector extension in Jaeger v2.

Single Binary

Jaeger v1 provided multiple binaries for different purposes (agent, collector, ingester, query). Those binaries were hardwired to perform different functions and exposed different configuration options. We realized that all that complexity was unnecessary in the v2 architecture, because we can achieve the same result simply by enabling different components in the configuration file. We also benchmarked executable sizes and found that bundling all possible Jaeger v2 components into a single binary, including ~3MB (compressed) of UI assets, still yields a binary of around 50MB, about the same size as each separate binary would be, so separating them brings no real benefit. As a result, Jaeger v2 will ship as a single binary, jaeger, configurable for different deployment roles via a YAML configuration file, just like the OpenTelemetry Collector.

Storage Extension

One significant difference between Jaeger and OpenTelemetry Collector is that the Collector is designed for one-directional data processing (receivers → processors → exporters), which we call the write path. When the write path needs to store data in a database, the traditional approach in the OpenTelemetry Collector is to implement a separate exporter for each backend, such as the Elasticsearch exporter. In contrast, Jaeger supports both the write path and the read path (via query and UI). When we combine those functions in a single binary, like the equivalent of the Jaeger v1 all-in-one, the write and read paths must share the storage backend implementation, so an exporter-per-storage approach does not work for us.

To accommodate that, and to preserve Jaeger v1’s existing ability to use different storage backends, we abstracted the notion of a “storage” into the Jaeger Storage Extension. On start-up, the Jaeger Query Extension locates the Jaeger Storage Extension and asks it for a TraceReader. For the write path, we implemented a generic Jaeger Storage Exporter, which also consults the Storage Extension to obtain a TraceWriter implementation.
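A minimal Go sketch of the shared-storage idea follows, assuming an in-memory backend. TraceReader and TraceWriter borrow their names from this post, but their method signatures, as well as StorageExtension and memoryBackend, are hypothetical; Jaeger’s real interfaces differ.

```go
package main

import (
	"fmt"
	"sync"
)

// Span is a simplified stand-in for trace data.
type Span struct {
	TraceID string
	Name    string
}

// TraceWriter and TraceReader borrow names from the post; these methods are hypothetical.
type TraceWriter interface {
	WriteTraces(spans []Span) error
}

type TraceReader interface {
	GetTrace(traceID string) ([]Span, error)
}

// memoryBackend implements both interfaces over a single shared map.
type memoryBackend struct {
	mu     sync.RWMutex
	traces map[string][]Span
}

func (m *memoryBackend) WriteTraces(spans []Span) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, s := range spans {
		m.traces[s.TraceID] = append(m.traces[s.TraceID], s)
	}
	return nil
}

func (m *memoryBackend) GetTrace(traceID string) ([]Span, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	spans, ok := m.traces[traceID]
	if !ok {
		return nil, fmt.Errorf("trace %s not found", traceID)
	}
	return append([]Span(nil), spans...), nil // return a copy
}

// StorageExtension owns one backend and hands out a writer and a reader for it.
type StorageExtension struct {
	backend *memoryBackend
}

func NewStorageExtension() *StorageExtension {
	return &StorageExtension{backend: &memoryBackend{traces: map[string][]Span{}}}
}

func (e *StorageExtension) Writer() TraceWriter { return e.backend }
func (e *StorageExtension) Reader() TraceReader { return e.backend }

func main() {
	ext := NewStorageExtension()
	// Write path: the storage exporter obtains a writer from the extension.
	_ = ext.Writer().WriteTraces([]Span{{TraceID: "abc", Name: "checkout"}})
	// Read path: the query extension obtains a reader from the same extension.
	spans, err := ext.Reader().GetTrace("abc")
	fmt.Println(len(spans), spans[0].Name, err) // 1 checkout <nil>
}
```

Because both paths go through the same extension, data written by the exporter is immediately visible to queries, which is exactly what an exporter-per-storage design cannot guarantee.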

Because the Query Extension and the Storage Exporter ask for storage by name, we can simultaneously support different storage implementations for different purposes: for example, one could configure Elasticsearch as the main trace storage, Cassandra as the archive storage, and something else as the sampling strategies storage. In Jaeger v1, all storage roles had to be served by the same backend.
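Lookup by name can be sketched as a registry of configured backends. This is again hypothetical Go: the Backend and StorageExtension types and the storage names main_store and archive_store are made up for illustration.

```go
package main

import "fmt"

// Backend is a hypothetical handle to one configured storage.
type Backend struct {
	Kind string // e.g. "elasticsearch", "cassandra", "memory"
}

// StorageExtension keeps several named backends side by side.
type StorageExtension struct {
	backends map[string]Backend
}

// Lookup returns the backend registered under name, as Query or the exporter would request it.
func (e StorageExtension) Lookup(name string) (Backend, error) {
	b, ok := e.backends[name]
	if !ok {
		return Backend{}, fmt.Errorf("storage %q is not configured", name)
	}
	return b, nil
}

func main() {
	ext := StorageExtension{backends: map[string]Backend{
		"main_store":    {Kind: "elasticsearch"},
		"archive_store": {Kind: "cassandra"},
	}}
	// Each role refers to a storage by name, so roles can use different backends.
	primary, _ := ext.Lookup("main_store")
	archive, _ := ext.Lookup("archive_store")
	_, err := ext.Lookup("sampling_store")
	fmt.Println(primary.Kind, archive.Kind) // elasticsearch cassandra
	fmt.Println(err)                        // storage "sampling_store" is not configured
}
```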

Roadmap and Development

The development of Jaeger v2 is ongoing, with several milestones planned before its general availability (GA). The alpha version is already available and supports most Jaeger v1 functions, such as ingestion of different formats, the same storage backends as v1, query/UI, and all-in-one deployment. The team is working on closing the remaining feature-parity gaps, improving performance, and enhancing the overall user experience.

The roadmap for Jaeger v2 includes the following milestones:

✅ Proof of concept: A single binary with memory storage and all-in-one functionality.
✅ Storage integration: support for the same storage backends as v1.
✅ New, more comprehensive end-to-end integration tests for all storage backends.
🚧 Feature parity: Kafka integration.
🚧 Feature parity: Service Performance Monitoring (SPM).
📅 Prepare for Beta: release pipeline and documentation.
📅 Prepare for GA: creating a Helm chart, Kubernetes operator, and clarifying version compatibility guarantees.

Once the GA is announced, there are a few more milestones to continue improving Jaeger v2:

🚀 Upgrading UI to use OpenTelemetry data natively.
🚀 Upgrading storage backend implementations to Storage v2 interface to use OpenTelemetry data natively.
🚀 Support ClickHouse as an official storage backend.

▶▶▶️ You can try out Jaeger v2 today! We publish a Docker image (https://hub.docker.com/r/jaegertracing/jaeger) and we provide a collection of configuration file templates for different deployment modes of Jaeger.

Leveraging Mentorship Programs 🎓

The Jaeger v2 roadmap was designed to minimize the amount of change we need to make to the project, by avoiding a big-bang approach in favor of incremental improvements to the existing code base. Yet it still required a significant amount of new development, which is often difficult to sustain for a volunteer-driven project. We were able to attract new contributors and drive the Jaeger v2 roadmap by participating in mentorship programs run by the Linux Foundation and Google, such as LFX Mentorship and Google Summer of Code. This has been a rewarding and mutually beneficial engagement for both the project and the participating interns. We are currently proposing two projects for the Fall LFX term.

Conclusion

Jaeger v2 represents a significant step forward for the Jaeger project, bringing improved flexibility, extensibility, and alignment with the OpenTelemetry project. With its native OTLP ingestion, simplified deployment model, and access to OpenTelemetry Collector features, Jaeger v2 promises to provide a more efficient and scalable distributed tracing solution.

As the development of Jaeger v2 continues, we can expect to see a more robust and feature-rich system emerge. Stay tuned for updates and get ready to experience the next generation of distributed tracing with Jaeger v2! 💥