The life of a span

Juraci Paixão Kröhling
Published in JaegerTracing, Jul 18, 2018

In the OpenTracing realm, as well as in other parts of the distributed tracing world, a “span” is the name given to the data structure that stores data related to a single unit of work. In most cases, developers just worry about instantiating a tracer and letting the instrumentation libraries capture interesting spans. How they actually reach a backend like Jaeger’s can be somewhat of a mystery.

Let’s try to clear up some of this magic.

For this article, let’s focus on what happens when we assume the defaults for all components involved. So you’ll have to remember that what actually happens in the background in your own implementation might differ from what we describe here, depending on your configuration.

We’ll use a sample application with the Jaeger Java Client, but other Jaeger client libraries for other languages act in a very similar manner.

The instrumented application’s setup

The application used in this blog post is very simple and won’t register the tracer with the GlobalTracer as would be usual. Instead, it just brings an Eclipse Vert.x verticle up and creates a simple span for each HTTP request our handler receives. The code repository with this example is available on GitLab, but here’s the most relevant part:

The Eclipse Vert.x verticle that generates the span

During the bootstrap, the Jaeger Java Client will build an instance of RemoteReporter behind the scenes, which starts a daemon thread and is responsible for flushing the spans stored in the buffer (see JAEGER_REPORTER_FLUSH_INTERVAL).

This reporter will build an instance of the UdpSender, which just sends the captured span using Thrift via UDP to a Jaeger Agent running on localhost. Depending on the Tracer’s configuration, an HttpSender could have been used instead.

Span genesis

Once an instrumentation library or the “business” code starts a span, the Jaeger Java Client will use the JaegerTracer.SpanBuilder to generate an instance of JaegerSpan. This instance includes a reference to a context object (JaegerSpanContext), including a TraceID and SpanID. Both hold the same value for our span, as it’s the root of the tree, also known as the “parent span”.
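To make the ID relationship a bit more concrete, here is a minimal sketch of a span context. It is not the Jaeger class: SketchSpanContext and its methods are hypothetical names made up for this illustration, and the child-span behavior follows the general tracing model rather than anything specific described in this post.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical, simplified stand-in for a span context (not the Jaeger class). */
final class SketchSpanContext {
    final long traceId;
    final long spanId;
    final long parentId; // 0 means "no parent" in this sketch

    private SketchSpanContext(long traceId, long spanId, long parentId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = parentId;
    }

    /** Random non-zero ID, since 0 is reserved for "no parent". */
    private static long nextId() {
        long id;
        do {
            id = ThreadLocalRandom.current().nextLong();
        } while (id == 0L);
        return id;
    }

    /** A root span starts a new trace, so one fresh ID is used for both fields. */
    static SketchSpanContext newRoot() {
        long id = nextId();
        return new SketchSpanContext(id, id, 0L);
    }

    /** A child keeps the trace ID, gets its own span ID and points at its parent. */
    SketchSpanContext newChild() {
        return new SketchSpanContext(traceId, nextId(), spanId);
    }

    boolean isRoot() {
        return parentId == 0L;
    }

    public static void main(String[] args) {
        SketchSpanContext root = newRoot();
        SketchSpanContext child = root.newChild();
        System.out.println("root:  trace ID == span ID? " + (root.traceId == root.spanId));
        System.out.println("child: same trace as root?  " + (child.traceId == root.traceId));
    }
}
```

Our example only ever produces the root case, which is why the two IDs match.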

Span reporting

Our instrumented code starts a span and does the required processing, like adding a specific HTTP header and writing the response to the client. Once that is done, the try-with-resources statement will automatically call the close() method, which ends up calling JaegerSpan#finish(). The span is then delivered by the JaegerTracer to the RemoteReporter. At this point, this is what our span looks like:

JaegerSpan as seen by the RemoteReporter#reportSpan(JaegerSpan) method
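The close()-to-finish() hand-off can be sketched with plain Java. All class names here (SketchFinishableSpan, SketchScope, SketchScopeDemo) are hypothetical, and a simple list stands in for the reporter; the only real mechanism being shown is try-with-resources calling close() on exit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical span: on finish() it hands itself to a "reporter" (a list here). */
final class SketchFinishableSpan {
    private final String operationName;
    private final List<String> reported;
    private boolean finished;

    SketchFinishableSpan(String operationName, List<String> reported) {
        this.operationName = operationName;
        this.reported = reported;
    }

    void finish() {
        if (finished) {
            return; // finishing twice must not report twice
        }
        finished = true;
        reported.add(operationName);
    }
}

/** Hypothetical scope: closing it finishes the span it wraps. */
final class SketchScope implements AutoCloseable {
    private final SketchFinishableSpan span;

    SketchScope(SketchFinishableSpan span) {
        this.span = span;
    }

    @Override
    public void close() {
        span.finish();
    }
}

final class SketchScopeDemo {
    static List<String> handleRequest() {
        List<String> reported = new ArrayList<>();
        try (SketchScope scope = new SketchScope(
                new SketchFinishableSpan("handle-request", reported))) {
            // While the handler is still working, nothing has been reported yet.
            if (!reported.isEmpty()) {
                throw new IllegalStateException("reported too early");
            }
        } // close() runs here, which calls finish(), which reports the span
        return reported;
    }

    public static void main(String[] args) {
        System.out.println(handleRequest());
    }
}
```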

The RemoteReporter simply adds the span to a queue and returns control to the caller, so that no blocking I/O happens on the application’s thread and the application being traced isn’t slowed down. Needless to say, no more work happens in the “main” thread for this span.
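The queue hand-off described above can be sketched as follows. This is an illustration under assumptions, not the RemoteReporter implementation: SketchQueueReporter is a hypothetical class, spans are plain strings, "sending" is just appending to a list, and dropping spans when the queue is full is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical reporter: callers enqueue, a daemon thread does the slow work. */
final class SketchQueueReporter implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final List<String> sent = new ArrayList<>(); // written by the worker only
    private final Thread worker;
    private volatile boolean running = true;

    SketchQueueReporter(int maxQueueSize) {
        this.queue = new ArrayBlockingQueue<>(maxQueueSize);
        this.worker = new Thread(this::drain, "sketch-reporter");
        this.worker.setDaemon(true); // never keeps the JVM alive on its own
        this.worker.start();
    }

    /** Called on the application thread: never blocks, returns false if the queue is full. */
    boolean report(String span) {
        return queue.offer(span);
    }

    /** Runs on the background thread until close() is called and the queue is empty. */
    private void drain() {
        try {
            while (running || !queue.isEmpty()) {
                String span = queue.poll(50, TimeUnit.MILLISECONDS);
                if (span != null) {
                    sent.add(span); // stand-in for handing the span to a sender
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the worker after it has drained whatever is still queued. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Only meaningful after close(), once the worker has stopped. */
    List<String> sent() {
        return sent;
    }

    public static void main(String[] args) {
        SketchQueueReporter reporter = new SketchQueueReporter(100);
        reporter.report("span-1");
        reporter.report("span-2");
        reporter.close();
        System.out.println(reporter.sent());
    }
}
```

The point of the design is that report() costs the caller one non-blocking queue operation, no matter how slow the network side is.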

Flush!

As soon as the span is in the queue, UdpSender#append(JaegerSpan) is called by the background thread. The sender will convert every JaegerSpan into a Thrift span before sending it over the wire to the agent. After the conversion, this is what the span looks like:

Our span as seen by the Thrift Sender

This span in Thrift format is added to a buffer, whose size is constantly tracked by the sender. Once the buffer approaches the maximum size of a UDP packet (about 65 KB) or some time has elapsed, the UdpSender flushes the list of spans, along with a Process object representing the tracer process, to UdpSender#send(Process, List<JaegerSpan>). This is the trigger for the UdpSender to emit a Thrift batch to the agent. For the curious ones out there, here’s what the batch looks like over the wire:

Compact span batch sent from Jaeger Java Client to Jaeger Agent
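The size-triggered flush can be sketched like this. It is a simplification under assumptions: SketchBufferingSender is a hypothetical class, UTF-8 bytes stand in for Thrift encoding, the overhead of the Process object and the batch envelope is ignored, and the time-based flush is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sender that buffers encoded spans up to a maximum packet size. */
final class SketchBufferingSender {
    private final int maxPacketBytes;
    private final List<byte[]> buffer = new ArrayList<>();
    private final List<Integer> batchSizes = new ArrayList<>(); // spans per emitted batch
    private int bufferedBytes;

    SketchBufferingSender(int maxPacketBytes) {
        this.maxPacketBytes = maxPacketBytes;
    }

    /** Buffers one span and returns how many spans were flushed to make room. */
    int append(String span) {
        byte[] encoded = span.getBytes(StandardCharsets.UTF_8); // stand-in for Thrift
        if (encoded.length > maxPacketBytes) {
            throw new IllegalArgumentException("span does not fit in a single packet");
        }
        int flushed = 0;
        if (bufferedBytes + encoded.length > maxPacketBytes) {
            flushed = flush(); // emit what we have before the packet overflows
        }
        buffer.add(encoded);
        bufferedBytes += encoded.length;
        return flushed;
    }

    /** Emits the buffered spans as one batch and returns how many were in it. */
    int flush() {
        int count = buffer.size();
        if (count > 0) {
            batchSizes.add(count); // a real sender would write one datagram here
        }
        buffer.clear();
        bufferedBytes = 0;
        return count;
    }

    List<Integer> batchSizes() {
        return batchSizes;
    }

    public static void main(String[] args) {
        SketchBufferingSender sender = new SketchBufferingSender(10);
        sender.append("aaaa");
        sender.append("bbbb");
        sender.append("cccc"); // would exceed 10 bytes, so the first two go out
        sender.flush();        // what a timer-driven flush would do
        System.out.println(sender.batchSizes());
    }
}
```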

Quick appearance at the Agent

The Jaeger Agent is the daemon that runs very close to the instrumented application. Its sole purpose is to catch spans submitted from instrumented applications via UDP and relay them via a long-lived TChannel connection to the collector.

A batch, as received by the agent, contains two main properties: a Process, representing the metadata about the process where the Jaeger Tracer was running on the client, and a list of Spans. As the process metadata is the same for all spans in the same batch, sending it once per batch instead of once per span can save some resources when a batch holds several spans. In our case, we have only one span and this is what it looks like right before the agent dispatches it to the collector:

A batch submitted by the UdpSender containing only one span

Reaching the Collector

After its quick appearance at the Agent, our span reaches the Jaeger Collector via the TChannel handler at SpanHandler#SubmitBatches, responsible for dealing with batches in Jaeger format. Other formats, such as Zipkin, would have different handlers.

Our batch will then be submitted to the collector and the pretty version of the payload would look like the following:

How the collector sees the incoming batch

The span handler will then build individual spans in Jaeger format, each one including a copy of the process object, and deliver the resulting list to SpanProcessor#ProcessSpans. Our span has a different format now:

At this stage, a span might go through a pre-processing routine and/or might be filtered out. Under normal conditions, though, spans will then reach SpanProcessor#saveSpan. If we had more spans in the batch, we’d see this method being called once for every span. A “Span Writer” will be employed, which can be a Cassandra, Elasticsearch, Kafka or in-memory span writer.
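The collector-side steps (expanding a batch into spans that each carry the process, filtering, then one write per span) can be sketched as below. The real collector is written in Go; Java is used here only to stay consistent with the client-side examples, and every type name (SketchProcess, SketchBatch, SketchDomainSpan, SketchSpanWriter, SketchCollector) is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical process metadata, sent once per batch. */
final class SketchProcess {
    final String serviceName;

    SketchProcess(String serviceName) {
        this.serviceName = serviceName;
    }
}

/** Hypothetical batch: one process plus the spans it produced (names only). */
final class SketchBatch {
    final SketchProcess process;
    final List<String> operationNames;

    SketchBatch(SketchProcess process, List<String> operationNames) {
        this.process = process;
        this.operationNames = operationNames;
    }
}

/** Hypothetical per-span format: each span carries its own process copy. */
final class SketchDomainSpan {
    final String operationName;
    final SketchProcess process;

    SketchDomainSpan(String operationName, SketchProcess process) {
        this.operationName = operationName;
        this.process = process;
    }
}

/** Hypothetical storage abstraction; could be backed by any store. */
interface SketchSpanWriter {
    void writeSpan(SketchDomainSpan span);
}

final class SketchCollector {
    private final Predicate<SketchDomainSpan> filter;
    private final SketchSpanWriter writer;

    SketchCollector(Predicate<SketchDomainSpan> filter, SketchSpanWriter writer) {
        this.filter = filter;
        this.writer = writer;
    }

    /** Expands the batch, drops filtered spans and writes the rest one by one. */
    int submitBatch(SketchBatch batch) {
        int saved = 0;
        for (String operationName : batch.operationNames) {
            SketchProcess copy = new SketchProcess(batch.process.serviceName);
            SketchDomainSpan span = new SketchDomainSpan(operationName, copy);
            if (!filter.test(span)) {
                continue; // filtered out before reaching storage
            }
            writer.writeSpan(span); // called once per surviving span
            saved++;
        }
        return saved;
    }

    public static void main(String[] args) {
        List<SketchDomainSpan> inMemory = new ArrayList<>();
        SketchCollector collector = new SketchCollector(span -> true, inMemory::add);
        SketchBatch batch = new SketchBatch(
                new SketchProcess("vertx-create-span"), Arrays.asList("GET"));
        System.out.println("spans saved: " + collector.submitBatch(batch));
    }
}
```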

What it looks like in the concrete storage is left as an exercise to the reader, but a peek into the schema file for Cassandra or Elasticsearch might reveal quite a lot of details.

It’s worth noting that, from this point on, we stop referring to our span as “span”: for almost all cases after this, it is treated as a fully fledged trace that happens to be composed of a single span.

The UI and the query

At this point, our trace is in storage, ready to be retrieved by the UI via the query component.

In the Jaeger UI, traces can be retrieved based on search terms such as the service name, “vertx-create-span” in our case. On the backend side, the Jaeger Query component will see the following as the search terms when we select the service name and click “Find Traces”:

Payload from the UI to Jaeger Query

The Query’s APIHandler#search method will parse the search terms and pass them over to the storage-specific “Span Reader”. Based on the service name, our trace is then found and added to a result list. The backend sees this list as:

Results from the span reader, as seen from the Query component

All messages from the backend to the UI follow a specific format, so this is what the UI ends up receiving:

Query backend response to the UI

The Jaeger UI will iterate over the results and nicely render the information on the screen:

In most cases, the UI won’t request the trace information again upon clicking on the trace, but if we open the trace and hit “Refresh”, it causes the UI to issue a request that will reach APIHandler#getTrace. It loads the trace based on the given ID from the span storage along with all its spans, responding with a data structure similar to the following:

Result list from the span reader based on a trace ID query
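The two read paths, search by service name and lookup by trace ID, can be sketched with an in-memory reader. This is not the Jaeger storage API: SketchStoredSpan and SketchInMemoryReader are hypothetical, and the trace ID used in main is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stored span, reduced to the fields the two queries need. */
final class SketchStoredSpan {
    final String traceId;
    final String serviceName;
    final String operationName;

    SketchStoredSpan(String traceId, String serviceName, String operationName) {
        this.traceId = traceId;
        this.serviceName = serviceName;
        this.operationName = operationName;
    }
}

/** Hypothetical in-memory reader that groups spans into traces by trace ID. */
final class SketchInMemoryReader {
    private final Map<String, List<SketchStoredSpan>> tracesById = new LinkedHashMap<>();

    void store(SketchStoredSpan span) {
        tracesById.computeIfAbsent(span.traceId, id -> new ArrayList<>()).add(span);
    }

    /** Search path: every trace with at least one span from the given service. */
    List<List<SketchStoredSpan>> findTraces(String serviceName) {
        List<List<SketchStoredSpan>> result = new ArrayList<>();
        for (List<SketchStoredSpan> trace : tracesById.values()) {
            for (SketchStoredSpan span : trace) {
                if (span.serviceName.equals(serviceName)) {
                    result.add(trace);
                    break; // one matching span is enough to include the trace
                }
            }
        }
        return result;
    }

    /** Lookup path: all spans of one trace, or an empty list if the ID is unknown. */
    List<SketchStoredSpan> getTrace(String traceId) {
        List<SketchStoredSpan> trace = tracesById.get(traceId);
        return trace == null ? new ArrayList<>() : trace;
    }

    public static void main(String[] args) {
        SketchInMemoryReader reader = new SketchInMemoryReader();
        reader.store(new SketchStoredSpan("trace-1", "vertx-create-span", "GET"));
        List<List<SketchStoredSpan>> found = reader.findTraces("vertx-create-span");
        boolean same = found.get(0).equals(reader.getTrace("trace-1"));
        System.out.println("search and lookup return the same trace: " + same);
    }
}
```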

Because we have only one trace, with only one span, the payload the UI receives for this request is exactly the same as the one we got from the “search” operation. But the way this data is presented differs:

The detail view of a trace

Afterlife

We’ve covered pretty much all stages of the span’s life, from its genesis up to where it’s finally used to provide insights about the instrumented application. From this point, a span might appear in several afterlife scenarios, such as a data point in Prometheus, or aggregated with other spans in a Grafana dashboard somewhere. Eventually, the storage owner might decide to clean up older data, causing our span to cease to exist and closing the cycle.
