A continued case study of Project Horus.
There are 3 ways of collecting data for observing microservices: Logging, Metrics, and Distributed Tracing¹. In a previous article, I discussed logging using an example application, Project Horus²—here we will set up Distributed Tracing for this app.
We will start with basic theory and ultimately set up a complete Tracing Pipeline for observing two Node.js microservices running in a Docker environment. The core principles are platform-agnostic and applicable to other environments like heterogeneous services on Kubernetes.
What is Distributed Tracing?
Distributed Tracing is about understanding the path of data as it propagates through the components of our application. While Logs can record important checkpoints when servicing a request, a Trace connects all these checkpoints into a complete route that explains how that request was handled across all services from start to finish.
Indeed, Tracing is a superset of Logging: you could enrich a Trace by annotating it with logs of events that occurred during its course.
But tracking requests across independent services presents unique challenges, especially at scale. We need a solution that is portable, simple to implement, and imposes little performance overhead. The OpenTracing project was created to reason about solutions to some of these challenges.³
OpenTracing and OpenCensus (a similar project) recently merged to form OpenTelemetry, an exciting project that promises to further simplify observability in cloud-native applications.
The terminology of OpenTracing
The above image shows various perspectives of our sample application: an app that enables a user to request weather information about their location. First, the request is handled by an api-service, which translates the user’s IP address to a city name using a third-party ip-service. Then, it obtains the latest weather information for that city from another third-party weather-service. Finally, the weather information is returned to the user. See Fig 1 and 2.
A Transaction is an end-to-end request-response flow, i.e from making the user’s initial request to receiving the final weather response. A transaction often involves the interaction of multiple services — Fig 2 shows 4 services collaborating to enable one transaction.
A Trace is the record of a Transaction. It captures the work done by each service as a collection of Spans, all sharing the same Trace ID. More granular operations of a service can be captured as child Spans, which have a childOf reference pointing to their parent Span. Hence, the tuple (TraceID, SpanID, ParentID) sufficiently describes a Span’s position in a Trace, and is called the SpanContext.
Fig 3 shows a Trace with 4 Spans, each with a SpanContext that effectively connects the Spans to form a Directed Acyclic Graph. That is why a Trace is also referred to as a DAG of Spans (as shown in Fig 4).
To propagate a Trace from one service to another, you inject the SpanContext as HTTP request headers and extract them in the receiving service, then you can continue the Trace by creating Children Spans that reference the propagated SpanContext.
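As a concrete example, Jaeger (the tracer we use below) serializes the entire SpanContext into a single HTTP header:

```
uber-trace-id: {trace-id}:{span-id}:{parent-id}:{flags}
```

The receiving service parses this header to reconstruct the SpanContext and continue the Trace.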
The OpenTracing Specification provides a more technical description of these concepts,⁴ but the above overview is sufficient to proceed with setting up the pipeline.
Setting up a Tracing Pipeline
A Tracing Pipeline involves 3 steps: acquiring a trace, storing the trace, and visualizing the trace. You can either use a SaaS solution (top half of image) or set up each component of the pipeline yourself (bottom half). Both approaches involve instrumenting your services with a client library that provides a Tracer object with which you can create Spans. The library then sends the tracing data to either an agent or an endpoint for storage in a database. Finally, a Query UI is used to access and visualize the traces.
The SaaS approach is obviously simpler and eliminates the complexity of managing all 3 components of the pipeline, but I have chosen the self-managed route (highlighted in green)…just so we can explore how everything works under the hood.
Step 1: Acquiring Traces using the Jaeger Client
There are a couple of ways to instrument the code as shown by the image below. But I’ve chosen the path highlighted in green.
First, we import the Tracer object provided by jaeger-client in our service.
Notice how the Tracer is initialized from environment variables. This is a best practice that will allow us to easily change a service’s tracing behaviour from a higher level like a Docker Compose file.
Then we use the Tracer object to create Spans whenever we want to record a request. Notice that, since we want to continue the Trace across other services, the Span’s SpanContext is serialized as HTTP request headers using the helper function getCarrier() and propagated along with the request.
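The getCarrier() helper can be sketched as follows; this is an illustrative reconstruction, not necessarily the exact Project Horus implementation:

```javascript
// Illustrative sketch of getCarrier(): serializes a Span's SpanContext
// into an object usable as HTTP request headers.
// FORMAT_HTTP_HEADERS mirrors opentracing's constant (value: 'http_headers').
const FORMAT_HTTP_HEADERS = 'http_headers';

function getCarrier(span, tracer) {
  const carrier = {};
  // With jaeger-client, this writes an 'uber-trace-id' header into carrier.
  tracer.inject(span.context(), FORMAT_HTTP_HEADERS, carrier);
  return carrier;
}

// Usage with an outgoing request (axios is just an example HTTP client):
// const span = tracer.startSpan('get-weather');
// const res = await axios.get(url, { headers: getCarrier(span, tracer) });
// span.finish();
```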
On the receiving service, we check whether the incoming request is already being traced (i.e. has trace headers): if so, we continue the trace by creating a child Span that references the incoming SpanContext; otherwise, we start a new root Span. This is handled with a helper function.
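Such a helper can be sketched as below (the name startOrContinueSpan is hypothetical; the real helper in Project Horus may differ):

```javascript
// Hypothetical helper: continue an incoming trace if the request carries
// trace headers, otherwise start a new root Span.
// FORMAT_HTTP_HEADERS mirrors opentracing's constant (value: 'http_headers').
const FORMAT_HTTP_HEADERS = 'http_headers';

function startOrContinueSpan(tracer, operationName, requestHeaders) {
  const parentContext = tracer.extract(FORMAT_HTTP_HEADERS, requestHeaders);
  // jaeger-client returns a SpanContext whose isValid flag is false when
  // no trace headers were found in the carrier.
  if (parentContext && parentContext.isValid) {
    return tracer.startSpan(operationName, { childOf: parentContext });
  }
  return tracer.startSpan(operationName); // new root Span
}
```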
And that’s how we trace requests across multiple services! However, you’d soon find that this manual Span creation and propagation is verbose and quickly becomes a maintenance headache. So, we could write custom middleware that will automatically inject/extract trace information. But that’s an optional optimization. Let’s now consider how to save and visualize these traces.
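Such a middleware could look like this for an Express-style app (tracingMiddleware is a hypothetical name, and the operation/tag naming is just one possible convention):

```javascript
// Sketch of an optional tracing middleware for Express-style apps.
// It centralizes the extract/startSpan/finish boilerplate so route
// handlers stay clean.
const FORMAT_HTTP_HEADERS = 'http_headers'; // opentracing's constant value

function tracingMiddleware(tracer) {
  return (req, res, next) => {
    const parentContext = tracer.extract(FORMAT_HTTP_HEADERS, req.headers);
    const span = tracer.startSpan(`${req.method} ${req.url}`, {
      childOf: parentContext || undefined,
    });
    req.span = span; // handlers can add tags/logs or create child Spans
    res.on('finish', () => {
      span.setTag('http.status_code', res.statusCode);
      span.finish();
    });
    next();
  };
}

// Usage (assuming an Express app and a jaeger-client tracer):
// app.use(tracingMiddleware(tracer));
```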
Steps 2 & 3: Storing and visualizing traces with the Jaeger tracing-backend.
We collect, store, and query the traces using a couple of components collectively referred to as the tracing-backend. The image below shows the 4 components of the Jaeger tracing-backend.
- jaeger-agent typically runs as a daemon or sidecar container alongside an instrumented service. The agent receives traces from the jaeger-client over UDP and forwards them to the jaeger-collector. An agent is not strictly necessary because we can send traces directly to the collector. However, using an agent helps abstract away batching and collector discovery logic from the jaeger-client.⁹
- jaeger-collector handles the logic of storing traces in persistent storage. It supports several storage backends, including Cassandra, Elasticsearch, Kafka, and plain in-memory storage.¹⁰
- Elasticsearch is a high-performance database based on Apache Lucene that supports indexing and searching of massive datasets with near-real-time responsiveness.¹¹
- Jaeger Query and UI is the final piece of the puzzle: we use it to search and visualize our trace data.¹²
There are many ways you could run these components. For instance, the jaegertracing/all-in-one¹³ image enables you to run the entire tracing backend in a single container (traces will be stored in memory). However, in production, you’d want to run each component as a separate service for better scalability. The snippet below shows my abridged docker-compose.yml file, where I’ve chosen to run each component as a separate service.
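An abridged sketch of such a compose file is shown below; the image tags, ports, and the api-service build path are illustrative reconstructions (environment variable names come from the Jaeger and jaeger-client docs), not necessarily the exact Project Horus file:

```yaml
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.0
    environment:
      - discovery.type=single-node

  jaeger-collector:
    image: jaegertracing/jaeger-collector
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
    depends_on:
      - elasticsearch

  jaeger-agent:
    image: jaegertracing/jaeger-agent
    command: ["--reporter.grpc.host-port=jaeger-collector:14250"]
    depends_on:
      - jaeger-collector

  jaeger-query:
    image: jaegertracing/jaeger-query
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
    ports:
      - "16686:16686"   # the query UI
    depends_on:
      - jaeger-collector

  api-service:
    build: ./api-service   # hypothetical path to our instrumented service
    environment:
      - JAEGER_SERVICE_NAME=api-service
      - JAEGER_AGENT_HOST=jaeger-agent
    depends_on:
      - jaeger-agent
```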
An important note on service dependencies in Docker.
Some services depend on another service being ready before they can start. In our system diagram above, the arrows show that jaeger-collector depends on elasticsearch: the former is the depender, the latter the dependee. Readiness means not only that the dependee has started, but that it is ready to receive input.
Docker Compose lets you specify service dependencies using the depends_on attribute, but this only ensures that the dependee is started before the depender;¹⁴ it has no notion of whether the dependee is ready to accept input. In Compose file format v2.1, you could combine depends_on (via condition: service_healthy) with a healthcheck to solve this problem, but this feature was removed in v3.0, for reasons I disagree with.¹⁵
Nevertheless, Docker now recommends you write your own script that probes for when your dependee is ready before running the depender. This introduces additional complexity: instead of using a stock image like jaegertracing/jaeger-collector, you’d need to build a custom image with an entrypoint script that probes elasticsearch for readiness before executing the jaeger-collector binary.
These kinds of implementation gotchas can be a real pain, and they are why you might consider using a TaaS provider like those mentioned earlier.
On trace sampling
Although tracing clients are designed to have very little overhead, tracing every request may have a significant impact on large production systems handling billions of requests. So, most solutions use a sampling strategy to choose which traces are recorded. For instance, the jaeger-client can be configured to record: all traces, a percentage of traces, or an adaptive number tailored to specific service requirements.¹⁶
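For example, with jaeger-client the sampling strategy can be switched through environment variables, e.g. in docker-compose.yml (the variable names come from jaeger-client’s environment-variable support):

```yaml
api-service:
  environment:
    - JAEGER_SAMPLER_TYPE=probabilistic  # also: const, ratelimiting, remote
    - JAEGER_SAMPLER_PARAM=0.1           # record roughly 10% of traces
```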
However, Lightstep promises 100% trace capturing (even for very large systems) with no significant overhead.¹⁷ I have no idea how they achieve this, but it would be interesting to explore.
Finally, visualizing traces
If the above setup was done correctly, you should be able to visit the jaeger-query UI endpoint and explore the traces stored in elasticsearch. The image above shows a trace of the whereami-request from the user, through all our services, and back. You can immediately infer that the bulk of the transaction time is spent in the get-weather request, meaning we can significantly improve our overall response time by optimizing this one request. Indeed, you can do really sophisticated analyses like machine-learning-based anomaly detection, but that is beyond the scope of this article :)
In this article, we examined the theory and practice of observing microservices by gathering and visualizing distributed tracing data. To do this, we studied the concepts of OpenTracing and implemented a Tracing Pipeline using the Jaeger tracing backend.
One key lesson is that implementing distributed tracing for microservices is nontrivial, especially if you’re managing every component in the pipeline — the increased complexity creates more room for things to go wrong. So, I’d strongly encourage using a SaaS solution like Lightstep or Elastic.
Nevertheless, our Jaeger-based solution is sufficient for most use cases, and the implementation can be further simplified by using Kubernetes, which has native support for liveness and readiness probes,¹⁸ making custom wait-scripts unnecessary.
The complete project code is available on GitHub¹⁹, and I’ll keep updating this article as I learn better ways of doing things. Please leave a comment if you have any questions or notice any errors. Hope you found it useful.