Trace my mesh (part 1/3)

Published in

Kiali

7 min readFeb 22, 2021

A Distributed Tracing walk-through with Jaeger, Istio and Kiali

You’ve heard that Istio, like other service meshes, brings automated tracing between your microservices? Fair enough, that’s correct. Still, it doesn’t do all of the job. We’re going to cover this, and see also where Kiali will help.

Wait, what is distributed tracing, anyway?

Distributed tracing is a technology to trace logical / business transactions in a running software, including transactions that spread over network calls involving different hosts or processes. Tracing a transaction means catching all events, or actions, following some causality or sequential relationships. In Tracing jargon, a single event or action is denoted as a “Span”. A “Trace” is the interrelated collection of Spans, occurring along a timeline, with parental relationships and a single root. So that’s a tree.

Spans are associated with a duration. For instance, with an HTTP call from service A to service B, service A could generate a span with a duration that stands for the network roundtrip time. Service B can also generate a span, with a duration that stands for the server processing time. Typically, these two spans would belong to the same trace. The parental relationship can be made if service A injects some headers in the HTTP request, and service B reads these headers to retrieve the trace context. This is the context propagation.

All the spans are collected and sent to a server such as Jaeger, and then can be retrieved and visualized with some tooling, like Jaeger UI itself, or Kiali.

So, Istio does all of that for me?

Yes, Istio and Envoy do automatically what I described above as an example. Because the Envoy sidecars are in a privileged position to catch all traffic between your services, and because they know well the topology of your microservices, they can do a pretty good job. When a service emits an HTTP request, the Envoy sidecar will create a span with some appropriate meta-data and inject the required headers. On inbound requests, Envoy reads the headers to extract context and, if any, creates a child span (if there were no context, a span can still be created, but without any parent, meaning it’s a new trace). So every mesh-internal request generates a two-span trace.

Note that TCP traffic isn’t traced automatically by Envoy. We will cover this point in the third and last part of this story.

Two-span trace in Kiali, but the actual business transaction is incomplete

But you want more than these two-span traces, right? You want to correlate an incoming connection of service B to an outgoing connection to service C, for instance, you want to tie two individual requests together. That’s where you come into play. Envoy cannot do that for you, because, unless it runs some complicated ML-based thingy (probably error prone by the way) it cannot guess what happens between its in and out bounds. From its point of view, this is the black box where your application stands. Whether or not a call to C was a consequence of that previous call from A is up to you to decide. Let’s see how to do that.

How to propagate the context, then?

So, this is one of the few occasions where you need to write some code to fully leverage what a service mesh can offer in terms of observability. The good news is, there’s nothing very difficult here. Even without the help of any library, you can still code the required steps, as you simply need to extract a couple of headers from the incoming request, and inject them back into the outgoing request. Envoy documents which headers are necessary to propagate depending on the tracing backend; in the case of Jaeger, use B3 trace headers, they are compatible. Another necessary header is the x-request-id created by Envoy. Which gives us:

x-b3-traceid
x-b3-spanid
x-b3-parentspanid
x-b3-sampled
x-b3-flags
x-request-id

Alternatively, all of the headers prefixed with x-b3 can be grouped into a single b3 header, more compressed. For more information about these headers, check https://github.com/openzipkin/b3-propagation.

You can see an example here, coded in go, of header propagation. As you can see, it’s quite easy to do. This is used in a demo application built for Istio and Kiali.

After you do that, you can see the Envoy generated spans linked together:

Detail of a span created by Envoy, in Jaeger UI

Using a client library

Propagating context as shown above should “just work”. However, there are some advantages to using a client library instead. Jaeger provides a bunch of them to propagate context in your favorite language, without leaving you to deal with the specification details of OpenTracing or Zipkin B3. More importantly, you might want at some point to create your own spans. Imagine that you want to measure and trace some computational task that is part of a trace initiated by Envoy. You can definitely do that.

Here is an example, this time in Java, using the Jaeger client + OpenTracing (dependency on module io.jaegertracing / jaeger-client).

The first step is to create a Tracer object, which depends on the tracing backend. For instance, with Jaeger, you can create it as follow:

Tracer tracer = Configuration.fromEnv().getTracer();

See here the environment variables used for configuration. Typically, with Istio, you need this kind of env setup on your pods:

    spec:
      containers:
      - ...
        env:
        - name: JAEGER_SERVICE_NAME
          value: myapp.mynamespace
        - name: JAEGER_SAMPLER_TYPE
          value: ratelimiting
        - name: JAEGER_SAMPLER_PARAM
          value: "1"
        - name: JAEGER_PROPAGATION
          value: b3
        - name: JAEGER_ENDPOINT
          value: http://jaeger-collector.istio-system.svc/api/traces

Ideally, you should set JAEGER_SERVICE_NAME to <app>.<namespace>, where <app> is the application name (same as the app label set on pods) and <namespace> the namespace where you deploy it. This is not exactly mandatory, but it will be consistent with how Envoy generates spans, and will help Kiali to correlate the spans with the appropriate apps / workloads / services.
JAEGER_SAMPLER_TYPE and JAEGER_SAMPLER_PARAM define the sampling strategy. Here, we define a simple rule to keep only 1 percent of the spans. But keep in mind that only the root span of a trace can define the sampler used across subsequent spans, so, perhaps that is something you want to configure in Envoy as well. To learn more about sampling, I warmly recommend reading this article from Juraci Paixão Kröhling: The role of sampling in distributed tracing.
JAEGER_ENDPOINT must point to the Jaeger collector service. Beware that it’s not always exposed by default in Istio, you may have to create a service to expose it, generally targeting port 14268:

apiVersion: v1  
kind: Service  
metadata:  
  name: jaeger-collector  
  namespace: istio-system  
  labels:  
    app: jaeger  
spec:  
  type: ClusterIP  
  ports:  
    - name: http-query  
      port: 80  
      protocol: TCP  
      targetPort: 14268  
  selector:  
    app: jaeger

Now, here is how to extract a SpanContext from headers:

    SpanContext parentContext =
       tracer.extract(Format.Builtin.HTTP_HEADERS,
                      new TextMap() {
         @Override
         public Iterator<Map.Entry<String, String>> iterator() {
           return headers.iterator();
         }
         @Override
         public void put(String key, String value) {
           throw new UnsupportedOperationException();
         }
       });

And to inject it back:

    tracer.inject(spanContext,
                  Format.Builtin.HTTP_HEADERS,
                  new TextMap() {
          @Override
          public Iterator<Map.Entry<String, String>> iterator() {
            throw new UnsupportedOperationException();
          }
          @Override
          public void put(String key, String value) {
            headers.accept(key, value);
          }
        });

These two snippets give an idea, they may have to be adapted to whatever headers object you can manipulate. For more examples in different languages, check the OpenTracing guides.

Depending on your framework, perhaps there’s already some mechanisms in place to inject / extract spans and make them accessible to your app, which saves you from doing this. For instance, in this demo application using Vert.X, the vertx-opentracing module extracts and injects spans automatically, and provides a getter and setter to the active span stored in a local context that makes it easy to propagate across a stack of calls.

Finally, you can use the OpenTracing client API to create your own spans:

Span span = tracer.buildSpan("Doing something")  
      .withTag("foo", foo)  
      .withTag("bar", bar)  
      .asChildOf(parentContext)  
      .start();
    span.log("Log any relevant info throughout span lifetime");
    // ... Do something
    span.finish();

Use tags to enrich the span with any meta-data you find relevant, and logs for even more information as you would do with a usual logger.

Once the traces propagation is set up, Tracing in Kiali takes another dimension.

Traces here are bigger, involving a mix of Envoy and custom spans. You can have hundreds of spans in a trace without a problem.

In the next parts, we will review more in detail the capabilities that Kiali offers on Tracing, and also cover some more atypical scenarios. In the meantime, you can reach out on our Slack channel, mailing list, Twitter (me) or, last but not least, IRC.

Trace my mesh (part 1/3)

Wait, what is distributed tracing, anyway?

So, Istio does all of that for me?

How to propagate the context, then?

Using a client library

Written by Joel Takvorian