Trace my mesh (part 3/3)

Joel Takvorian
Published in Kiali · 4 min read · Mar 9, 2021

A Distributed Tracing walk-through with Jaeger, Istio and Kiali

In the previous parts, we’ve seen how Istio and Envoy help with tracing, how to propagate traces, create spans and visualize data in Kiali. Now we’re going to explore some different scenarios.

Atypical topologies

Let’s talk about the Kiali graph topology. It represents all interactions between services, that is to say, all service-to-service interactions within a mesh and at its boundaries. We know how useful it is; however, there are some blind spots. For instance, what if some of our microservices don’t rely on service-to-service communications? Say we use event-based communication through a message broker such as Kafka. Services talk to one another via Kafka, so the Kiali graph, which considers the sources and destinations of all requests (or TCP connections), will take the shape of a star, or a hub, with Kafka in the middle and every other service linked only to Kafka. This is the actual network topology, and it’s still valuable as is, but it no longer tells who talks to whom at a more abstract level: it misses the business relationships. More often than not, architectures don’t use event-based and service-to-service communication in a mutually exclusive way, but combine the two approaches. There are use cases where event-based makes more sense, others where it doesn’t, and both can live side by side in a mesh.

This is a mixed topology: the green arrows show classical service-to-service communications; but all the blue arrows point to Kafka, which is hiding how the information flows there.

You guessed it, tracing will fill the observation gap here. The trace overlay on top of the Kiali graph aims to show how a business transaction flows between services, whether it goes through a message broker or anything else.

Tracing shows evidence that “stadium” is talking to “ui”; by looking at the trace detail, we can know that “stadium” is the publisher, and “ui” the consumer.

There is a subtlety, however. We haven’t covered this so far, but Envoy injects traces only on HTTP requests. All non-HTTP TCP traffic remains invisible to tracing unless — once again — we intervene.

Capturing TCP traffic in traces (e.g. DB, Kafka)

TCP traffic can be pretty much anything, from database queries to publisher/subscriber communication, direct peer-to-peer communication, etc. Unlike HTTP with its headers, there is no standardized way to send tracing metadata across TCP communications, which is why Envoy won’t trace it. This would have to be implemented protocol by protocol as filters, whenever possible, or handled in the application code, which is what we’re going to see now.

We’ve already covered how to create a span in Part 1 of this story, in the chapter “Using a client library”. This is exactly what needs to be done again when the application is about to connect to a remote host via TCP, or to send a Kafka message. But this time, we will take a look at the OpenTracing semantic conventions: they define a bunch of relevant tags to use here. Some relate to database access. message_bus.destination can hold a Kafka topic. span.kind, at this point, should be client, or producer in the case of Kafka. And from Kiali’s perspective, peer.address (or alternatively peer.hostname) is the most useful, as it will be used to correlate the span with a service / service entry in the Kiali graph (it can be any of the hosts defined in a Service Entry, like Kafka brokers or bootstrap servers).

So, you can create a span with these tags, without forgetting to pass the previous span context as a parent, if relevant.

// E.g. for database access
// (assuming imports of io.opentracing.Span and io.opentracing.Tracer):
Span span = tracer.buildSpan("Query customers")
        .withTag("db.instance", "xxx")
        .withTag("db.type", "sql")
        .withTag("peer.address", "mydb.example.com")
        .withTag("peer.service", "postgresql")
        .withTag("span.kind", "client")
        .asChildOf(parentContext)
        .start();
// ... Run query
span.finish();

For some TCP traffic such as database queries, that’s pretty much it, we are not going to propagate context any further.

For things like Kafka pub/sub, we’re not done. Kafka has the concept of headers to carry metadata, so we’re going to follow the same process as we’ve seen before to propagate B3 headers over HTTP, except that it now fills the Kafka headers. Just copy the B3 key/value strings to whatever structure your Kafka client accepts, or reuse a tracer.inject function similar to the one we’ve seen previously, adapted to your Kafka client’s headers interface. Note that the Format.Builtin.HTTP_HEADERS format is fine to reuse despite its name. You can also check the other formats (binary, …).
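As a minimal sketch of the “copy the B3 key/value strings” step (the class name and sample values are illustrative, and the B3 map stands in for what tracer.inject would have produced with Format.Builtin.HTTP_HEADERS), Kafka record headers carry byte[] values, so the copy is just a string-to-bytes conversion:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class B3KafkaPropagation {
    // Copy the B3 key/value strings into a Kafka-style header map:
    // Kafka record headers hold byte[] values, so encode each string.
    public static Map<String, byte[]> toKafkaHeaders(Map<String, String> b3Context) {
        Map<String, byte[]> headers = new HashMap<>();
        b3Context.forEach((key, value) ->
                headers.put(key, value.getBytes(StandardCharsets.UTF_8)));
        return headers;
    }

    public static void main(String[] args) {
        // B3 context as serialized by tracer.inject (made-up values).
        Map<String, String> b3 = new HashMap<>();
        b3.put("x-b3-traceid", "463ac35c9f6413ad48485a3953bb6124");
        b3.put("x-b3-spanid", "a2fb4a1d1a96d312");
        b3.put("x-b3-sampled", "1");

        Map<String, byte[]> kafkaHeaders = toKafkaHeaders(b3);
        System.out.println(kafkaHeaders.size()); // prints 3
    }
}
```

With a real client, each entry would then be attached to the outgoing record, e.g. via the producer record’s headers API.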

And finally, you can finish the process by extracting the headers into a span context on the subscriber side, still inspired by the previously seen functions (tracer.extract), and create a child span similar to the previous one, except that span.kind is now consumer.

// (assuming imports of io.opentracing.Span, io.opentracing.SpanContext
// and io.opentracing.Tracer):
SpanContext parentContext = tracer.extract(...);
Span span = tracer.buildSpan("Received message")
        .withTag("message_bus.destination", "my-topic")
        .withTag("peer.address", "kafka.example.com")
        .withTag("peer.service", "kafka")
        .withTag("span.kind", "consumer")
        .asChildOf(parentContext)
        .start();
// Process received message
span.finish();
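The elided tracer.extract call needs the B3 headers back as plain text. A minimal sketch of that reverse conversion (the class name is illustrative); the resulting map could then be wrapped in a text-map carrier and handed to tracer.extract with the same Format.Builtin.HTTP_HEADERS format:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class B3HeaderExtraction {
    // Read the B3 headers back from a Kafka-style header map (byte[] values)
    // into plain strings, keeping only the x-b3-* keys.
    public static Map<String, String> fromKafkaHeaders(Map<String, byte[]> kafkaHeaders) {
        Map<String, String> b3Context = new HashMap<>();
        kafkaHeaders.forEach((key, value) -> {
            if (key.startsWith("x-b3-")) {
                b3Context.put(key, new String(value, StandardCharsets.UTF_8));
            }
        });
        return b3Context;
    }

    public static void main(String[] args) {
        Map<String, byte[]> headers = new HashMap<>();
        headers.put("x-b3-traceid",
                "463ac35c9f6413ad48485a3953bb6124".getBytes(StandardCharsets.UTF_8));
        headers.put("some-app-header", "ignored".getBytes(StandardCharsets.UTF_8));

        // In opentracing-java, the map would typically be wrapped like:
        // SpanContext parent = tracer.extract(Format.Builtin.HTTP_HEADERS,
        //         new TextMapAdapter(fromKafkaHeaders(headers)));
        System.out.println(fromKafkaHeaders(headers).size()); // prints 1
    }
}
```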

As for tracing over HTTP, some frameworks or clients provide out-of-the-box Kafka tracing, which will save you from doing all of this. The steps described here can be seen, more or less and among other considerations, in the pull request that added tracing support to the vertx-kafka-client, for instance. But should you be confronted with another protocol, or another technology that doesn’t come with a full-featured client supporting tracing, it’s good to know how it can still be achieved.

One caveat is being able to propagate the metadata at all: if it’s not through headers, it might work with some custom payloads. Again, this may vary depending on the protocol.

traceMyMesh.finish()

This is the end of this “Trace my mesh” story. Got feedback, questions or just want to talk? We’re on Slack, mailing list, Twitter (me) and the mighty IRC.

Big thanks to the Jaeger guys Gary, Juca and Pavol — and my team-mates on Kiali — who helped me whenever I needed!
