Trace comparisons arrive in Jaeger 1.7

Joe Farro
Oct 1, 2018


Distributed tracing can capture a wealth of data and has the potential to yield invaluable insights. But, bridging the gap between the ocean of spans and the epiphanies they offer is not yet a solved problem.

As of Jaeger 1.7, the ability to compare traces is now available in the Jaeger UI. We are hopeful trace comparisons will facilitate going from data to decisions.


The trace comparison feature enables the structural aspects of two traces to be compared. Each trace is condensed into a tree of unique service / operation paths. Then, differences in the presence or prevalence of nodes in the trees are emphasized with color-coding. For instance, when comparing two traces which are essentially RPC call-graphs, the comparison highlights which call sequences are present in only one of the traces or occur more frequently in one of the traces.

In the screenshot above, the dark red nodes are nodes which happened in trace A but do not exist in trace B. We can see very clearly that trace B is primarily a subset of trace A (more on that, below).

In addition to emphasizing spans which are present in only one of the traces, trace comparisons also indicate when spans are repeated more often in one trace or the other. For instance, in the comparison of two traces from the HotROD example, below, we can see the node for redis::GetDriver is light green and marked with +1 and +8%. This lets us know the span is present in both traces but it is seen more often in trace B.

The color-coding of the differences is inspired by code review tools. When comparing trace A to trace B, we evaluate the differences as if we wanted to change trace A to have the same structure as trace B: what would we need to remove from trace A or add to trace A in order to achieve an equivalent structure with trace B.

Dark colors indicate the node is missing from one of the traces:

  • Dark red: The node is only in trace A and would need to be **removed** from A in order to match B
  • Dark green: The node is only in trace B and would need to be **added** to A in order to match B

Light colors indicate the node is present in both traces but has more spans in one than in the other:

  • Light red: The node in A has more spans than B
  • Light green: The node in B has more spans than A

Gray indicates the two traces both have that node and have the same number of spans grouped into it.

Now that we know what, how about why?

The pattern where one trace is a subset of another trace, and the missing portion is non-trivial, is typical when comparing a request which was processed successfully against a request which was interrupted by an error.

Such a comparison can offer a very timely and relevant insight when investigating an incident. We can quickly and confidently narrow our search space — sometimes to a much smaller area of interest.

Assuming that our error trace is representative of the incident (and, ideally, we would have several traces to draw from), we can infer that investigating the red nodes is almost certainly not the best use of our time because they are not even visited in the error trace.

Meanwhile, it’s possible the red nodes represent a substantial number of monitored resources which, as part of the incident, may be experiencing a severe drop in load. Thus, the unvisited nodes just might be slamming us with alerts (relatively speaking). Under such circumstances, it’s not unfathomable to think: “OMG! These alerts are from services that aren’t even in the [error] traces we’re digging into! WTF! FML!” TLDR: Not good.

Ordinarily, focusing on the error trace(s) when we’re being barraged by alerts from disparate sources that aren’t present in our error traces requires two or more of the following traits:

  • Nerves of adamantium
  • A venerable lineage of prior incidents to draw from
  • A zen-like relationship with entropy and adrenaline

Or, we can lean on our insights from the trace comparison to transcend the clamor.

Disclaimer: My excitement for the trace comparison feature (and my penchant for drama) may have gotten the better of me, here. Please understand the above is a dramatic re-enactment of a hypothetical outage scenario.

How? — Group, match and compare

In the comparison of the HotROD traces above, you may have noticed there are far fewer nodes in the comparison graph than spans in the traces (8 vs 50).

The comparison feature first reduces a trace to a more compact graph representation. This is done by collecting spans into groups, and representing the trace as a DAG (directed acyclic graph) of these groups instead of as a DAG of individual spans. Then, it’s these groups of spans which are matched between traces and compared.

Why collect spans into groups?

In our experience, grouping “similar” spans facilitates analysis by making trace comparisons more comprehensible, in some cases dramatically so.

As a baseline, if we were to compare two traces in their raw form, our comparison would be based on spans being common to both traces or being only in trace A or only in trace B.

Where does that leave us? Let’s take a look at three traces, each rooted at ROOT_SVC.

Trace Alpha has one child span:


Trace Beta has a root span with three children:


Trace Gamma also has a root span with three children:


With our baseline comparison, comparing Alpha vs Beta and comparing Alpha vs Gamma would both result in the same kind of graph: ROOT_SVC has a gray child node for the CHILD_SVC span shared between the two traces and two dark green nodes for the spans that are not in Alpha (trace A in both comparisons).

Naturally, there is validity to this result, but it can be very verbose and does not indicate Alpha and Beta actually have a lot more in common than Alpha and Gamma. We can do better.

The three CHILD_SVC nodes in trace Beta do have common ground: They represent the same logical element (CHILD_SVC, in our example) and they have the same parent. We can leverage this by collecting the three CHILD_SVC spans in trace Beta into a group. The impact of this grouping is two-fold:

  • It allows us to simplify our representation of the comparison
  • It visually encodes the similarity of spans by grouping them

When we apply this grouping to the Alpha vs Beta comparison, we end up with a single child which we color light green to indicate the node has a heavier presence in Beta than Alpha. Our decision to group nodes has no impact on the Alpha vs Gamma comparison.

The new representation of Alpha vs Beta ends up being much simpler than the Alpha vs Gamma comparison, which makes intuitive sense because Gamma has a more diverse set of nodes than either Alpha or Beta.

Our definition of “similar”

We define two spans to be similar, which is to say we put them in the same group, if and only if:

  • They have the same ordered set of ancestors as defined by the service and operation of the ancestors
  • They are either both parents or both leaves

For example:


Would reduce to the following, with the (3) indicating the CHILD_SVC node is a group of three spans:


And, the following trace would not reduce at all:


While this definition of “similar” has both pros and cons, and is essentially arbitrary, we have found it to be very practical.
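To make the rule concrete, here is a minimal sketch of the grouping in Python. It is not the Jaeger UI implementation. The Span class, the group_spans function, and the operation names "handle" and "work" are hypothetical and exist only for illustration. The sketch also assumes a span's own service and operation are part of its path, in line with the CHILD_SVC example above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical, simplified span: just enough fields to group on.
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    operation: str


def group_spans(spans):
    """Count spans per group; a group key is (service/operation path, is_leaf)."""
    by_id = {s.span_id: s for s in spans}
    parent_ids = {s.parent_id for s in spans if s.parent_id is not None}

    def path(span):
        # Walk up to the root, then reverse so the path reads root -> span.
        steps = []
        current = span
        while current is not None:
            steps.append((current.service, current.operation))
            current = by_id.get(current.parent_id)
        return tuple(reversed(steps))

    groups = Counter()
    for s in spans:
        # Parents and leaves never share a group, even with equal paths.
        is_leaf = s.span_id not in parent_ids
        groups[(path(s), is_leaf)] += 1
    return groups


# A trace shaped like Beta: one root span with three similar children.
beta = [
    Span("1", None, "ROOT_SVC", "handle"),
    Span("2", "1", "CHILD_SVC", "work"),
    Span("3", "1", "CHILD_SVC", "work"),
    Span("4", "1", "CHILD_SVC", "work"),
]
beta_groups = group_spans(beta)
```

Here the four spans collapse into two groups: one for the root and one holding all three CHILD_SVC spans.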

Comparing groups

Our baseline comparison allowed us to compare traces based only on whether a span is “present” or “absent.” That’s to say, whether a span should be inserted into trace A or removed from trace A to achieve an equivalent structure between the two traces. We can continue to apply this relationship if we apply it to groups of spans instead of individual spans.

Further, when a group of spans is present in both traces, we can compare the number of spans in each trace’s group.

Trace comparisons are now a collection of the following relationships between groups of spans:

  • The group is only in trace A (dark red)
  • The group is only in trace B (dark green)
  • The group is in both traces but has more spans in trace A (light red)
  • The group is in both traces but has more spans in trace B (light green)
  • The group is in both traces and contains the same number of spans (gray)
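The five relationships above can be sketched as a small classification function. This is an illustration only, not the Jaeger UI code. The compare_groups name and the string group keys are hypothetical, and each trace is assumed to be already reduced to a mapping from group key to a positive span count.

```python
def compare_groups(a, b):
    """Label every group found in trace A and/or trace B with its diff color."""
    result = {}
    for key in a.keys() | b.keys():
        count_a = a.get(key, 0)
        count_b = b.get(key, 0)
        if count_b == 0:
            result[key] = "dark red"     # only in trace A
        elif count_a == 0:
            result[key] = "dark green"   # only in trace B
        elif count_a > count_b:
            result[key] = "light red"    # in both, more spans in A
        elif count_b > count_a:
            result[key] = "light green"  # in both, more spans in B
        else:
            result[key] = "gray"         # in both, same number of spans
    return result


# Alpha vs Beta from the earlier example, with spans already grouped.
alpha = {"ROOT_SVC": 1, "ROOT_SVC > CHILD_SVC": 1}
beta = {"ROOT_SVC": 1, "ROOT_SVC > CHILD_SVC": 3}
colors = compare_groups(alpha, beta)
```

For Alpha vs Beta, the root group comes out gray and the CHILD_SVC group comes out light green, which matches the grouped comparison described earlier.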

Future work

We are interested in the following enhancements:

  • Comparison of span attributes, such as the average duration of spans in a group, rather than the number of spans in the group
  • Introducing the ability to adjust the coarseness of how groups are established (see jaeger-ui#252)