Post-processing use cases for the data pipeline

Published in JaegerTracing, May 25, 2018

One of the items on the roadmap for Jaeger is the Data Pipeline, where Jaeger would be able to feed post-collection data into a system like Flink for further processing. This seems to be quite a popular feature and we want to collect more information about the sorts of use cases that you would like to see covered by this feature.

For the moment, we're not asking for specifics about technology. We want to focus on the business case: what types of post-processing jobs would you like to see as part of Jaeger? What kind of information are you missing that could be provided by post-processing tasks? And what are the highest-priority needs that could be met by this feature? Share your own ideas with us via Gitter, the mailing list, or as a reaction to this blog post.

Path-based dependency diagrams

Path-based dependency diagrams are part of the roadmap and are described in detail there.

Long-term storage

Jaeger backend storage (Cassandra/Elasticsearch) is not intended for long-term storage. We even provide some scripts to clean up old data. But some of this data could be useful and worth keeping for the long term. A consumer of the pipeline could filter the interesting spans and send them to a long-term storage solution of choice.
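As a rough illustration of such a consumer, the sketch below keeps only spans that failed or were unusually slow and hands them to a storage callback. The span fields (`error`, `duration_us`), the threshold, and the `sink` callback are all assumptions made for this example; they are not part of Jaeger's data model or API.

```python
from typing import Callable, Iterable

# Hypothetical span record: a plain dict. The field names used here
# ("error", "duration_us") are assumptions for illustration only.
Span = dict


def is_interesting(span: Span, slow_threshold_us: int = 1_000_000) -> bool:
    """Keep spans that failed or took at least slow_threshold_us."""
    return bool(span.get("error")) or span.get("duration_us", 0) >= slow_threshold_us


def archive(spans: Iterable[Span], sink: Callable[[Span], None]) -> int:
    """Forward interesting spans to the sink; return how many were sent."""
    sent = 0
    for span in spans:
        if is_interesting(span):
            sink(span)
            sent += 1
    return sent
```

Here `sink` stands in for whatever writes to the long-term store, so the filtering rule can change without touching the storage side.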

Logging

Span data could be filtered and sent to a log collector like Fluentd, possibly for correlation with systems that are not being traced (yet!).

System-wide critical paths

This is probably another view of the path-based dependency diagram, one that identifies the critical paths in a given system. A critical path might be a frequently requested path where a failure in a given service would be propagated to the caller.

Suggestions for resiliency improvement

Similar to the above, a post-processing job could identify services that do not properly contain downstream failures, suggesting that their resiliency could be improved.
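One simple heuristic for this, sketched below, is to count how often a failed child span coincides with a failed parent span, and attribute the count to the parent's service. The span fields (`span_id`, `parent_id`, `service`, `error`) are hypothetical, and a real job would need a better notion of "contained" than this.

```python
from collections import Counter


def uncontained_failures(spans):
    """Count, per service, parent spans that failed along with a failed child.

    Each span is a dict with the hypothetical keys "span_id", "parent_id",
    "service" and "error"; these names are assumptions for illustration.
    """
    by_id = {s["span_id"]: s for s in spans}
    propagated = Counter()
    for child in spans:
        parent = by_id.get(child.get("parent_id"))
        if parent and child.get("error") and parent.get("error"):
            # The parent's service did not absorb the downstream failure.
            propagated[parent["service"]] += 1
    return propagated
```

Services with high counts are candidates for fallbacks, retries, or circuit breakers.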

Biggest bang for the buck

Identify “critical” spans: those whose performance improvements would potentially benefit the largest number of upstream services.
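A minimal sketch of this ranking follows: for every operation, walk up the parent chain and collect the distinct services that depend on it. The span fields (`span_id`, `parent_id`, `service`, `operation`) are assumptions for this example, and span IDs are assumed to be unique across the input.

```python
from collections import defaultdict


def upstream_reach(spans):
    """Rank (service, operation) pairs by the number of distinct upstream services.

    Spans are dicts with the hypothetical keys "span_id", "parent_id",
    "service" and "operation"; these names are assumptions for illustration.
    """
    by_id = {s["span_id"]: s for s in spans}
    reach = defaultdict(set)
    for span in spans:
        key = (span["service"], span["operation"])
        parent_id = span.get("parent_id")
        seen = set()  # guards against cycles in malformed data
        while parent_id in by_id and parent_id not in seen:
            seen.add(parent_id)
            parent = by_id[parent_id]
            if parent["service"] != span["service"]:
                reach[key].add(parent["service"])
            parent_id = parent.get("parent_id")
    return sorted(reach.items(), key=lambda item: len(item[1]), reverse=True)
```

The first entries of the result are the operations that the most services sit on top of, which makes them natural candidates for optimization.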

Resource contention detection

Some trace patterns indicate resource contention, such as several spans starting at the same time. HotROD contains an example of such contention. Another pattern that might be relatively easy to detect is something similar to the “N+1” problem common in the SQL world.
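The “N+1” case can be approximated by counting how many times the same operation repeats under a single parent span. The sketch below does just that; the span fields (`parent_id`, `operation`) and the `min_repeats` threshold are assumptions for illustration.

```python
from collections import Counter


def find_n_plus_one(spans, min_repeats=5):
    """Return (parent_id, operation, count) for operations repeated under one parent.

    Spans are dicts with the hypothetical keys "parent_id" and "operation";
    min_repeats is an arbitrary threshold chosen for this example.
    """
    counts = Counter(
        (s["parent_id"], s["operation"])
        for s in spans
        if s.get("parent_id") is not None
    )
    return [
        (parent_id, operation, n)
        for (parent_id, operation), n in counts.items()
        if n >= min_repeats
    ]
```

A parent that issues the same query once per item in a list would show up here with a count close to the list size.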

Finding stability outliers

A “stable” span is a span that usually takes the same amount of time, within a given number of standard deviations. Outliers could potentially be found by filtering for spans that fall outside of this window (too fast or too slow).
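As a sketch of the idea, the window can be computed from a baseline of past durations as the mean plus or minus `k` standard deviations, and new durations can then be checked against it. The function names and the default `k = 2.0` are assumptions for this example.

```python
import statistics


def stability_window(baseline_us, k=2.0):
    """Return (low, high) as mean -/+ k population standard deviations."""
    mean = statistics.mean(baseline_us)
    spread = statistics.pstdev(baseline_us)
    return mean - k * spread, mean + k * spread


def find_outliers(baseline_us, candidates_us, k=2.0):
    """Return the candidate durations that fall outside the baseline window."""
    low, high = stability_window(baseline_us, k)
    return [d for d in candidates_us if d < low or d > high]
```

Computing the window from a separate baseline keeps a single extreme value from widening the very window it is tested against.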

Span trend view

A trend view could be built for span instances, showing how the performance of a given span has changed over time.
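The data behind such a view could be as simple as the average duration per time bucket. The sketch below groups spans of one operation into fixed-size buckets; the span fields (`start_us`, `duration_us`) and the one-hour default bucket are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def duration_trend(spans, bucket_seconds=3600):
    """Return [(bucket_start_seconds, mean_duration_us)] in chronological order.

    Spans are dicts with the hypothetical keys "start_us" (start time in
    microseconds since the epoch) and "duration_us".
    """
    buckets = defaultdict(list)
    for span in spans:
        start_seconds = span["start_us"] // 1_000_000
        bucket_start = (start_seconds // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(span["duration_us"])
    return [(start, mean(durations)) for start, durations in sorted(buckets.items())]
```

Plotting these points for a single operation gives a basic view of how its latency drifts from one bucket to the next.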