Adaptive Sampling in Jaeger
In collaboration with Joe Elliott.
In distributed tracing, sampling is frequently used to reduce the number of traces that are collected and stored in the backend. This is often desirable because it is easy to produce more data than can be efficiently stored and queried. Sampling allows us to store only a subset of the total traces produced.
Traditionally, Jaeger SDKs have supported a variety of sampling techniques. Our favorite has always been so-called remote sampling, a feature pioneered in open source by the Jaeger project. In this setup, the Jaeger SDKs query the Jaeger backend to retrieve a configuration of sampling rules for the given service, down to the granularity of individual endpoints. This can be a very powerful method of sampling because it gives operators central control of sampling rates across an entire organization.
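To make the idea of per-service and per-endpoint rules concrete, here is a minimal sketch of how a sampling document could be resolved for one request. The key names mirror the layout of Jaeger's strategies file as we understand it, so please check the documentation before relying on them. The service names, endpoints, probabilities, and the resolve_probability helper are all made up for illustration and are not part of any Jaeger SDK.

```python
# Illustrative sampling document. Key names are assumed to follow the
# strategies file layout; all services, operations and values are invented.
STRATEGIES = {
    "default_strategy": {"type": "probabilistic", "param": 0.001},
    "service_strategies": [
        {
            "service": "checkout",
            "type": "probabilistic",
            "param": 0.01,
            "operation_strategies": [
                {"operation": "POST /pay", "type": "probabilistic", "param": 0.5},
            ],
        },
    ],
}


def resolve_probability(doc, service, operation):
    """Hypothetical helper: pick the most specific probability available.

    Order of precedence: endpoint rule, then service rule, then the default.
    """
    for svc in doc.get("service_strategies", []):
        if svc["service"] != service:
            continue
        for op in svc.get("operation_strategies", []):
            if op["operation"] == operation:
                return op["param"]
        return svc["param"]
    return doc["default_strategy"]["param"]
```

With this document, the "POST /pay" endpoint of "checkout" resolves to its own rule, any other "checkout" endpoint falls back to the service rule, and an unknown service gets the default.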
Until recently, the only way to control which sampling rules are returned by the backend in the remote sampling mode was with a configuration file provided to the collector via the --sampling.strategies-file flag. Usually, operators must manually update this file to push out different sampling rules. Adaptive sampling, added in v1.27.0, allows the collectors to automatically adjust sampling rates to meet preconfigured goals by observing the current traffic in the system and the number of traces collected. This feature has been in production at Uber for several years and is finally available in the open source version of Jaeger. A special shout-out to Joe Elliott for completing the adoption of the code contributed by Uber, which was sitting in the Jaeger repo for two years without being wired into the collector’s
Why do we need remote & adaptive sampling?
It is always possible to configure the SDK to apply a very simple sampling strategy like a coin-flip decision, also known as probabilistic sampling. This might work fine in a small application, but when your architecture is measured in hundreds or even thousands of services, all with different volumes of traffic, a single sampling probability for every service does not work well, and configuring it individually for each service is a deployment nightmare. Remote sampling addresses this problem by centralizing all sampling configuration in the Jaeger collectors, where changes can be pushed out quickly to any service.
However, configuring sampling rules for every service manually, even if centrally, is still very tedious. Adaptive sampling takes this a step further and transforms this into a declarative configuration, where the operator only needs to set the target rate of trace collection, and the adaptive sampling engine dynamically adjusts the sampling rates individually for each service and each endpoint.
Another benefit of adaptive sampling is that it can automatically react to changes in traffic. Many online services exhibit fluctuations in traffic during the day, e.g. Uber sees a higher volume of requests during peak hours. The adaptive sampling engine automatically adjusts the sampling rates to keep the volume of trace data stable and within the sampling budget.
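A little arithmetic shows why a fixed probability cannot hold a budget when traffic moves. The sketch below is ours, not Jaeger code, and the traffic numbers are invented: it computes the trace rate produced by a fixed probability, and the probability that would be needed to hit a target rate at a given traffic level.

```python
import math


def collected_rate(requests_per_second, probability):
    """Expected traces per second when each request is sampled with `probability`."""
    return requests_per_second * probability


def probability_for_target(requests_per_second, target_rate):
    """Probability that would yield `target_rate` traces per second, capped at 1."""
    if requests_per_second <= 0:
        return 1.0  # no traffic: sampling everything cannot exceed the budget
    return min(1.0, target_rate / requests_per_second)


# Invented example: 200 requests/s off-peak, 2000 requests/s at peak.
off_peak = collected_rate(200, 0.005)
peak = collected_rate(2000, 0.005)
assert math.isclose(off_peak, 1.0) and math.isclose(peak, 10.0)
```

With the probability fixed at 0.005, the peak hour collects ten times the off-peak volume. Holding the rate at 1 trace per second instead requires dropping the probability to 0.0005 at peak, which is the kind of adjustment the adaptive engine makes on its own.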
How to set up adaptive sampling?
First, adaptive sampling requires that the Jaeger SDKs reach out to the backend to request the remote sampling document. This can be configured using environment variables. Please refer to the client features documentation to confirm that these are supported by your Jaeger client.
JAEGER_SAMPLING_ENDPOINT=<sampling endpoint on the jaeger agent>
The defaults are generally set up to work with a local Jaeger agent, running as a host agent or a sidecar, so it’s possible your setup is already close to working. Jaeger SDK configuration actually defaults to this:
After your clients are configured, you will need to make sure that your collectors are configured correctly to store adaptive sampling information. Currently, Jaeger uses the same storage for adaptive sampling as for span storage, and the only supported storage options for adaptive sampling are cassandra (since v1.27) and memory (since v1.28). Using environment variables to configure your collector may look something like:
If you’re just getting started, we encourage you to check out this simple docker-compose example which starts up Jaeger in a configuration that supports adaptive sampling.
The adaptive sampling algorithm can be tuned with a number of parameters that can be found in the documentation, or from the help output:
$ docker run --rm \
-e SAMPLING_CONFIG_TYPE=adaptive \
help | grep -e '--sampling.'
--sampling.aggregation-buckets int Amount of historical data to keep in memory. (default 10)
--sampling.buckets-for-calculation int This determines how much of the previous data is used in calculating the weighted QPS, ie. if BucketsForCalculation is 1, only the most recent data will be used in calculating the weighted QPS. (default 1)
--sampling.calculation-interval duration How often new sampling probabilities are calculated. Recommended to be greater than the polling interval of your clients. (default 1m0s)
--sampling.delay duration Determines how far back the most recent state is. Use this if you want to add some buffer time for the aggregation to finish. (default 2m0s)
--sampling.delta-tolerance float The acceptable amount of deviation between the observed samples-per-second and the desired (target) samples-per-second, expressed as a ratio. (default 0.3)
--sampling.follower-lease-refresh-interval duration The duration to sleep if this processor is a follower. (default 1m0s)
--sampling.initial-sampling-probability float The initial sampling probability for all new operations. (default 0.001)
--sampling.leader-lease-refresh-interval duration The duration to sleep if this processor is elected leader before attempting to renew the lease on the leader lock. This should be less than follower-lease-refresh-interval to reduce lock thrashing. (default 5s)
--sampling.min-samples-per-second float The minimum number of traces that are sampled per second. (default 0.016666666666666666)
--sampling.min-sampling-probability float The minimum sampling probability for all operations. (default 1e-05)
--sampling.target-samples-per-second float The the global target rate of samples per operation. (default 1)
How does adaptive sampling work?
We start with some default sampling probability p assigned to every endpoint and a target rate R of traces we want to collect, such as 1 trace per second per endpoint. The collectors monitor the spans passing through them, looking for root spans of the traces started with this sampling policy, and calculate the actual rate of traces R’ being collected. If R’ > R, then our current probability for this endpoint is too high and needs to be reduced. Conversely, if R’ < R, then we need to increase the probability. Since the actual traffic is always a bit noisy, the situation where R’ == R rarely occurs, so the collector uses a certain tolerance threshold k such that the above rules are actually R’ > R + k and R’ < R - k. Once the new probability p’ is calculated, the collector waits for a certain time interval to make sure it was retrieved by the SDKs and applied to new traces, then observes a new value of rate R’ and repeats the cycle. Yuri Shkuro’s book Mastering Distributed Tracing contains a more detailed description of the math involved in the adaptive probability calculations implemented in the Jaeger collectors.
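The cycle above can be sketched as a single adjustment step. This is a simplified illustration under our own assumptions, not the calculation implemented in the Jaeger collectors: it rescales p proportionally (p’ = p * R / R’), treats k as an absolute margin as in the text (the --sampling.delta-tolerance flag describes its tolerance as a ratio), and doubles p when no traces were observed. The function name and the doubling rule are hypothetical; the default bounds are borrowed from the flag defaults listed earlier.

```python
import math


def adjust_probability(p, target_rate, observed_rate,
                       tolerance=0.3, min_p=1e-5, max_p=1.0):
    """One simplified adjustment step for a single endpoint.

    p             -- current sampling probability
    target_rate   -- R, desired traces per second
    observed_rate -- R', traces per second actually collected
    tolerance     -- k, treated here as an absolute margin around R
    """
    # Inside the tolerance band: traffic noise, keep the current probability.
    if abs(observed_rate - target_rate) <= tolerance:
        return p
    if observed_rate <= 0:
        # Assumption: with nothing observed we cannot rescale, so double p.
        new_p = p * 2
    else:
        # Proportional rescale: p' = p * R / R'.
        new_p = p * target_rate / observed_rate
    # Keep the result within the configured bounds.
    return max(min_p, min(max_p, new_p))
```

For example, with p = 0.001, R = 1 and an observed R’ = 0.5, the step returns a p’ of about 0.002. If the next observation lands at R’ = 1.2, it falls within the tolerance band and the probability is left alone.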
We also need to talk about how this is all done given that Jaeger allows us to run multiple collectors simultaneously. The adaptive sampling module implements a simple leader election mechanism using compare-and-swap operations supported by the storage backends. Each collector receives a distinct stream of spans from the services (remember, we’re only interested in the root spans since that is where the sampling decision always happens), and maintains an in-memory aggregate of trace counts for each service / endpoint pair. After a certain time interval, each collector writes this data (referred to as throughput in the code) to the storage backend. The collector that won the leader election then reads all throughput data from storage for a given time range, aggregates it, performs the probability calculations, and writes the new probabilities summary for all services back to storage. The other collectors load that summary and use it to serve the requests for sampling strategies from the SDKs. Note that the leader election in this model is purely an optimization, because the sampling summary is written under a stable time-based key known to all collectors, so if more than one collector happens to perform the calculation of the probabilities, they would just overwrite each other’s writes with the same data.
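The data flow between collectors can be sketched with an in-memory stand-in for the storage backend. Everything here is hypothetical (the FakeStorage class, the function names, the proportional probability update), and leader election itself is left out. The point is only to show that the calculation depends solely on what is in storage, so two collectors computing the same time bucket write the same summary.

```python
import math
from collections import Counter, defaultdict


class FakeStorage:
    """Hypothetical in-memory stand-in for the shared storage backend."""

    def __init__(self):
        self.throughput = defaultdict(list)  # bucket -> per-collector trace counts
        self.probabilities = {}              # bucket -> {(service, endpoint): p}


def flush_throughput(storage, bucket, local_counts):
    """Each collector writes its own root-span counts for the time bucket."""
    storage.throughput[bucket].append(Counter(local_counts))


def calculate_and_store(storage, bucket, interval_seconds, target_rate, current):
    """What the leader does: aggregate, recompute, write under the bucket key."""
    total = Counter()
    for counts in storage.throughput[bucket]:
        total.update(counts)  # Counter.update adds counts together
    summary = {}
    for key, count in total.items():
        p = current.get(key, 0.001)  # assumed initial probability
        if count == 0:
            summary[key] = p
            continue
        observed = count / interval_seconds
        # Simplified proportional update, not Jaeger's actual calculation.
        summary[key] = min(1.0, p * target_rate / observed)
    # Stable time-based key: a repeated write replaces identical data.
    storage.probabilities[bucket] = summary
    return summary
```

If collector A counts 90 traces for an endpoint and collector B counts 30 over a 60-second bucket, the aggregate is 2 traces per second, and with a target of 1 the endpoint's probability is halved, no matter which collector runs the calculation.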
There are a few features on our wishlist that would make adaptive sampling better. One is the ability to account for the total number of spans instead of the total number of traces. Different endpoints can result in very different sizes of traces, even by several orders of magnitude. Yet the current implementation is only built around counting traces. It can be extended with additional heuristics, such as calculating the average trace size per endpoint offline and providing the adaptive sampling engine with a weights matrix to take into account when computing actual throughput.
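As a rough illustration of the weights idea, the sketch below scales per-endpoint trace counts by an average trace size computed offline. This feature does not exist in Jaeger today; the function name, the shape of the weights table, and the numbers are purely hypothetical.

```python
def weighted_throughput(trace_counts, avg_spans_per_trace, default_weight=1.0):
    """Convert trace counts into approximate span counts per endpoint.

    trace_counts        -- {(service, endpoint): number of traces}
    avg_spans_per_trace -- {(service, endpoint): average spans per trace},
                           assumed to be computed offline
    default_weight      -- used for endpoints with no known average
    """
    return {
        key: count * avg_spans_per_trace.get(key, default_weight)
        for key, count in trace_counts.items()
    }
```

Two endpoints with the same trace count but average sizes of 3 and 2000 spans would then show very different throughput, and an engine targeting span volume would sample the larger one far less often.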
Another nice-to-have feature, which actually requires changes in the remote sampling configuration, is to use other dimensions from trace data besides service name and endpoint name that are currently hardcoded in the schema.
And yet another useful extension would be a configuration mechanism to allow overriding the target throughput rate R for specific services / endpoints, instead of using a single global parameter, because some services may be more important to your business and you might want to collect more data for them, or perhaps this could be a temporary setting due to some investigation.
We are pleased to release the first open source end-to-end implementation of adaptive sampling for the Jaeger community. Please give it a shot, provide feedback and let us continue iterating on this feature.
If you’re interested in contributing to the future development of this feature, there are a couple of areas where we could use some help immediately:
- Supporting ElasticSearch / OpenSearch as the backend for storing adaptive sampling data.
- Decoupling Jaeger storage configuration so that different storage backends could be used for span storage and adaptive sampling.