Distributed Tracing Infrastructure with Jaeger on Kubernetes

The importance of monitoring infrastructure cannot be overstated: it is an integral component of distributed systems (or, really, any system). Monitoring goes beyond tracking the binary "up"/"down" state of a service to capturing complex system behavior. A well-built monitoring infrastructure gives insight into performance, system health and behavioral patterns over time.

This post goes over one aspect of a monitoring infrastructure — distributed tracing.

Observability in a Microservices architecture

Kubernetes has become the de facto orchestrator for microservices infrastructure and deployment. The ecosystem is extremely rich and one of the fastest growing in the open-source community. A monitoring infrastructure with Prometheus, ElasticSearch, Grafana, Envoy/Consul and Jaeger/Zipkin makes up a solid foundation for metrics, logging, dashboards, service discovery and distributed tracing across the stack.

Distributed Tracing

Distributed tracing enables capturing requests and building a view of the entire chain of calls, all the way from user requests down to interactions between hundreds of services. It also enables instrumenting application latency (how long each request took), tracking the lifecycle of network calls (HTTP, RPC, etc.) and identifying performance bottlenecks by providing visibility into where requests spend their time.


The following sections go over enabling distributed tracing with Jaeger for gRPC services in a Kubernetes setup. The Jaeger GitHub org has a dedicated repo for various deployment configurations of Jaeger in Kubernetes. These are excellent examples to build on, and I will try to break down each Jaeger component and its Kubernetes deployment.

Jaeger Components

Jaeger is an open-source distributed tracing system that implements the OpenTracing specification. Jaeger includes components to store, visualize and filter traces.

Architecture

Jaeger Client

Application tracing instrumentation starts with the Jaeger client. The following example uses the Jaeger Go library to initialize tracer configuration from environment variables and enable client-side metrics.

The Go client makes it simple to initialize the Jaeger configuration via environment variables. Some of the important environment variables to set include JAEGER_SERVICE_NAME, JAEGER_AGENT_HOST and JAEGER_AGENT_PORT. The full list of environment variables supported by the Jaeger Go client is listed here.

To add tracing to your gRPC microservices, we will use gRPC middleware to enable tracing on the gRPC server and client. grpc-ecosystem/go-grpc-middleware has a great collection of interceptors, including provider-agnostic server-side and client-side OpenTracing interceptors.

Initialize a gRPC server with server-side chaining of unary and stream interceptors. The grpc_opentracing package exposes opentracing interceptors that can be initialized with any opentracing.Tracer implementation; here we initialize it with the Jaeger tracer. With this enabled, the tracer creates a root serverSpan for each server-side gRPC request and attaches a Span to every RPC call defined in the service.

In order to trace both upstream and downstream requests of the gRPC service, the gRPC client must be initialized with the client-side opentracing interceptors as well, as shown in the following example:

The parent spans created by the gRPC middleware are injected into the Go context, which enables powerful tracing support. The opentracing Go client can be used to attach child spans to the parent context for more granular tracing, as well as to control each span's lifetime, add custom tags to traces, and so on.

Jaeger Agent

The Jaeger agent is a daemon that receives spans from Jaeger clients over UDP, batches and forwards them to the collectors. The agent acts as a buffer to abstract out batch processing and routing from the clients.

The agent is built as a daemon, but in a Kubernetes setup it can be configured to run either as a sidecar container in the application Pod or as an independent DaemonSet workload.

There are pros and cons to both deployment strategies, and the following sections show the setup and the differences between the two.

Jaeger Agent as Sidecar

A sidecar Jaeger agent is a container that sits in the same Pod as your application container. The application, registered as Jaeger service myapp, sends spans to the agent over localhost on UDP port 6831. As mentioned earlier, these settings come from the JAEGER_SERVICE_NAME, JAEGER_AGENT_HOST and JAEGER_AGENT_PORT environment variables in the client.
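A sketch of the sidecar layout (image tags, names and the collector endpoint are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest                 # hypothetical application image
        env:
        - name: JAEGER_SERVICE_NAME
          value: myapp
        - name: JAEGER_AGENT_HOST
          value: localhost                  # the agent shares the Pod network
        - name: JAEGER_AGENT_PORT
          value: "6831"
      - name: jaeger-agent
        image: jaegertracing/jaeger-agent:1.6
        args: ["--collector.host-port=jaeger-collector:14267"]
        ports:
        - containerPort: 6831
          protocol: UDP
```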

With this approach, each agent (and thus each application) can be configured to send traces to different collectors (and thus, different backend storages).

However, one of the biggest drawbacks of this approach is the tight coupling between the life-cycles of the agent and the application. Tracing is meant to give insight into your application throughout its lifetime. More likely than not, the agent sidecar container is killed before the main application container, and any traces emitted while the application service shuts down go missing. The loss of these traces can be significant for understanding the life-cycle behavior of complex service interactions. This GitHub issue confirms the need for proper SIGTERM handling during shutdown.

Jaeger Agent as DaemonSet

The other approach is to run the agent as a daemon on each Node in the cluster, via a DaemonSet workload in Kubernetes. A DaemonSet ensures that as Nodes are added, a copy of the DaemonSet Pod is scheduled onto each of them.

In this scenario, each agent daemon is responsible for consuming traces from all running applications (with the Jaeger client configured) scheduled on its Node. This is configured by setting JAEGER_AGENT_HOST in the client to the IP of the agent on the Node. The agent DaemonSet is configured with hostNetwork: true and an appropriate DNS policy so that the Pod uses the same IP as the host. Since the agent accepts jaeger.thrift messages over UDP on port 6831, the Pod also binds that port on the host with hostPort: 6831.
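A sketch of such a DaemonSet, plus the downward-API wiring on the application side (image tag and collector endpoint are illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: jaeger-agent
spec:
  selector:
    matchLabels:
      app: jaeger-agent
  template:
    metadata:
      labels:
        app: jaeger-agent
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: jaeger-agent
        image: jaegertracing/jaeger-agent:1.6
        args: ["--collector.host-port=jaeger-collector:14267"]
        ports:
        - containerPort: 6831
          hostPort: 6831                    # bound on the Node's IP
          protocol: UDP
---
# Fragment of the application Pod spec: resolve the Node's IP at runtime
# via the downward API so the client talks to the local agent.
env:
- name: JAEGER_AGENT_HOST
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
```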

One may be tempted (as I was) to front the DaemonSet with a Kubernetes Service. The idea is to avoid binding application traces to the single agent on the local Node: using a Service spreads the workload (spans) across all agents in the cluster, which in theory reduces the chance of missing spans when a single agent Pod on the affected Node fails.

However, this does not hold up as your application scales and high load creates large spikes in the number of traces to process. Using a Kubernetes Service means sending traces from client to agent over the network, and very soon I started noticing a high number of dropped spans: the client sends spans to the agent over the UDP thrift protocol, and large spikes exceeded the UDP max packet size, resulting in dropped packets.

The solution is to allocate resources appropriately so that Kubernetes schedules Pods more evenly across the cluster. One can increase the queue size of the client (set the JAEGER_REPORTER_MAX_QUEUE_SIZE environment variable) to give spans enough buffer while an agent fails over. It is also beneficial to increase the internal queue size of the agents (set the processor.jaeger-binary.server-queue-size value) so they are less likely to start dropping spans.
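For illustration, the two knobs might be set like this (the queue sizes are arbitrary examples, not recommendations):

```yaml
# Fragment of the application container: deepen the client reporter queue.
env:
- name: JAEGER_REPORTER_MAX_QUEUE_SIZE
  value: "2000"                             # illustrative; default is 100
---
# Fragment of the agent container: enlarge the server queues of the
# processors (6831 carries compact thrift, 6832 binary thrift).
args:
- --collector.host-port=jaeger-collector:14267
- --processor.jaeger-compact.server-queue-size=2000
- --processor.jaeger-binary.server-queue-size=2000
```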

Jaeger Collector Service

The Jaeger collector is responsible for receiving batches of spans from the Jaeger agent, running them through the processing pipeline and storing them in the specified storage backend. Spans are sent from the Jaeger agent in jaeger.thrift format over the TChannel (TCP) protocol on port 14267.

The Jaeger collector is stateless and can be scaled to any number of instances on demand. Thus the collector can be fronted by a Kubernetes internal service (ClusterIP) that can load balance internal traffic from agents to the different collector instances.
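A minimal ClusterIP Service for the collector might look like this (names and labels are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: jaeger-collector
spec:
  type: ClusterIP                           # load balances across collector Pods
  selector:
    app: jaeger-collector
  ports:
  - name: jaeger-tchannel
    port: 14267
    targetPort: 14267
    protocol: TCP
```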

Jaeger Query Service

The query service is the Jaeger server that backs the UI. It is responsible for retrieving traces from storage and formatting them for display in the UI. Depending on usage, the query service can have a very small resource footprint.

Set up an Ingress for the internal Jaeger UI pointing to the backend query service.
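A sketch of such an Ingress (hostname, ingress class and API version depend on your cluster):

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: jaeger-query
  annotations:
    kubernetes.io/ingress.class: nginx      # assumption: nginx ingress controller
spec:
  rules:
  - host: jaeger.internal.example.com       # hypothetical internal hostname
    http:
      paths:
      - path: /
        backend:
          serviceName: jaeger-query
          servicePort: 16686                # Jaeger query/UI port
```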

Storage Configuration

Jaeger supports both ElasticSearch and Cassandra as storage backends. Using ElasticSearch for storage enables a powerful monitoring infrastructure that ties tracing and logging together. Part of the collector's processing pipeline is indexing the traces for its storage backend; this enables traces to show up in your logging UI (Kibana, for example) and lets you bind trace IDs to your structured logging labels. You can set the storage type to ElasticSearch via the SPAN_STORAGE_TYPE environment variable and configure the storage endpoint via configuration.

A Kubernetes ConfigMap is used to set up the storage configuration of some of the Jaeger components, for example the storage backend type and endpoint for the Jaeger collector and query services.
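For example, a ConfigMap along these lines (key names and the ElasticSearch endpoint are illustrative), consumed by the components via environment variables:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: jaeger-configuration
data:
  span-storage-type: elasticsearch
  es-server-urls: http://elasticsearch:9200 # hypothetical ES endpoint
---
# Fragment of the collector (and query) container spec:
env:
- name: SPAN_STORAGE_TYPE
  valueFrom:
    configMapKeyRef:
      name: jaeger-configuration
      key: span-storage-type
```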

Monitoring

As mentioned before, tracing is an important component of the monitoring infrastructure. That means the components of your tracing infrastructure need to be monitored as well.

Jaeger exposes metrics in Prometheus format on specific ports for each component. If Prometheus node exporters are running (they absolutely should be) and scraping metrics on a specific port, map the metrics port of each Jaeger component to the port the node exporter scrapes.

This can be done by updating the Jaeger services (agent, collector, query) to map their metrics ports (5778, 14268 and 16686 respectively) to the port the node exporters expect to scrape metrics from (8888/8080, for example).
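As an illustration for the collector (the 8888 port and label selector are assumptions about your scrape setup):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: jaeger-collector-metrics
spec:
  selector:
    app: jaeger-collector
  ports:
  - name: metrics
    port: 8888                              # port your scrapers expect
    targetPort: 14268                       # collector's metrics port
```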

Some important metrics to keep track of:

  • Health of each component — memory usage:
sum(rate(container_memory_usage_bytes{container_name=~"^jaeger-.+"}[1m])) by (pod_name)
  • Health of each component — CPU usage:
sum(rate(container_cpu_usage_seconds_total{container_name=~"^jaeger-.+"}[1m])) by (pod_name)
  • Batch failures by Jaeger Agent:
sum(rate(jaeger_agent_tc_reporter_jaeger_batches_failures[1m])) by (pod)
  • Spans dropped by Collector:
sum(rate(jaeger_collector_spans_dropped[1m])) by (pod)
  • Queue latency (p95) of Collector:
histogram_quantile(0.95, sum(rate(jaeger_collector_in_queue_latency_bucket[1m])) by (le, pod))

These metrics give important insight into how each component is performing; use the historical data to tune the setup.