Debugging microservices on Kubernetes with Istio, OpenTelemetry and Tempo — Part 1

Sander Rodenhuis
Otomi Platform
7 min read · Aug 9, 2023


I recently worked on a side project to improve tracing in Otomi by implementing Grafana Tempo and OpenTelemetry. I’m going to share my experiences and configuration in two posts (because there is a lot involved here). This is the first one.

Don’t expect a fluffy story about tracing in general. I’m going to explain the full setup and share my experiences.

Why this project

Otomi (a self-hosted PaaS for Kubernetes) uses Istio at its core and includes an advanced multi-tenant observability stack with logging (Loki), metrics (Prometheus), alerting (Alertmanager) and tracing (Jaeger). But we got some questions from users about the tracing setup, like: “Why do I only see partial data with partial context (single spans)?” and “Where are traces stored, and for how long?”

What you need to know about Tracing with Istio

First of all, tracing with Istio is easy to set up. Istio is responsible for managing traffic, so it can also report traces that provide visibility into Istio and application behaviour. But because there is no code running within the application itself to collect data, Istio can only collect partial data with partial context.

Ehh, what does that mean? Well, when service A calls service B, Istio creates a span that represents that call. However, when service B calls service C, Istio cannot recognize that this call is part of the same trace originating from service A. To solve this, you’ll need to instrument each service to extract the trace context from the incoming request and inject it into the requests to downstream service(s). Instrumentation can be done manually using the OpenTelemetry SDK, or automatically using the OpenTelemetry Operator.
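
To give an idea of what the automatic route looks like, here is a minimal sketch of an Instrumentation resource for the OpenTelemetry Operator (the name and exporter endpoint are assumptions; the collector it points to is created later in this post, and instrumenting the application is the topic of part two):

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation   # assumed name
spec:
  exporter:
    # assumed endpoint: the OTLP gRPC port of the collector we create later
    endpoint: http://otel-collector-collector.otel.svc.cluster.local:4317
  propagators:
    - tracecontext
    - baggage
    - b3

Workloads then opt in with a pod annotation such as instrumentation.opentelemetry.io/inject-java: "true".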

Where are traces stored?

Jaeger natively supports two open source NoSQL databases as trace storage backends: Cassandra and Elasticsearch. In Otomi we only use object storage. There are some open source projects you can use to connect object storage (like AWS S3) to Jaeger, but these projects are not actively maintained. That’s why we didn’t configure Jaeger with a storage backend, so the answer to the question is: in a K8s volume. That’s not ideal, especially not when you have long retention requirements. This led me to look at using Grafana Tempo as a backend.

Extending the Tracing setup in Otomi with OpenTelemetry and Tempo

I realised that the tracing setup in Otomi was quite limited, so I started a little side project to integrate Tempo as a tracing backend and let teams (tenants) on the platform query Tempo, see all elements involved in a request, and see the interrelationships between their various services using a node graph in Grafana.

Otomi uses the Istio service mesh at its core. Istio leverages Envoy’s distributed tracing feature to provide tracing integration out of the box. Although Istio proxies can automatically send spans, additional information is needed to join those spans into a single trace. So we need context propagation.

This led to the following solution architecture:

  • Install Grafana Tempo
  • Install the OpenTelemetry Operator
  • Configure OpenTelemetry Collector
  • Configure Istio to use the opentelemetry tracing provider and send spans to the OpenTelemetry Collector
  • Configure Grafana datasource for Tempo
  • Configure the Grafana datasource for Loki to provide a direct link from a traceID in the logs to the trace in Tempo
  • Configure Instrumentation for context propagation

Install Grafana Tempo

I’m going to use Tempo as the backend for the traces. Tempo can be configured to use object storage services like AWS S3, Azure Blob Storage or (in my case) a local (S3-compatible) Minio instance running in the cluster.

Before we install the tempo-distributed Helm chart, let’s first look at some important values. I always install charts with my own values ;-)

metricsGenerator:
  enabled: true
  config:
    storage:
      path: /var/tempo/wal
      wal:
      remote_write_flush_deadline: 1m
      remote_write:
        - url: http://po-prometheus.monitoring:9090/api/v1/write

storage:
  trace:
    backend: s3
    s3:
      bucket: tempo
      endpoint: minio.minio.svc.cluster.local:9000
      access_key: my-access-key
      secret_key: my-secret-key
      insecure: true

traces:
  otlp:
    http:
      enabled: true
    grpc:
      enabled: true

metaMonitoring:
  serviceMonitor:
    enabled: true
    labels:
      prometheus: system

Install the chart:

helm repo add grafana https://grafana.github.io/helm-charts
helm install -f my-values.yaml tempo grafana/tempo-distributed -n tempo

As you can see, I’m enabling the metrics generator. This will enable us to see trace-related metrics in Grafana dashboards. More on this later. Please also note that we did not look at resource configuration and scaling options. This is still a PoC, right?!

If you’re using Prometheus (the values below are for the kube-prometheus-stack chart), make sure to enable the remote write receiver like this:

prometheus:
  prometheusSpec:
    enableRemoteWriteReceiver: true

You should now see the following pods running:

# kubectl get po -n tempo                                                                              
NAME READY STATUS RESTARTS AGE
tempo-compactor-d59b598b5-8287b 1/1 Running 4 (6h19m ago) 16h
tempo-distributor-7b5b649487-fbzf2 1/1 Running 4 (6h19m ago) 16h
tempo-ingester-0 1/1 Running 4 (6h19m ago) 16h
tempo-ingester-1 1/1 Running 4 (6h19m ago) 16h
tempo-ingester-2 1/1 Running 4 (6h19m ago) 16h
tempo-memcached-0 1/1 Running 0 16h
tempo-metrics-generator-66c5dfc565-5dhsv 1/1 Running 4 (6h19m ago) 16h
tempo-querier-694cbf6d7-gxjzj 1/1 Running 4 (6h20m ago) 16h
tempo-query-frontend-67b4ff47c6-9msmv 1/1 Running 4 (6h19m ago) 16h

Note that my Minio instance has been set up independently of Tempo. If you don’t already have Minio running (or don’t want to use S3 or an Azure storage container), you can install Minio using the Tempo Helm chart.
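
For completeness: the tempo-distributed chart ships an optional MinIO subchart. As a minimal sketch (assuming the chart’s minio values), enabling it would look like this:

minio:
  enabled: true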

Now that we have Tempo up and running, let’s install the OpenTelemetry Operator. Why am I using the Operator? Well, I’m not a fan of the OpenTelemetry Collector Helm chart, because it never creates the collector configuration I want. With the OpenTelemetry Operator you can create your own custom Collector configuration and have more control over it. Another benefit of using the Operator is that it supports automatic instrumentation!

Install the OpenTelemetry Operator

The configuration is quite straightforward, so let’s just install it:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm install opentelemetry-operator open-telemetry/opentelemetry-operator -n otel

Now comes the interesting part: configuring the Collector. Create an OpenTelemetryCollector resource. You can use the following as an example:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15
      batch:
        send_batch_size: 10000
        timeout: 10s
    exporters:
      logging:
        loglevel: info
      otlp:
        endpoint: tempo-distributor.tempo.svc.cluster.local:4317
        sending_queue:
          enabled: true
          num_consumers: 100
          queue_size: 10000
        retry_on_failure:
          enabled: true
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers:
            - otlp
          processors:
            - memory_limiter
            - batch
          exporters:
            - logging
            - otlp
  mode: deployment
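
Save the resource to a file (the filename is my own choice) and apply it in the otel namespace:

kubectl apply -f otel-collector.yaml -n otel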

When the OpenTelemetryCollector is created, you will see the following pods running:

kubectl get po -n otel 
NAME READY STATUS RESTARTS AGE
otel-collector-collector-776cdc65f8-pgmvs 1/1 Running 0 162m
otel-operator-78fc8b6975-h7lkh 2/2 Running 0 16h

Now that we have a backend (Tempo) and a Collector (OpenTelemetry) running, the next step is to send some spans. Let’s start with Istio. Istio controls all traffic, and when debugging applications this becomes a very relevant aspect.

Configure Istio for tracing

Well, that’s easier said than done. There are many ways to configure Istio (Envoy) for tracing, the documentation is fragmented, and complete tutorials are hard to find. You can choose to configure tracing in the defaultConfig or use extensionProviders, and there are multiple extensionProviders. So the question is: “Which configuration to use, and when?” I don’t have all the answers.

I decided to go for the OpenTelemetry tracing provider, and to use the default Envoy provider to add the TRACEPARENT header to the access logs of the istio-proxy sidecar. This is quite handy, because you can then configure the Loki datasource to create a link from the traceID in the logs directly to the trace in Tempo. More on that later.

I’m using Otomi, and Otomi uses the Istio Operator (version 1.17.4). To configure tracing in Istio, we’ll first need to modify the IstioOperator resource, using the following meshConfig:

meshConfig:
  accessLogFile: /dev/stdout
  accessLogFormat: |
    [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%" %RESPONSE_CODE% %RESPONSE_FLAGS% %RESPONSE_CODE_DETAILS% %CONNECTION_TERMINATION_DETAILS% "%UPSTREAM_TRANSPORT_FAILURE_REASON%" %BYTES_RECEIVED% %BYTES_SENT% %DURATION% %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%" "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%" %UPSTREAM_CLUSTER% %UPSTREAM_LOCAL_ADDRESS% %DOWNSTREAM_LOCAL_ADDRESS% %DOWNSTREAM_REMOTE_ADDRESS% %REQUESTED_SERVER_NAME% %ROUTE_NAME% traceID=%REQ(TRACEPARENT)%
  enableAutoMtls: true
  extensionProviders:
    - opentelemetry:
        port: 4317
        service: otel-collector-collector.otel.svc.cluster.local
      name: otel-tracing
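
For context: this meshConfig block goes under spec in the IstioOperator resource. A minimal sketch, assuming a resource name (yours may differ):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istiocontrolplane   # assumed name; edit your existing resource
  namespace: istio-system
spec:
  meshConfig:
    # ...the accessLog settings and extensionProviders shown above go here...
    extensionProviders:
      - opentelemetry:
          port: 4317
          service: otel-collector-collector.otel.svc.cluster.local
        name: otel-tracing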

To enable the extensionProvider, you’ll need to create a Telemetry resource:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: otel-tracing
  namespace: istio-system
spec:
  tracing:
    - providers:
        - name: otel-tracing
      randomSamplingPercentage: 100

By creating this resource in the istio-system namespace, the provider will be active for all namespaces.

I set the randomSamplingPercentage to 100%. In a production environment this will probably be 0.1%.
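
Sampling can also be tuned per team: a Telemetry resource in a workload namespace overrides the mesh-wide one in istio-system. A minimal sketch, assuming a team namespace:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: otel-tracing
  namespace: team-demo   # assumed namespace
spec:
  tracing:
    - providers:
        - name: otel-tracing
      randomSamplingPercentage: 1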

After the Istio operator has reconciled the changes, you should see spans coming in that are then exported to Tempo:

kubectl logs otel-collector-collector-776cdc65f8-pgmvs -n otel
2023-08-09T13:07:58.186Z info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "logging", "resource spans": 5, "spans": 7}
2023-08-09T13:08:08.186Z info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "logging", "resource spans": 2, "spans": 6}

In Otomi we also use the Nginx Ingress Controller. To eventually see the complete trace, from the ingress controller through the Istio gateways and ingresses down to the application, we’ll also configure the Nginx controller to send spans to the collector. Configure Nginx Ingress using the following values:

controller:
  opentelemetry:
    enabled: true
  config:
    enable-opentelemetry: true
    otel-sampler: AlwaysOn
    otel-sampler-ratio: 0.1
    otlp-collector-host: otel-collector-collector.otel.svc
    otlp-collector-port: 4317
    opentelemetry-config: "/etc/nginx/opentelemetry.toml"
    opentelemetry-operation-name: "HTTP $request_method $service_name $uri"
    opentelemetry-trust-incoming-span: "true"
    otel-max-queuesize: "2048"
    otel-schedule-delay-millis: "5000"
    otel-max-export-batch-size: "512"
    otel-service-name: "nginx"
    otel-sampler-parent-based: "true"
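
If you install the controller yourself with the ingress-nginx Helm chart, applying these values could look like this (release name, values file and namespace are assumptions):

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  -f nginx-values.yaml -n ingress-nginx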

See the ingress-nginx documentation to learn more about tracing in Nginx Ingress using OpenTelemetry.

Wrap up (for now)

We now have a backend for our traces, a Collector to receive trace spans and export them to the backend (Tempo), and Istio and Nginx Ingress Controller sending trace spans to the Collector.

In the second part, we’re going to instrument our application and configure datasources in Grafana for Tempo to see the real power of tracing in Kubernetes.
