Debugging microservices on Kubernetes with Istio, OpenTelemetry and Tempo — Part 2

Sander Rodenhuis
Otomi Platform
5 min read · Aug 11, 2023


In part 1 I explained how to:

  • Install Grafana Tempo as a backend to store telemetry data
  • Install and configure the OpenTelemetry Operator
  • Configure Istio for tracing using the opentelemetry extension provider
  • Configure Nginx Ingress Controller for OpenTelemetry

In this second part, I’ll explain how to:

  • Configure datasources for Tempo and Loki in Grafana
  • Set up automated instrumentation for context propagation

Grafana datasources

Now that Istio and the Nginx Ingress Controller are configured to send telemetry data to the OpenTelemetry Collector, and the Collector is configured to export data to Tempo, the next step is to actually see traces.

Because I’m using Otomi, Loki is already installed and configured in a multi-tenant setup, allowing Teams to see the logs of their applications. The default Envoy access logging is configured to show the Trace ID in the logs, and Teams can see these logs in Loki. Now I want a user to be able to open the complete trace (queried from Tempo) based on a Trace ID in the Loki logs. For this I will configure two datasources: one for Tempo and one for Loki.

Let’s start with the datasource for Tempo. Use the following values to configure an additional Grafana datasource for Tempo:

grafana:
  additionalDataSources:
    - name: Tempo
      type: tempo
      uid: tempo
      access: proxy
      url: http://tempo-query-frontend.tempo:3100
      jsonData:
        nodeGraph:
          enabled: true

And then the additional datasource for Loki:

grafana:
  additionalDataSources:
    - name: Loki
      editable: false
      type: loki
      uid: loki
      access: proxy
      url: http://loki-query-frontend-headless.monitoring:3101
      jsonData:
        derivedFields:
          - datasourceName: Tempo
            matcherRegex: "traceID=00-([^\\-]+)-"
            name: traceID
            url: "$${__value.raw}"
            datasourceUid: tempo

Because I’m using the opentelemetry extension provider in Istio, the Envoy access logs will show the TRACEPARENT header. This header looks like this:

00-3724a0faa7a5c20783702c91a9082ae2-4526b089c53276b7-01

The Trace ID, however, sits between the first and the second dash. This is why I’m using the "traceID=00-([^\\-]+)-" matcherRegex. This adds the option to link directly to the trace in Tempo:

When you now click on the link to Tempo, Tempo will be queried based on the Trace ID:

And when you click on the Node graph, you’ll see the complete flow:

Now you might notice that Tempo is showing the full trace from the Nginx Controller to the Istio Ingress Gateway, to the Team’s Ingress, and then multiple spans within the petclinic app. In the first part I explained that Istio is not able to do this on its own and that you will need to instrument the application.

In part 1 I also showed how to install the Tempo Metrics Generator. This is a requirement for using Service Graphs in Grafana. Service graphs can help you understand the structure of a distributed system. I won’t go deeper into all the possibilities of relating logs, metrics and traces; you can read about all the datasource configuration options here. Let’s go and instrument an application.
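To actually use the Service Graph tab, the Tempo datasource also needs to know which Prometheus datasource the Metrics Generator writes its service graph metrics to. A minimal sketch, assuming your Prometheus datasource has the uid `prometheus` (adjust to your own setup):

```yaml
grafana:
  additionalDataSources:
    - name: Tempo
      type: tempo
      uid: tempo
      access: proxy
      url: http://tempo-query-frontend.tempo:3100
      jsonData:
        nodeGraph:
          enabled: true
        serviceMap:
          # uid of the Prometheus datasource holding the
          # metrics-generator output (assumed to be "prometheus")
          datasourceUid: prometheus
```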

Set up automated instrumentation

Automatic instrumentation is a feature of the OpenTelemetry Operator that provides a way to instrument your application without touching your source code.

Before we deploy the petclinic application, we first need to create an OpenTelemetryCollector resource:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar
spec:
  mode: sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    exporters:
      otlp:
        endpoint: otel-collector-collector.otel.svc.cluster.local:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers:
            - otlp
          exporters:
            - otlp

This collector will run as a sidecar in the petclinic Pod and send all telemetry data to the central Collector (we created in Part 1).

Next we’ll need to create an Instrumentation resource:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: java-instrumentation
spec:
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: always_on
  java:
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://otel-collector-collector.otel.svc.cluster.local:4317

Note that the propagators are used to propagate the trace context between services.

Now deploy the petclinic image:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: petclinic
  labels:
    app: petclinic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: petclinic
  template:
    metadata:
      labels:
        app: petclinic
      annotations:
        instrumentation.opentelemetry.io/inject-java: 'true'
        sidecar.opentelemetry.io/inject: 'sidecar'
    spec:
      containers:
        - name: petclinic
          image: springcommunity/spring-framework-petclinic
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: petclinic
spec:
  selector:
    app: petclinic
  ports:
    - port: 80
      targetPort: 8080

As you can see, I only created a ClusterIP service. Because I use Otomi, I only need to configure the service for exposure.
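If you’re not using Otomi, you’ll need to expose the service yourself. A minimal sketch using a standard Kubernetes Ingress (the hostname is a placeholder; your ingress class and TLS setup will differ):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: petclinic
spec:
  rules:
    # placeholder hostname, replace with your own domain
    - host: petclinic.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: petclinic
                port:
                  number: 80
```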

Now that the PetClinic app is deployed and the Instrumentation is configured, we can see the full trace and find out what went wrong:

Using Jaeger as a backend

When using the OpenTelemetry Collector, you can just as easily use a different backend. If you would like to use Jaeger, just add Jaeger as an exporter to the collector config:

exporters:
  logging:
    loglevel: info
  otlp:
    endpoint: tempo-distributor.tempo.svc.cluster.local:4317
    sending_queue:
      enabled: true
      num_consumers: 100
      queue_size: 10000
    retry_on_failure:
      enabled: true
    tls:
      insecure: true
  jaeger:
    endpoint: jaeger-operator-jaeger-collector.jaeger.svc:14250
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers:
        - otlp
      processors:
        - memory_limiter
        - batch
      exporters:
        - logging
        - otlp
        - jaeger

Then install Jaeger and you’ll see the same traces:
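Installing Jaeger itself is out of scope here, but as a sketch: with the Jaeger Operator installed and watching the jaeger namespace, a minimal all-in-one instance whose collector service matches the exporter endpoint used in the config above could look like this (the name and namespace are assumptions based on that endpoint):

```yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  # the operator creates a <name>-collector service, so this name
  # yields jaeger-operator-jaeger-collector.jaeger.svc
  name: jaeger-operator-jaeger
  namespace: jaeger
spec:
  strategy: allinone
```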

Recap

In the first part of this mini-series we looked at how to set up tracing on Kubernetes using Istio, OpenTelemetry and Grafana Tempo. I did not go deep into the theory behind tracing, but focused on the implementation. Be aware that I did not cover all the topics, like multi-tenancy, performance tuning and customizing Envoy logging. Maybe that’s for another time.

The setup I explained here will be the basis for tracing in Otomi. When using Otomi, you’ll be able to turn on Tempo and OpenTelemetry and then Istio and all the Grafana datasources will be automatically configured.

Tempo and OpenTelemetry in Otomi

Be the first to try it out and follow Otomi on GitHub.
