OpenTelemetry Up and Running

What you need to know about OpenTelemetry

Magsther
Mar 28, 2023

Introduction

This post aims to give you a basic understanding of OpenTelemetry.

Topics that will be covered are:

  • Distributed Tracing
  • OpenTelemetry, what is it?
  • OpenTelemetry Instrumentation (Auto and Manual)
  • OpenTelemetry Protocol (OTLP)
  • OpenTelemetry Collectors
  • OpenTelemetry Collectors Deployment Patterns
  • OpenTelemetry Backends
  • OpenTelemetry on Kubernetes
  • OpenTelemetry Operator
  • OpenTelemetry Sample application

By the end of this post, you will know how to implement tracing in your application without any code changes by using the OpenTelemetry Operator.

Distributed Tracing

Let’s start by taking a look at what Distributed Tracing is and why we need it.

Why do we need Tracing?

Why do we need distributed tracing; why can’t we just use metrics and logs? Well, imagine that you have a microservices architecture like the one below.

Now imagine a request from the client.

As you can see from the architecture above, a request can go through tens or hundreds of network hops. This makes it very difficult to know the entire path a request takes, and also very complicated to troubleshoot if you only have logs and metrics.

When something fails, there are many questions that you must answer.

  • How can we find out the root cause?
  • How can we keep an eye on all the services the request went through?

Distributed Tracing helps you see the interactions between services during the whole request and gives you insight into the full lifecycle of requests in your system. It helps us spot errors, bottlenecks and performance issues in our application.

Tracing starts the moment an end user interacts with an application, and we should be able to follow the whole request down to the last tier.

The trace data (in the form of spans) carries metadata that can help us understand how and why latency or errors occur and what impact they have on the entire request.

If you want to know more about distributed tracing and the problems it solves, please read A beginner’s guide to Distributed Tracing to learn how to monitor a microservices architecture.

How can we implement tracing?

To implement tracing, we need to:

  1. Instrument our application
  2. Collect and process the data
  3. Store and visualise the data so that we can query it.

To do this, we can use two open source projects: OpenTelemetry and Jaeger.

Let’s start with OpenTelemetry.

OpenTelemetry

OpenTelemetry can be used to collect data from your applications.

It is a collection of tools, APIs and SDKs that we can use to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyse your software’s performance and behaviour.

OpenTelemetry is:

  • Open Source
  • Adopted and supported by industry leaders in the observability space
  • A CNCF project
  • Vendor agnostic

OpenTelemetry includes the three pillars of observability: tracing, metrics and logs. Note: This post will focus on the tracing pillar.

  • Distributed tracing is a method of tracking the path of a service request from beginning to end across a distributed system.
  • Metrics are the measurement of activities over a period of time, to gain visibility into system or application performance.
  • Logs are text records of events that occur at specific points in time in a system or application.

OpenTelemetry is vendor agnostic

OpenTelemetry provides a vendor-agnostic standard for observability, as it aims to standardise the generation of traces. With OpenTelemetry we can decouple the instrumentation from the backends, which is great because it means we are not tied to any tool (or vendor).

Not only can we use any programming language we want, but we can also pick and choose the storage backend, thus avoiding potential lock-in to a commercial vendor.

Developers can instrument their application without having to know where the data will be stored.

OpenTelemetry gives us the tools to create trace data. To get this data, we first need to instrument the application so that it collects the data; for that, we use the OpenTelemetry SDK.

Instrumentation

The data from the application can be generated using either automatic or manual instrumentation (or a mix of both).

To instrument your application with OpenTelemetry, go to the OpenTelemetry repository, and pick the language for your application and follow the instructions.

Auto Instrumentation

Using auto-instrumentation can be a good first step, as it is simple, easy, and doesn’t require many code changes.

This approach is especially good if you don’t have the necessary knowledge (or time) to create a tracing framework tailored to your application.

When you use auto-instrumentation, a predefined set of spans will be created for you and populated with relevant attributes.
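
As a minimal sketch (assuming a Java service, the Java agent shown later in this post, and a collector reachable at localhost:4317), auto-instrumentation can typically be steered entirely through the standard OpenTelemetry environment variables:

export OTEL_SERVICE_NAME=petclinic                        # logical name of your service
export OTEL_TRACES_EXPORTER=otlp                          # send traces over OTLP
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317  # assumed collector address
java -javaagent:opentelemetry-javaagent.jar -jar target/*.jar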

Manual Instrumentation

Manual instrumentation is when you write specific code for your application.

It’s the process of adding observability code to your application. This can more effectively suit your needs, as you can add attributes and events. The drawback to this is that you need to import the libraries and do all the work yourself.

In a previous post, “A beginner’s guide to OpenTelemetry”, we presented a high-level overview of what this may look like.
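
As a rough illustration (not taken from that post), here is what manual instrumentation could look like with the OpenTelemetry Java API; the class, method and attribute names are made up for the example:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class OrderService {

    // Obtain a tracer from whatever OpenTelemetry SDK has been configured globally
    private static final Tracer tracer =
            GlobalOpenTelemetry.getTracer("com.example.orders");

    public void processOrder(String orderId) {
        // Start a span and make it the current span for the duration of the work
        Span span = tracer.spanBuilder("processOrder").startSpan();
        try (Scope scope = span.makeCurrent()) {
            span.setAttribute("order.id", orderId);  // custom attribute
            span.addEvent("order validated");        // custom event
            // ... business logic goes here ...
        } catch (RuntimeException e) {
            span.recordException(e);                 // attach the error to the span
            throw e;
        } finally {
            span.end();                              // always close the span
        }
    }
}

Ending the span in a finally block ensures it is always closed, even if the business logic throws.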

Propagators

Propagators like W3C tracecontext, baggage and b3 can be added to the configuration.

A Propagator type defines the restrictions imposed by a specific transport and is bound to a data type, in order to propagate in-band context data across process boundaries.

See the blog post from LightstepHQ, The Big Pieces: OpenTelemetry context propagation, to read more about this in detail.
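
In the SDKs and the auto-instrumentation agent, the active propagators can usually be selected with the standard OTEL_PROPAGATORS environment variable, for example:

export OTEL_PROPAGATORS=tracecontext,baggage,b3   # W3C tracecontext, baggage and B3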

Sampling

Sampling is a mechanism to control the noise and overhead introduced by OpenTelemetry by reducing the number of samples of traces collected and sent to the backend.

You can tell OpenTelemetry to perform sampling depending on the amount of traces/traffic you want to send (i.e. take only 10% of the traces).
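
For example, with the SDK or the Java agent, a 10% head-sampling rate can be configured through the standard sampling environment variables (a sketch, assuming parent-based trace ID ratio sampling):

export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1   # keep roughly 10% of the traces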

Two common sampling techniques are head sampling and tail sampling.

See the blog post from Team Aspecto, OpenTelemetry Sampling: Everything You Need to Know, to read more about this in detail.

OpenTelemetry Protocol

The OpenTelemetry Protocol (OTLP) specification describes the encoding, transport, and delivery mechanism of telemetry data between telemetry sources, intermediate nodes such as collectors, and telemetry backends.

Each language SDK provides an OTLP exporter you can configure to export data over OTLP. The OpenTelemetry SDK then transforms events into OTLP data.

OTLP is the protocol spoken between the agents (configured as exporters) and the collector (configured as a receiver).
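
A minimal sketch of pointing an SDK’s OTLP exporter at a collector; the endpoint is an assumption, adjust it to your setup:

export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317   # assumed collector address
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc                         # or http/protobuf (port 4318)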

OpenTelemetry Collector

The data from your instrumented application can be sent to an OpenTelemetry collector.

The collector is a component of OpenTelemetry that receives data (spans, metrics, logs, etc.), processes it, and exports the data (sends it off to the backend you want to talk to).

Receivers

A receiver, which can be push or pull based, is how data gets into the collector. The OpenTelemetry collector can receive telemetry data in multiple formats.

Here is an example of a receiver (on the collector) that accepts OTLP data on port 4317 (gRPC)

otlp:
  protocols:
    grpc:
      endpoint: "0.0.0.0:4317"

And here is an example where it accepts OTLP data over both gRPC and HTTP.

otlp:
  protocols:
    grpc:
    http:

Processing

Once the data is received, the collector can process it. Processors run on the data between it being received and being exported. Processors are optional, though some are recommended.

In this example, the batch processor is configured.

The batch processor accepts spans, metrics, or logs and places them into batches. Batching helps better compress the data and reduce the number of outgoing connections required to transmit the data. This processor supports both size and time based batching.

processors:
  batch:
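
If you want to tune it, the batch processor accepts both size- and time-based settings; the values below are purely illustrative:

processors:
  batch:
    send_batch_size: 8192   # flush once this many items are batched
    timeout: 5s             # or after this much time has passed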

Configuring a processor does not enable it. Processors are enabled via pipelines within the service section.

service:
  pipelines:
    traces:
      receivers: [opencensus, jaeger]
      processors: [batch]
      exporters: [opencensus, zipkin]

A full list of processors can be found here

Exporters

In order to visualise and analyse the telemetry you will need to use an exporter. An exporter is a component of OpenTelemetry and is how data gets sent to different systems/back-ends.

A common exporter to start with and that is very useful for development and debugging tasks is the console exporter.

In the exporters section, you can add more destinations. For example, if you would also like to send trace data to Grafana Tempo, just add these lines to the central_collector.yaml file.

exporters:
  logging:
  otlp:
    endpoint: "<tempo_endpoint>"
    headers:
      authorization: Basic <api_token>

Also make sure to add this exporter to the pipeline:

pipelines:
  traces:
    receivers: [otlp]
    processors: []
    exporters: [logging, otlp]

OpenTelemetry comes with a variety of exporters, and in the OpenTelemetry Collector Contrib repository you can find even more.

Extensions

Extensions are available primarily for tasks that do not involve processing telemetry data. Examples of extensions include health monitoring, service discovery, and data forwarding. Extensions are optional.

extensions:
  health_check:
  pprof:
  zpages:
  memory_ballast:
    size_mib: 512
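
Like processors, extensions only take effect once they are listed in the service section, for example:

service:
  extensions: [health_check, pprof, zpages, memory_ballast]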

OpenTelemetry Collector — Deployment Patterns / Strategy

The OpenTelemetry collector can be deployed in different ways, and it is important to think about how you want to deploy it.

The strategy you pick often depends on your teams and organisation.

Let’s look at some of the patterns.

OpenTelemetry Agent to a Backend

In this scenario, the OpenTelemetry-instrumented application sends the data to a (collector) agent that resides together with the application. This agent then offloads responsibility from the application and handles all the trace data from the instrumented application.

The collector can be deployed as an agent via a sidecar which can be configured to send data directly to the storage backend.

Using a Central (Gateway) OpenTelemetry Collector

You can also decide to send the data to another OpenTelemetry collector and from the (central) collector send the data further to the storage backend. In this configuration, we have a central OpenTelemetry collector that is deployed using the deployment mode, which comes with many advantages like auto scaling.

Some of the advantages of using a central collector are:

  • Removes dependency on individual teams
  • Enforces configuration/policies for batching, retry, encryption and compression
  • Authentication in a central place. William Tavares wrote a post on how to do this using Azure ADAL.
  • Enriching with metadata.
  • Making sampling decisions (i.e. take only 10% of the traces).
  • Scaling, for example via a Horizontal Pod Autoscaler (see the sketch after this list)
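
As a sketch of that last point, a standard Kubernetes HorizontalPodAutoscaler could target the gateway collector’s Deployment; the resource names here are hypothetical:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: otel-gateway                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: otel-gateway-collector      # hypothetical gateway collector Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75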

All deployment patterns put together

You can find many more deployment patterns by watching the presentation from Juraci Paixão Kröhling. The accompanying repository can be found here.

Here are the deployment patterns explained from the presentation:

Basic — Clients are instrumented using OTLP, sending data to a cluster of collectors.

Basic — Fanout — Sending data to multiple exporters.

Normalizer

This pattern features a scenario where the instrumentation might have been done with one library per signal and where we might need to normalize them to a common set, name and/or format.

Kubernetes — Patterns that can be used when deploying the OpenTelemetry Collector on Kubernetes.

Kubernetes pattern, with sidecar.

Agent as sidecar, where a container is added to the workload pod with the OpenTelemetry Collector. This instance is then configured to send data to an external collector, potentially in a different namespace or cluster.

Kubernetes pattern, with daemon sets.

Agent as DaemonSet, where a DaemonSet is used instead, so that we have one agent pod per Kubernetes node.

Load Balancing — Load balancing based on trace IDs.

Multi-Cluster — Agent, workload, and control plane collectors.

Multitenant — Two tenants, each with their own Jaeger.

Per Signal — Two collectors, one for each telemetry data type.

OpenTelemetry Backends

The OpenTelemetry collector does not provide its own backend, so it is up to any vendor or open source product to fill that role!

Even though OpenTelemetry does not provide its own backend, by using it we are not tied to any tool or vendor, since it is vendor agnostic. Not only can we use any programming language we want, but we can also pick and choose the storage backend, and easily switch to another backend/vendor by simply configuring another exporter.

In order to visualise and analyse the telemetry, we can add one or more exporters to the configuration.

Remember, an exporter is a component of OpenTelemetry and is how data gets sent to different systems/back-ends.

A common exporter to start with and that is very useful for development and debugging tasks is the console exporter.

A very popular open source product for analysing and querying the data is Jaeger.

Jaeger is an open source distributed tracing system for tracing transactions between distributed services. It is used for monitoring and troubleshooting complex microservices environments.

It comes with many components and one of them is the Jaeger UI.
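
For local experiments, a quick way to try Jaeger is the all-in-one image; this sketch assumes a recent Jaeger version with the OTLP receiver enabled:

# Jaeger UI on 16686, OTLP gRPC on 4317; COLLECTOR_OTLP_ENABLED lets Jaeger ingest OTLP directly
docker run --rm --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  jaegertracing/all-in-one:latest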

You can find much more about Jaeger in our Getting Started with Jaeger posts.

Troubleshooting the OpenTelemetry Collector

If you are running into any issues with your deployment, the OpenTelemetry documentation provides the following troubleshooting tips for the collector.

  • Are there error messages in the logs of the collector?
  • How is the telemetry being ingested into this component?
  • How is the telemetry being modified (i.e. sampling, redacting) by this component?
  • How is the telemetry being exported from this component?
  • What format is the telemetry in?
  • How is the next hop configured?
  • Are there any network policies that prevent data from getting in or out?

I usually start by looking into the logs of the application and of the containers. First, I use the kubectl describe command, which provides detailed information about each pod.

kubectl describe pod <pod_name>

Then I run kubectl logs <pod_name> to view the logs of the collector. That way I can see whether the application successfully transmitted data to the collector. If you don’t see the logs right away, that’s because the collector buffers data before sending it.

OpenTelemetry on Kubernetes

In a previous post OpenTelemetry on Kubernetes, we deployed an OpenTelemetry collector on Kubernetes and used an OpenTelemetry instrumented (Go) application to send traces to the Collector.

From there, we brought the trace data to a Jaeger collector and visualised it with the Jaeger UI.

In that example, we deployed the OpenTelemetry collector using this otel-collector.yaml file, which consists of a ConfigMap, a Service and a Deployment.

We can also deploy OpenTelemetry using an operator, which is what we will do next.

OpenTelemetry Operator

The OpenTelemetry Operator is an implementation of a Kubernetes Operator. The operator manages the OpenTelemetry Collector and auto-instrumentation of the workloads using OpenTelemetry.

One of the benefits of using Operators is that they extend the functionality of Kubernetes. The Cloud Native Computing Foundation (CNCF) wrote a good blog post about this if you want to read more about Kubernetes Operators.

Deploying the OpenTelemetry Operator can be done by applying the operator manifest directly like this:

kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

You can also install the OpenTelemetry Operator via the Helm chart from the opentelemetry-helm-charts repository. More information is available here.
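
A sketch of the Helm route (note that the operator chart generally expects cert-manager, or an alternative certificate setup, to be present in the cluster):

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm install opentelemetry-operator open-telemetry/opentelemetry-operator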

Once the operator is deployed, it provides two custom resources: OpenTelemetryCollector and Instrumentation.

Make sure to read the OpenTelemetry Operator post, where we set up a workflow like the one described below.

Sample Application

To put this all together, we will build a Java application (Petclinic), which is a Spring Boot application built using Maven or Gradle. This application will produce data using OpenTelemetry.

In an earlier post, OpenTelemetry and Spring Boot, we instrumented the same application using OpenTelemetry by downloading the opentelemetry-javaagent package provided by OpenTelemetry.

We could then run the application using:

java -javaagent:opentelemetry-javaagent.jar -jar target/*.jar

Automatic instrumentation with Java uses a Java agent JAR that can be attached to any Java 8+ application. It dynamically injects bytecode to capture telemetry from many popular libraries and frameworks. It can be used to capture telemetry data at the “edges” of an app or service, such as inbound requests, outbound HTTP calls, database calls, and so on.

Running the above command allowed us to instrument the application and generate traces with very few changes to our application.

Using the OpenTelemetry Operator, we can implement tracing in our application without any code changes. How cool is that :)

The deployment file for this application looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: petclinic
  labels:
    app: petclinic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: petclinic
  template:
    metadata:
      labels:
        app: petclinic
    spec:
      containers:
        - name: petclinic
          image: <path_to_petclinic_image>
          ports:
            - containerPort: 8080

Sidecar

Now we will deploy an OpenTelemetry agent using the Sidecar mode. This agent will send traces from the application to our central (gateway) OpenTelemetry collector.

Create a new file and call it sidecar.yaml

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar
spec:
  mode: sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
    exporters:
      logging:
      otlp:
        endpoint: "<path_to_central_collector>:4317"
    service:
      telemetry:
        logs:
          level: "debug"
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [logging, otlp]

Auto Instrumentation

The operator can inject and configure OpenTelemetry auto-instrumentation libraries.

Currently DotNet, Java, NodeJS and Python are supported.

To use auto-instrumentation, configure an Instrumentation resource with the configuration for the SDK and instrumentation.

For the Java application, that will look like this.

vim instrumentation.yaml

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: java-instrumentation
spec:
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: always_on
  java:

Enabling the instrumentation

To enable the instrumentation, we need to update the deployment file and add annotations to it. This way we tell the OpenTelemetry Operator to inject the sidecar and the java-instrumentation to our application.

Add these lines to the application’s deployment file.

annotations:
  instrumentation.opentelemetry.io/inject-java: 'true'
  sidecar.opentelemetry.io/inject: 'sidecar'

petclinic-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: petclinic
  labels:
    app: petclinic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: petclinic
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: 'true'
        sidecar.opentelemetry.io/inject: 'sidecar'
      labels:
        app: petclinic
    spec:
      containers:
        - name: petclinic
          image: <path_to_petclinic_image>
          ports:
            - containerPort: 8080

Now we can deploy all this with:

kubectl apply -f instrumentation.yaml

kubectl apply -f sidecar.yaml # the sidecar (agent)

kubectl apply -f petclinic-deployment.yaml # (our application)

We also need to create a service for our deployment, which we can deploy with kubectl apply -f petclinic-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: petclinic-service
spec:
  selector:
    app: petclinic
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080

Once deployed, we can use Port Forwarding to access the application.

kubectl port-forward svc/petclinic-service 8080:80

Now, we can access the application to generate some traces.

Open a browser and go to: http://localhost:8080/

Analyse and Query the data

Now to the fun part. How can we view all our traces in a UI?

As mentioned in the Exporters section above, we had to configure exporters in order to visualise and analyse the telemetry.

Remember, OpenTelemetry does not come with storage and querying. That’s why we had to add exporters so that OpenTelemetry knows where the data should be sent (to which systems/back-ends).

A common exporter to start with and that is very useful for development and debugging tasks is the console exporter. While great for debugging, it is not very useful to analyse the trace data.

Instead, we can send the traces to one or more destinations that visualise the complete flow of the traces.

In the “OpenTelemetry Operator — Tracing made easy” post, we used Grafana Tempo to see the whole trace from the first service until the last one, but you can easily switch that to a vendor of choice to add more destinations.

This is the view using the Jaeger UI.

All of this could be accomplished without writing a single line of code in your application.

Conclusion

In this post we aimed to give you a basic understanding of OpenTelemetry.

OpenTelemetry includes the three pillars of observability (tracing, metrics and logs), and in this post we focused on the tracing pillar. In future posts, logs and metrics will be covered as well.

I hope you liked this post. If you found this useful, please hit that clap button and follow me to get more articles on your feed.

Awesome OpenTelemetry

Check out Awesome-OpenTelemetry to quickly get started with OpenTelemetry. This repo contains a big list of helpful resources.

An awesome list is a list of awesome things curated by the community. You can read more in the Awesome Manifesto
