Adding OpenTelemetry Metrics in Your Go App

Matheus Nogueira
Kubeshop
Nov 1, 2023

When releasing a product, things can go wrong, even if you prepare beforehand. Deploying to production is probably the first time a version of your application runs for days without being restarted. Unfortunately, this surfaces issues you would not have seen until then, such as memory leaks, dependencies failing randomly, and networking issues. The list goes on and on.

Without metrics covering those cases, it is almost impossible to detect and fix these issues before they affect your users. That’s why metrics are important: you want to know beforehand if something is going to fail, or at least get notified as soon as something breaks.

In this guide, I’m going to describe how our team added metrics to Tracetest and how you can do the same with your application.

Types of Metrics

Before starting, it’s important to understand the three types of metrics that you can have:

  1. Counter
  2. Gauge
  3. Histogram

Counter

A counter is a metric that can only increase: you can only add non-negative values to it. It’s useful for keeping track of the number of times that something happened. Some examples are:

  • Number of events from a queue.
  • Number of errors in the REST API.
  • Number of bytes transferred.
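
To make the "only goes up" rule concrete, here is a minimal sketch of a counter in plain Go. It is a conceptual stand-in built on the standard library, not the OpenTelemetry API; the Counter type, its Add and Value methods, and ErrNegativeIncrement are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// ErrNegativeIncrement is a hypothetical error returned when a caller
// tries to decrease the counter.
var ErrNegativeIncrement = errors.New("counter increments must be non-negative")

// Counter is an illustrative, stdlib-only model of a monotonic counter.
// It is safe for concurrent use because it relies on atomic operations.
type Counter struct {
	total atomic.Int64
}

// Add increases the counter by n and rejects negative increments,
// so the stored total can never go down.
func (c *Counter) Add(n int64) error {
	if n < 0 {
		return ErrNegativeIncrement
	}
	c.total.Add(n)
	return nil
}

// Value returns the running total.
func (c *Counter) Value() int64 {
	return c.total.Load()
}
```

With a model like this, counting REST API errors amounts to calling Add(1) each time a handler returns an error; the backend can then derive a rate from the growing total.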

Gauge

A gauge is useful for storing non-additive values over time. It can be used to keep track of resources used or how many concurrent operations are being executed. Some examples are:

  • Amount of RAM used by the application.
  • Number of active database connections.
  • Number of active HTTP connections.
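
The difference from a counter is that a gauge keeps the latest observation rather than a running total. The following is a minimal sketch under the same assumptions as before: plain standard-library Go, not the OpenTelemetry API, with a hypothetical Gauge type.

```go
package main

import "sync"

// Gauge is an illustrative, stdlib-only model of a gauge: it stores the
// most recent value and forgets the previous one.
type Gauge struct {
	mu    sync.Mutex
	value float64
}

// Set overwrites the stored value with the latest observation.
func (g *Gauge) Set(v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.value = v
}

// Value returns the most recently stored observation.
func (g *Gauge) Value() float64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.value
}
```

Setting the gauge to 12 active connections and then to 7 leaves it at 7, not 19, which is why summing gauge readings over time is not meaningful.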

Histogram

A histogram is very useful for sampling values and analyzing them. Think of a histogram as a Gauge, but capable of organizing the values into buckets while also keeping track of the number of values and their sum. It’s usually used with values that can be statistically meaningful. Some examples are:

  • Duration of an HTTP endpoint execution.
  • Size of HTTP request payloads.
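
The bucket, count, and sum bookkeeping described above can be sketched in a few lines of plain Go. This is a conceptual model only, not the OpenTelemetry SDK implementation; the Histogram type, its fields, and the bucket boundaries used in the usage note are assumptions made for illustration.

```go
package main

import "sort"

// Histogram is an illustrative, stdlib-only model of an explicit-bucket
// histogram. It is not safe for concurrent use.
type Histogram struct {
	Bounds []float64 // sorted upper bounds of the buckets
	Counts []uint64  // one slot per bound, plus a final overflow bucket
	Count  uint64    // number of recorded values
	Sum    float64   // sum of recorded values
}

// NewHistogram builds a histogram with the given upper bounds.
func NewHistogram(bounds ...float64) *Histogram {
	sorted := append([]float64(nil), bounds...)
	sort.Float64s(sorted)
	return &Histogram{Bounds: sorted, Counts: make([]uint64, len(sorted)+1)}
}

// Record places v in the first bucket whose upper bound is >= v, or in
// the overflow bucket when v exceeds every bound.
func (h *Histogram) Record(v float64) {
	i := sort.SearchFloat64s(h.Bounds, v)
	h.Counts[i]++
	h.Count++
	h.Sum += v
}

// Mean derives the average from the tracked sum and count.
func (h *Histogram) Mean() float64 {
	if h.Count == 0 {
		return 0
	}
	return h.Sum / float64(h.Count)
}
```

For example, with bounds of 10, 50, and 100 milliseconds, recording latencies of 5, 10, 42, and 250 puts two values in the first bucket, one in the second, none in the third, and one in the overflow bucket. Keeping the sum and count next to the buckets is what lets a backend compute averages as well as percentile estimates.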

I’m going to focus on histograms, but this doesn’t mean the other types of metrics are less important.

Configuring OpenTelemetry Metrics

The very first step is to install the OpenTelemetry Go libraries in your project:

go get go.opentelemetry.io/otel \
go.opentelemetry.io/otel/metric \
go.opentelemetry.io/otel/sdk \
go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc

Next, you will need to create a metric.Meter, the interface responsible for creating your metric instruments (counters, gauges, and histograms). Those instruments are the actual objects that record the metrics from your application:

// telemetry/metrics.go
package telemetry

Now that you can create your own metric.Meter, you can start adding metrics to your system and receive them in your OpenTelemetry Collector. Let’s look at two examples of metrics to illustrate how you can use it.

Allocated Memory Metric

One of the first infrastructure metrics you should consider is the allocated memory because it can help you identify memory leaks.

// main.go
package main

In this example, the application reports a process.allocated_memory metric every 5 seconds. It collects the number of bytes allocated by the application, converts that value to megabytes, and records it as the metric value.

HTTP Endpoint Metrics

Another simple and very useful metric is a histogram of the latency of all your HTTP requests. In this case, you can use an HTTP middleware to apply the metric to all your endpoints at once, without modifying them one by one. In my case, I use mux to handle HTTP endpoints, so this example also relies on mux:

// http/middleware/metrics.go
package http

Now, add the middleware to your router:

// main.go
package main

Sending Metrics to an Analytics Tool

For Tracetest, we chose Grafana Cloud to store our metrics. But this doesn’t affect our code in any way because we are forwarding all our metrics to the OpenTelemetry Collector. We just need to configure the Collector to send those metrics to Grafana Cloud’s Prometheus endpoint.

receivers:
  otlp:
    protocols:
      grpc:
      http:

With the OpenTelemetry Collector configured, you will be able to see all your metrics in Grafana Cloud and can start building dashboards and alerts based on them.

Using OpenTelemetry Metrics in Your Daily Work

You have an application that is currently reporting metrics, but what is the benefit? What can be achieved by instrumenting your applications? Let’s delve into this subject and explain some of the benefits you can derive from using your metrics.

Grafana Cloud

After your application starts sending metrics to Grafana, you can start building dashboards with visualizations like this:

It shows the average latency of each endpoint in Tracetest in the last 2 days. It’s very useful for detecting if our application’s performance is degrading before it can actually impact our users.

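To create a dashboard with this kind of visualization, you can follow Grafana’s guide and start from the following query. Keep in mind that the _bucket series of a histogram holds cumulative bucket counts, so a true mean latency is usually derived by dividing the rate of the _sum series by the rate of the _count series: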

avg by(http_method, http_route) (http_server_latency_milliseconds_bucket{server_environment="production",http_route!=""})

Beyond visualizing your metrics, you can also use them actively to alert your team about anomalies. Alerts let you set a threshold for how long API requests should take. If an incident pushes your API latency above that threshold for more than 5 minutes, you will be notified.

This allows you to detect problems in new deployments of your application, resource starvation, or dependencies degrading before these issues affect your users. The more meaningful metrics you have, the better your dashboards and alerts will be.

Why you should use OpenTelemetry Metrics

Metrics are crucial for building reliable products. When facing technical problems, most users of a product will not report the issue; they will just leave. Being able to spot those problems before they reach users, or at least being notified as soon as they start, lets you fix your product quickly instead of losing users to preventable errors.

What’s next?

Would you like to learn more about Tracetest and what it brings to the table? Check the docs and try it out by downloading it today!

Also, please feel free to join our Discord community, give Tracetest a star on GitHub, or schedule a time to chat 1:1.
