Metrics to Monitor Microservices with OpenTelemetry and Prometheus

Ebubekir Dinc
Dec 31, 2023

This article is part of my Microservices and Cloud-Native Applications series. You can find the other parts of the series below.

  1. Saga Orchestration using MassTransit in .NET
  2. API Gateway with Ocelot
  3. Authorization and Authentications with IdentityServer
  4. Eventual Consistency with Integration Events using RabbitMq
  5. Distributed Logging with ElasticSearch, Kibana, and SeriLog
  6. Resiliency and Fault Tolerance with Polly
  7. Health Check with WatchDogs in a Microservices Architecture
  8. Distributed Tracing with Jaeger and OpenTelemetry in a Microservices Architecture
  9. Metrics to Monitor Microservices with OpenTelemetry and Prometheus

If you want to take a look at the GitHub code, you can access it here: https://github.com/ebubekirdinc/SuuCat

Metrics are essential for monitoring, controlling, and optimizing the scalability, performance, and reliability of a microservices architecture. They make it easier to track how well the system as a whole and each individual microservice are performing, including resource usage, throughput, and response times. By looking at these indicators, you can find bottlenecks and improve the efficiency of particular services. Metrics also provide information about the availability and health of every microservice: tracking error rates and service uptime helps keep the system responsive and available, and aids in detecting problems early.

We can also track data such as how many calls an endpoint has received, how many messages are waiting in a queue, the current stock level and its change over time, the execution times of an endpoint, or methods that run for more than 500 ms. All of these are values we might want to track in a distributed system.

In a microservices design, metrics work in tandem with logging and tracing. While logs provide detailed information about specific events, metrics offer aggregated and summarized data that can be used for trend analysis and high-level monitoring.

In our project, SuuCat, metrics are implemented using OpenTelemetry together with Prometheus. OpenTelemetry Metrics promotes consistency and interoperability in observability by offering an extensible, standardized way to instrument, collect, and export metric data from applications. Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability in modern, dynamic microservices architectures. Both have NuGet packages for .NET and are easy to set up.

Prometheus can be installed using the following Docker Compose and configuration files. More information about the installation is here: https://github.com/ebubekirdinc/SuuCat/wiki/GettingStarted

docker-compose.yml

https://github.com/ebubekirdinc/SuuCat/blob/master/docker-compose.yml

docker-compose.override.yml

https://github.com/ebubekirdinc/SuuCat/blob/master/docker-compose.override.yml

prometheus.yml

https://github.com/ebubekirdinc/SuuCat/blob/master/prometheus.yml

In the prometheus.yml file above, we define the set of instructions that Prometheus follows to collect metrics data from certain microservices. The scrape_configs section contains a list of jobs, each with a unique job_name. A job represents a collection of similar instances of a service that Prometheus scrapes. The scrape_interval parameter defines how often Prometheus should scrape metrics from the targets. Here, the scrape_interval is set to 2s, meaning Prometheus will scrape metrics from these targets every 2 seconds.
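To make the structure described above concrete, here is a minimal sketch of what such a scrape configuration could look like. The job names and target hosts are placeholders invented for illustration, not values copied from the SuuCat repository; only the 2s interval comes from the description above.

```yaml
# Minimal sketch of a prometheus.yml scrape configuration.
# Job names and targets are hypothetical placeholders.
scrape_configs:
  - job_name: "identity"          # one job per microservice
    scrape_interval: 2s           # scrape this job's targets every 2 seconds
    metrics_path: /metrics        # Prometheus default path, shown explicitly
    static_configs:
      - targets: ["identity-service:80"]

  - job_name: "order"
    scrape_interval: 2s
    static_configs:
      - targets: ["order-service:80"]
```

A 2-second interval is convenient for a demo because new data points appear almost immediately. Longer intervals are common in production setups to reduce load and storage.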

For Metrics, we will use the same common project we use for Tracing. We need to add the “OpenTelemetry.Exporter.Prometheus.AspNetCore” package to that project.

https://github.com/ebubekirdinc/SuuCat/blob/master/src/BuildingBlocks/Tracing/OpenTelemetryExtensions.cs

As shown in the file linked above, the extension method AddOpenTelemetryMetrics() configures OpenTelemetry metrics for our application. It adds OpenTelemetry to the service collection and configures it with the metrics options. It also adds a Prometheus exporter, which exposes the metrics data so that Prometheus can scrape it. The resource is set with the service name and version from the OpenTelemetry parameters, which are defined in the appsettings.json file of each microservice.
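As a rough illustration of the steps just described, an extension method of this kind might look like the sketch below. It is not a copy of the SuuCat file. The OpenTelemetryParameters class and its properties are hypothetical stand-ins for the settings read from appsettings.json, and the sketch assumes the OpenTelemetry.Extensions.Hosting package (1.4 or later) alongside OpenTelemetry.Exporter.Prometheus.AspNetCore.

```csharp
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;

// Hypothetical settings type; in practice it would be bound from appsettings.json.
public sealed class OpenTelemetryParameters
{
    public string ServiceName { get; set; } = "unknown-service";
    public string ServiceVersion { get; set; } = "1.0.0";
    public string[] MeterNames { get; set; } = System.Array.Empty<string>();
}

public static class OpenTelemetryMetricsExtensions
{
    // Sketch of an AddOpenTelemetryMetrics-style helper (an assumption, not the SuuCat code).
    public static IServiceCollection AddOpenTelemetryMetrics(
        this IServiceCollection services,
        OpenTelemetryParameters parameters)
    {
        services.AddOpenTelemetry()
            // Attach the service name and version to every exported metric.
            .ConfigureResource(resource => resource.AddService(
                serviceName: parameters.ServiceName,
                serviceVersion: parameters.ServiceVersion))
            .WithMetrics(metrics => metrics
                // Only instruments created on these meters are collected.
                .AddMeter(parameters.MeterNames)
                // Expose the collected metrics in Prometheus text format.
                .AddPrometheusExporter());

        return services;
    }
}
```

With this exporter, the application also has to map the scraping endpoint in its request pipeline, for example with app.UseOpenTelemetryPrometheusScrapingEndpoint(), so that Prometheus can read the metrics over HTTP.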

When we run the project, the microservices to which we have added metrics should appear green on the Prometheus Targets page, as shown below.

Prometheus Targets screen

Now let's look at where the metric instruments are defined. To define and manage metrics across the microservices, we created the OpenTelemetryMetric class. This class contains several static Meter objects, each representing a different microservice in the system: IdentityMeter, OrderMeter, and StockMeter. These Meter objects are used to create the different types of metrics.
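The sketch below shows one way such a class could be laid out with System.Diagnostics.Metrics, which is the API OpenTelemetry for .NET builds on. It is an illustration rather than the SuuCat source. The meter names, instrument names, units, and the numeric types of the stock and duration instruments are assumptions, chosen to resemble the Prometheus series names mentioned in this article; the exact exported names depend on the exporter version. UpDownCounter requires .NET 7 or later.

```csharp
using System.Diagnostics.Metrics;

// Illustrative layout only; names and types are assumptions, not the SuuCat code.
public static class OpenTelemetryMetric
{
    // One Meter per microservice.
    public static readonly Meter IdentityMeter = new("Identity.Metrics", "1.0.0");
    public static readonly Meter OrderMeter = new("Order.Metrics", "1.0.0");
    public static readonly Meter StockMeter = new("Subscription.Metrics", "1.0.0");

    // Counter: a value that only increases.
    public static readonly Counter<int> UserCreatedEventCounter =
        IdentityMeter.CreateCounter<int>(
            "user_created_event_count",
            description: "Number of user-created events");

    // UpDownCounter: a value that can increase and decrease.
    public static readonly UpDownCounter<int> StockCounter =
        StockMeter.CreateUpDownCounter<int>(
            "subscription_stock_count",
            description: "Current stock level");

    // Histogram: distribution of recorded values, here method durations.
    public static readonly Histogram<double> MethodDuration =
        OrderMeter.CreateHistogram<double>(
            "order_method_duration",
            unit: "milliseconds",
            description: "Duration of order methods");
}
```

Keeping all instruments in one static class gives every service a single place to look up metric names, at the cost of coupling the services to a shared project.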

Next, we will see how these metrics are defined, along with example screenshots from Prometheus.

For instance, IdentityMeter is the meter of the Identity microservice, and UserCreatedEventCounter is a Counter<int> metric that tracks the number of user-created events in that microservice. It is used in the AuthController like this:

  OpenTelemetryMetric.UserCreatedEventCounter.Add(1, new KeyValuePair<string, object>("event.name", "UserCreatedEvent"));

To see this in Prometheus, go to the Prometheus home page, enter “user” in the search box, and “user_created_event_count_total” will appear among the options. Select it and press Execute and you will see a screen similar to the one below.

Prometheus Up Counter

Of course, you will need to make some requests to the SignUp endpoint in Swagger to generate enough data.

Identity microservice Swagger

With this type of counter (CreateCounter<int>()), the value only ever increases. If we need a counter that can both increase and decrease, we can use CreateUpDownCounter<int>(), as StockMeter does.

After defining the up-down counter with CreateUpDownCounter in OpenTelemetryMetric, we add a line to AddStockCommand in the Subscription microservice that increases the stock count.

We also add a line to OrderCreatedEventConsumer that reduces the stock count each time an event is consumed.

https://github.com/ebubekirdinc/SuuCat/blob/master/src/Services/Subscription/src/Infrastructure/Consumers/Events/OrderCreatedEventConsumer.cs
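The increase and decrease calls could look roughly like the following sketch. The class, method, and tag names are hypothetical and only illustrate the pattern: a positive delta when stock is added and a negative delta when an order consumes it. UpDownCounter requires .NET 7 or later.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical helper illustrating the up-down pattern; not the SuuCat code.
public static class StockMetricsSketch
{
    private static readonly Meter StockMeter = new("Subscription.Metrics.Sketch", "1.0.0");

    public static readonly UpDownCounter<int> StockCounter =
        StockMeter.CreateUpDownCounter<int>(
            "subscription_stock_count",
            description: "Current stock level");

    // Would be called from a handler such as AddStockCommand.
    public static void StockAdded(string productId, int quantity) =>
        StockCounter.Add(quantity, new KeyValuePair<string, object?>("product.id", productId));

    // Would be called from a consumer such as OrderCreatedEventConsumer.
    public static void StockConsumed(string productId, int quantity) =>
        StockCounter.Add(-quantity, new KeyValuePair<string, object?>("product.id", productId));
}
```

Because the instrument reports deltas, the exported value is the running sum of all additions and subtractions since the process started, not an absolute stock level read from the database.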

To see the data generated, type “stock” in the search box in Prometheus and select “subscription_stock_count” from the list that appears, then click Execute. You will see a screen similar to the one below. Again, don't forget to generate some data first. Here you will see that the value not only increases but also decreases.

Prometheus UpDown Counter

Now let's look at a third type, the Histogram. In OpenTelemetry, a histogram is a metric used to measure the distribution of values over a period of time. Unlike a simple counter, which only accumulates a running total, a histogram records each measured value and groups the measurements into buckets by value range. This shows how the values are distributed across a given range, not just how many there were.

In our case, it is used to measure the duration of a method, stored in milliseconds. We collect the histogram data with Record(). Alongside it, a regular counter (OrderLongRunningRequestCounter) is incremented for long-running methods.
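A minimal sketch of this pattern is shown below, assuming a Stopwatch-based measurement and a 500 ms threshold for "long-running" (the figure mentioned earlier in this article). The helper class, instrument names, tag name, and the choice of Histogram<double> are assumptions for illustration, not the SuuCat implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical helper; names, types, and threshold are assumptions.
public static class OrderMetricsSketch
{
    private static readonly Meter OrderMeter = new("Order.Metrics.Sketch", "1.0.0");

    public static readonly Histogram<double> MethodDuration =
        OrderMeter.CreateHistogram<double>("order_method_duration", unit: "milliseconds");

    public static readonly Counter<int> OrderLongRunningRequestCounter =
        OrderMeter.CreateCounter<int>("order_long_running_request_count");

    public const double LongRunningThresholdMs = 500;

    // Runs the action, then records how long it took, even if it throws.
    public static T Measure<T>(string methodName, Func<T> action)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return action();
        }
        finally
        {
            stopwatch.Stop();
            RecordDuration(methodName, stopwatch.Elapsed.TotalMilliseconds);
        }
    }

    // Records one duration and counts it as long-running above the threshold.
    public static void RecordDuration(string methodName, double elapsedMs)
    {
        var tag = new KeyValuePair<string, object?>("method.name", methodName);
        MethodDuration.Record(elapsedMs, tag);

        if (elapsedMs > LongRunningThresholdMs)
        {
            OrderLongRunningRequestCounter.Add(1, tag);
        }
    }
}
```

A call such as OrderMetricsSketch.Measure("CreateOrder", () => DoWork()) would wrap an existing method, where DoWork is a placeholder for the real logic. The histogram gives the full distribution, while the separate counter offers a cheap way to query or alert on slow calls alone.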

To see the data generated, type “order” in the search box in Prometheus and select “order_method_duration_milliseconds_bucket” from the list that appears, then click Execute. The result shows the data divided into buckets, as below.

Prometheus Histogram screen

To see individual metrics in each bucket you can click the corresponding bucket below the chart.

Prometheus Histogram buckets

In this article, we have seen how to implement metrics in a microservices architecture using OpenTelemetry and Prometheus. If you want richer dashboards and visualizations, you can use Grafana, but Prometheus is enough to get you started.

Related to this topic, you can also look at Distributed Tracing.

More info can be found in the Prometheus docs and the SuuCat GitHub repository.
