OpenTelemetry metrics workshop: Jan 15 2021

Sergey Kanzhelev
Published in OpenTelemetry
Jan 21, 2021

Last Friday, OpenTelemetry held a productive and informative metrics workshop. The workshop sparked a lot of interest and hosted over 100 participants from more than 25 companies and open source projects.

Planned as a gathering between the OpenMetrics designers and the OpenTelemetry metrics group, the workshop began with a statement, organized by Alolita Sharma, on OpenTelemetry’s commitment to Prometheus users and the OpenMetrics ecosystem. Complete, first-class Prometheus compatibility is a goal of the OpenTelemetry project in every sense possible, while OpenTelemetry also aims to add value by combining metrics signals with traces and (in the future) logs. This commitment is aligned with the OpenTelemetry mission to make telemetry a built-in feature of all cloud-native software and will benefit both the OpenTelemetry and Prometheus communities.

The workshop then went into more specific detail on deployment models and metrics infrastructure scalability, in a session organized by Jaana Dogan. The group identified several key challenges in deploying a mix of pull- and push-based metrics collection, which will be taken up in further development.

The workshop also covered the OpenTelemetry Data Model, as reflected in the current OpenTelemetry Metrics specification, organized by Bogdan Drutu. The OpenTelemetry protocol (OTLP) is positioned as an interchange format for a number of today’s metrics protocols, with support for out-of-process aggregation, while OpenTelemetry’s shared semantic conventions improve our ability to convey meaning through metrics, as well as trace and log data.

Later, the workshop moved into a specific discussion of OpenTelemetry metrics histograms, organized by Josh MacDonald and Michael Gerstenhaber, covering designs for optional high-resolution histogram support in OpenTelemetry libraries and agents.

The workshop ended with a wide-ranging and collaborative Questions and Answers session, organized by Austin Parker, with members of the OpenTelemetry project at large.

During the workshop, several workstreams were identified and prioritized. All of these components will be needed to declare that OpenTelemetry fully supports metrics and the Prometheus community. For the most part these efforts will be carried out in parallel.

Data model and collector workstream

By working on the OpenTelemetry Collector’s Prometheus feature set, we believe the OpenTelemetry collector can discover and scrape OpenMetrics/Prometheus targets as an alternative to the Prometheus server and other OpenMetrics agents.

The OpenTelemetry Protocol (OTLP) metrics data model and Collector support passing the received OpenMetrics/Prometheus data through processors and to exporters for the backend of choice, including OTLP and Prometheus Remote Write. We hope to ensure that the OpenTelemetry Metrics data model is fully compatible with backends expecting Prometheus-formatted data.

This work is crucial as it sets a foundation for the OpenTelemetry metrics project’s stated Phase 1 goal of protocol-level compatibility with common existing open-source metrics ecosystems. The specific scenarios the Collector needs to support (such as being a drop-in replacement for the Prometheus server) and the scalability goals that must be attained should be defined as part of this workstream.

API expressiveness and usability workstream

OpenTelemetry provides a set of libraries for different programming languages. These libraries are necessary for instrumenting apps and libraries to expose telemetry in a unified and universally understood way. The OpenTelemetry Metrics API (draft) specifies a low-level interface that is necessarily and distinctly different — by comparison with a wide range of metrics libraries — in order to be capable of expressing specific semantics over a wide range of operational characteristics (e.g., delta-oriented, push-based metrics delivery).
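To make the "delta-oriented" wording concrete, here is a minimal sketch of the difference between cumulative readings (a running total, as a pull-based scrape typically reports) and delta readings (the change since the previous report, as a push-based exporter might send). The function names and the counter-reset handling are assumptions made for illustration; this is not the OpenTelemetry API or SDK.

```python
from typing import List


def cumulative_to_delta(cumulative: List[float]) -> List[float]:
    """Convert successive cumulative counter readings into per-interval deltas."""
    deltas = []
    previous = 0.0
    for value in cumulative:
        if value < previous:
            # Assumption: a drop means the counter restarted from zero,
            # so the new reading is itself the change since the restart.
            deltas.append(value)
        else:
            deltas.append(value - previous)
        previous = value
    return deltas


def delta_to_cumulative(deltas: List[float]) -> List[float]:
    """Rebuild a running total from per-interval deltas."""
    total = 0.0
    cumulative = []
    for delta in deltas:
        total += delta
        cumulative.append(total)
    return cumulative


readings = [5, 12, 12, 20]                 # running totals seen at each report
deltas = cumulative_to_delta(readings)     # change during each interval
restored = delta_to_cumulative(deltas)     # back to running totals
```

Without resets the two forms carry the same information, which is one reason an interchange format can reasonably represent both.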

A more expressive API can support a wider range of scenarios, including edge cases, but can come at the cost of usability. Striking a balance between flexibility and usability is therefore a topic of this second workstream. The main factor in deciding on this balance will be a set of real-world scenarios that we will be working through.

This workstream will include revising and deciding on naming for various API concepts.

SDK and Collector flexibility workstream

Because OpenTelemetry incorporates correlated signals from the environment, including Resource attributes and Distributed baggage attributes, OpenTelemetry is necessarily interested in a more flexible stance toward the use of ad-hoc metric labels. The OpenTelemetry collection infrastructure will support the correct addition and removal of metric labels, both inside and outside of the process.

The ability to configure the set of metrics, and the labels for these metrics, at runtime instead of at development time, and to apply that configuration either inside the SDK or inside the Collector, is crucial for great telemetry data. This is because the author of the instrumentation, who knows which labels are available, is typically not the same person as the system operator, who knows which labels, resource attributes, and distributed context attributes are interesting.
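As an illustration of what "correct" label removal can involve, the sketch below drops a label from a set of sum-style data points and then merges the points that become indistinguishable. The `drop_labels` helper, the `Point` shape, and the label names are hypothetical and are not part of any OpenTelemetry SDK or Collector API. Summing is assumed to be the right merge only for counter-like metrics; gauges and histograms would need different handling.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Labels = Dict[str, str]
Point = Tuple[Labels, float]  # (label set, summed value); hypothetical shape


def drop_labels(points: Iterable[Point], remove: Iterable[str]) -> List[Point]:
    """Remove the given label keys and re-aggregate points that now collide."""
    remove_set = set(remove)
    totals = defaultdict(float)
    for labels, value in points:
        # A frozenset of the remaining key/value pairs identifies the new series.
        kept = frozenset((k, v) for k, v in labels.items() if k not in remove_set)
        totals[kept] += value
    return [(dict(key), total) for key, total in totals.items()]


# An operator decides at runtime that the per-host breakdown is not needed.
points = [
    ({"method": "GET", "host": "a"}, 3.0),
    ({"method": "GET", "host": "b"}, 4.0),
    ({"method": "POST", "host": "a"}, 1.0),
]
reduced = drop_labels(points, ["host"])
by_method = {labels["method"]: value for labels, value in reduced}
```

Simply deleting the `host` key without merging would leave two separate `GET` points with identical labels, which is why removal has to be paired with re-aggregation.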

Code instrumentation is rarely perfect on the first attempt, nor can it correctly predict all usage patterns from the outset. There are many ideas for features to implement in the metrics SDK. This workstream will define the scope of that work and make sure the implementation is efficient and scalable.

Summary

A lot of work is required for OpenTelemetry to get to full metrics support. We are excited by all the interest the workshop sparked. We will keep posting about our specific plans for metrics as they become clearer. Keep a lookout for related issues, which will detail the metrics tasks. Please also join the weekly SIG meetings for the language libraries you’re interested in.

Thank you to Josh MacDonald, the primary organizer of this workshop. We also thank all section organizers and coordinators, and all participants. New contributors and participants in the project are always welcome, and if you’re interested in getting involved in OpenTelemetry, here are a few options:

  1. Contribute to the specifications.
  2. Participate in the Metrics Special Interest Group.
  3. Join the Prometheus working group.
  4. Implement the metrics spec in your favorite language!

Meeting notes: https://docs.google.com/document/d/169jbn_yBLS-nalny3pwKJ7YeEZ8tQeu_ch3dUlDj7LM/edit#heading=h.c49iio2ba7gf

Recording: https://youtu.be/gtCOn1d6DVI

This post was co-authored by Sergey Kanzhelev and Josh MacDonald, and reviewed by Alolita Sharma, Jaana Dogan, Josh Suereth, Morgan McLean, and Sarah Novotny. As usual, thank you to Amelia Mango for proofreading and styling.
