A Tale of Two Metrics Libraries — Part 1

Metrics, Measures, and Instruments of Measurement

Between 2008 and 2009, I designed and developed two novel Application Performance Monitoring (APM) Open APIs under the banner OpenCore. When both Open APIs were published on the JINSPIRED website, the Java community had yet to see anything in the way of a tracing or metric monitoring library; a year later both Yammer and Netflix released libraries. Nearly ten years on, I have created a new Metrics API for a client.

Now seems a good time to outline some of the thinking behind the original OpenCore Metrics library design and how it differs from the newer 2018 version, which I hope will eventually be released under an open source license.

There are a number of critical concerns in the design of a client interface to a Metrics library and the underlying collection runtime:

  • Names and Labels (Part 1)
  • Measures and Instruments (Part 1)
  • Contextualization and Tagging (Part 2)
  • Configuration and Extensions (Part 2)
  • API and SPI (Part 3)
  • DSL and AOP (Part 3)
  • Integration (Part 4)

Names and Labels

Many libraries I’ve come across require some number of identifying strings during the creation of an instrument or the registration of a measure. One for the domain, another for a category, sometimes a group is offered, and finally the name.

The OpenCore Metrics library instead used a Name interface to allow specification of a namespace with as many name parts as needed. Yes, it is hierarchical because this is a natural phenomenon in composed systems.

Creating a composite Metrics.Name using the Metrics.name(String) and Name.name(String) methods.
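As a rough illustration of the idea (not the OpenCore source), composing a name part by part might look like the sketch below. Only the name(String) method names come from the text; the NameSketch class, the parts() method, and the dotted toString() format are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a hierarchical name; not the OpenCore implementation.
final class NameSketch {
    // An immutable name: a parent (null for a root) plus one part.
    static final class Name {
        private final Name parent;
        private final String part;

        private Name(Name parent, String part) {
            this.parent = parent;
            this.part = part;
        }

        // Extends this name with one more part, returning a new composite Name.
        Name name(String child) {
            return new Name(this, child);
        }

        // Collects the parts in order from root to leaf.
        List<String> parts() {
            List<String> out = new ArrayList<>();
            if (parent != null) {
                out.addAll(parent.parts());
            }
            out.add(part);
            return out;
        }

        @Override
        public String toString() {
            return String.join(".", parts());
        }
    }

    // Entry point standing in for the Metrics.name(String) factory.
    static Name name(String root) {
        return new Name(null, root);
    }

    public static void main(String[] args) {
        Name n = name("acme").name("orders").name("checkout").name("count");
        System.out.println(n); // acme.orders.checkout.count
    }
}
```

Because every prefix is itself a Name, a shared prefix can be created once and extended many times without any string concatenation at the call site.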

These days there is a bit of a backlash against hierarchical naming systems for metrics. Instead, metrics are identified by an unordered bag (set) of labels — this is unfortunate as there is just no technical reason why components in a name(space) cannot be labeled and trees indexed for efficient searching. Ordering and segmentation within a namespace serve a useful purpose in expressing composition and context — it is not to be given up. I consider it a significant design flaw to force clients to choose between one or the other.

In OpenCore a Name when created from some domain object, say a Class, is automatically labeled with “class” and “java” by the underlying engine.

A method in the Metrics class for creating a composite Metrics.Name from a java.lang.reflect.Method
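To make the automatic labeling concrete, here is a minimal sketch that derives labeled name segments from a Class or Method using standard reflection. The “class” and “java” labels come from the text; the “package” and “method” labels, the Segment type, and the ReflectiveNames class are assumptions made for illustration, and the real engine may label things quite differently.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; label values other than "class" and "java" are assumed.
final class ReflectiveNames {
    // One segment of a name together with the labels attached to it.
    static final class Segment {
        final String part;
        final Set<String> labels;

        Segment(String part, String... labels) {
            this.part = part;
            Set<String> set = new LinkedHashSet<>();
            Collections.addAll(set, labels);
            this.labels = Collections.unmodifiableSet(set);
        }
    }

    // One segment per package part, then a segment for the class itself.
    static List<Segment> name(Class<?> type) {
        List<Segment> segments = new ArrayList<>();
        String[] parts = type.getName().split("\\.");
        for (int i = 0; i < parts.length - 1; i++) {
            segments.add(new Segment(parts[i], "package", "java"));
        }
        segments.add(new Segment(parts[parts.length - 1], "class", "java"));
        return segments;
    }

    // Extends the declaring class's name with a segment for the method.
    static List<Segment> name(Method method) {
        List<Segment> segments = name(method.getDeclaringClass());
        segments.add(new Segment(method.getName(), "method", "java"));
        return segments;
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method length = String.class.getMethod("length");
        for (Segment s : name(length)) {
            System.out.println(s.part + " " + s.labels);
        }
    }
}
```

The caller never supplies a label; the labels follow from the kind of domain object handed to the factory, and each segment of the path carries its own set.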

Name labeling is also used to distinguish between a Metric created by an application or service and one created by a plugin extension. For example, the mark extension creates new Gauge metrics for any registered Counter. The mark extension tracks the growth in a Counter since the last mark operation executed: a kind of reset, but far better. The tag extension is similar to the mark extension, but instead of a single global tracked window there is one per tag, a tag being a string identifier set for a (recurring) window. Both extensions register new metrics with a configurable suffix; for mark it is “mark” and for tag it is “tag”. These can be changed because both plugins add an extension-specific Label to a new Name to indicate a derived Metric. In other Observability libraries this is handled rather crudely as shown below:

In this Metrics library a metric name is termed a Meter.Id
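Returning to the mark extension described above, the behavior can be sketched as a derived gauge that remembers a baseline rather than resetting the source. The MarkSketch, Counter, and Mark types here are hypothetical stand-ins, not the OpenCore extension code.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch of a "mark" style derived gauge.
final class MarkSketch {
    // A monotonically increasing counter that is never reset.
    static final class Counter {
        private final AtomicLong value = new AtomicLong();

        void inc() {
            value.incrementAndGet();
        }

        long value() {
            return value.get();
        }
    }

    // Derived gauge: growth of the source counter since the last mark.
    static final class Mark implements LongSupplier {
        private final Counter source;
        private final AtomicLong baseline;

        Mark(Counter source) {
            this.source = source;
            this.baseline = new AtomicLong(source.value());
        }

        // Starts a new window; the source counter keeps its full total.
        void mark() {
            baseline.set(source.value());
        }

        @Override
        public long getAsLong() {
            return source.value() - baseline.get();
        }
    }

    public static void main(String[] args) {
        Counter requests = new Counter();
        Mark sinceMark = new Mark(requests);
        requests.inc();
        requests.inc();
        sinceMark.mark();
        requests.inc();
        System.out.println(requests.value() + " total, " + sinceMark.getAsLong() + " since mark");
    }
}
```

Other readers of the counter are unaffected by a mark, which is what makes this preferable to a reset. A per-tag variant would keep one baseline per tag identifier instead of a single global one.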

The Name interface allows for inspection of the associated Label instances, and since a Name can be part, prefix or suffix, of another Name, there can be multiple Label sets for each segment, or path, in such a metric identifier.

In OpenCore it is not possible to directly add a Label to a Name via the API.

A method in the Metrics.Name interface for inspection of associated Metrics.Labels

Having Name as an interface in a library allows for greater standardization, automatic context prefix injection, and Instrument and Measure lookup optimization. It eliminates much of the costly allocation, concatenation, and parsing, exhibited with the use of other libraries using Strings as names.
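One way a library could realize the lookup optimization is to intern names, so that the same path always yields the same instance and lookups can use identity rather than string hashing. The sketch below is an assumption about how that might be done, not a description of the OpenCore internals; InternedNames and its members are hypothetical.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of interned hierarchical names.
final class InternedNames {
    static final class Name {
        private final Name parent;
        private final String part;
        // Children are cached so a given path is only ever allocated once.
        private final Map<String, Name> children = new ConcurrentHashMap<>();

        private Name(Name parent, String part) {
            this.parent = parent;
            this.part = part;
        }

        // Returns the cached child for this part, creating it on first use.
        Name name(String child) {
            return children.computeIfAbsent(child, c -> new Name(this, c));
        }

        @Override
        public String toString() {
            return parent == null ? part : parent + "." + part;
        }
    }

    private static final Map<String, Name> ROOTS = new ConcurrentHashMap<>();

    static Name name(String root) {
        return ROOTS.computeIfAbsent(root, r -> new Name(null, r));
    }

    public static void main(String[] args) {
        Name a = name("acme").name("orders");
        Name b = name("acme").name("orders");
        System.out.println(a == b); // true: same path, same instance

        // Identity-based lookup: no concatenation or parsing of strings.
        Map<Name, Long> values = new IdentityHashMap<>();
        values.put(a, 42L);
        System.out.println(values.get(b)); // 42
    }
}
```

A client that holds on to a Name pays the composition cost once; every later lookup is a reference comparison.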

Measures and Instruments

The original Metrics Open API was designed primarily to help ease the integration of the many measures created by the Probes Open API metering runtime with simple and mainly sample-based monitoring technologies. Because of this, the initial design gave greater importance to the concepts of Measure and Metric over Instruments such as Counter and Gauge.

A Measure represents some value already collected and managed by the application or another instrumentation and monitoring runtime. Measures are pull-based and so are registered with the Metrics Open API — in registering a Measure with a Name, Type, and Unit, a Metric is defined.

A static method in the Metrics class offering registration of a Metrics.Measure
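The pull-based registration described here can be sketched roughly as follows. This is not the OpenCore signature: the RegistrySketch class, the Type and Unit constants, and the use of a plain String in place of a Name (purely to keep the example short) are all assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of registering a pull-based Measure as a Metric.
final class RegistrySketch {
    // Pull-based: the runtime reads the value whenever it samples.
    interface Measure {
        long value();
    }

    enum Type { COUNTER, GAUGE }            // assumed constants
    enum Unit { COUNT, BYTES, NANOSECONDS } // assumed constants

    // A Metric is a Measure plus the identity it was registered under.
    static final class Metric {
        final String name;
        final Type type;
        final Unit unit;
        private final Measure measure;

        Metric(String name, Type type, Unit unit, Measure measure) {
            this.name = name;
            this.type = type;
            this.unit = unit;
            this.measure = measure;
        }

        long sample() {
            return measure.value();
        }
    }

    private final Map<String, Metric> metrics = new ConcurrentHashMap<>();

    // Defines a Metric; a second registration under the same name is rejected.
    Metric register(String name, Type type, Unit unit, Measure measure) {
        Metric metric = new Metric(name, type, unit, measure);
        if (metrics.putIfAbsent(name, metric) != null) {
            throw new IllegalStateException("already registered: " + name);
        }
        return metric;
    }

    public static void main(String[] args) {
        AtomicLong queueDepth = new AtomicLong(); // value owned by the application
        RegistrySketch registry = new RegistrySketch();
        Metric metric = registry.register("app.queue.depth", Type.GAUGE, Unit.COUNT, queueDepth::get);
        queueDepth.set(7);
        System.out.println(metric.name + " = " + metric.sample()); // app.queue.depth = 7
    }
}
```

The application keeps ownership of the value; the registry only holds a way to read it, which is what keeps an unregistered Measure cheap.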

A Metric is a Measure that has been named and moved into the management space, which is a subset of the information space holding all value measures. A Measure, when registered as a Metric, is augmented by the metric runtime and the various extensions enabled within it with additional data collection and statistical measures including histograms and standard deviation. These runtime enhancements, in turn, result in derivative Measures and Metrics. In practice, it is hard for a library designer to justify exposing a data structure such as a Histogram as something other than a set of Measures and Metrics.

The Measure defined as an inner interface within the Metrics class

With such under-the-hood derivative metric creation, it becomes readily apparent why it is essential that the namespace is partially hierarchical. You want to group the various augmentations under the (origin) source Metric.

One concern I have had with the original OpenCore Metrics API was whether both the Gauge and Counter interfaces should have extended the Measure interface. My thinking at the time was I wanted to ease the burden of creating scalable and safe Measures. A client of the metrics library should be able to introduce as many Measures as need be with as little overhead as possible and then decide whether to register them as fully-fledged Metrics. Having the Counter and Gauge interfaces extend from Measure made registration straightforward and allowed for some internal performance optimizations.

A static method in the Metrics class offering registration of a Metrics.Counter
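The trade-off in having Counter extend Measure is easier to see in code. In this sketch (hypothetical names throughout, with a String standing in for a Name) the same object is both the thing a developer updates and the thing the runtime reads, which is what makes registration a one-liner.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an instrument that is also a Measure.
final class CounterAsMeasure {
    interface Measure {
        long value();
    }

    // The coupling under discussion: a Counter can be read as well as updated.
    interface Counter extends Measure {
        void inc();
    }

    static Counter counter() {
        AtomicLong cell = new AtomicLong();
        return new Counter() {
            @Override
            public void inc() {
                cell.incrementAndGet();
            }

            @Override
            public long value() {
                return cell.get();
            }
        };
    }

    private static final Map<String, Measure> METRICS = new LinkedHashMap<>();

    // Any Measure, including a Counter, can be registered as is.
    static <M extends Measure> M register(String name, M measure) {
        METRICS.put(name, measure);
        return measure;
    }

    // Reads the current value of a registered metric.
    static long sample(String name) {
        return METRICS.get(name).value();
    }

    public static void main(String[] args) {
        Counter requests = register("requests", counter());
        requests.inc();
        requests.inc();
        System.out.println(sample("requests")); // 2
        System.out.println(requests.value());   // 2, readable by any holder of the Counter
    }
}
```

The convenience has a cost: anyone holding the Counter can also read it, so the instrument no longer acts purely as a device for feeding data.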

Eventually, a runtime configuration property was added to enable the automatic registration of both Counter and Gauge as Metrics. Keeping the Measure interface on both Counter and Gauge interfaces allowed access to the underlying measurement value when needed — such as local reporting.

The new library redesign corrects this coupling of Instrument and Measure interfaces while offering an alternative means of such access via a Callback interface and utility methods that produce a Measure from another with a specified Callback. Additionally, updates can now be contextualized.

A static method in the Metrics class to wrap a Gauge with a Callback

Note: The OpenCore Metrics API never offered a Timer interface, as is the case in many subsequent and somewhat derivative libraries, because a Timer does not have a single measurement to be considered a Measure, and the timing of an executed code block is far better handled by the Probes Open API, a metering engine.

An alternative to the pull-based interface design that is Measure is the push-based interface design of an Instrument. An Instrument is not a Measure or a Metric but a device-like interface for updating one or more Measures that are themselves mapped to Metrics. An Instrument has a generic measure operation that is interpreted differently by each of the concrete Instruments. For Counter a measure is an inc. For Gauge a measure consists of an inc and a dec, before and after the execution of a code block. For Timer it is the timing of the execution period: the reading of some Clock before and after.

The Metrics.Instrument Interface
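A minimal sketch of the push-based design follows, assuming the generic operation takes a Runnable (the actual signature is not given in the text). The InstrumentSketch class, the AtomicLong cells standing in for runtime-owned Measures, and the LongSupplier clock are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch of push-based instruments sharing one generic operation.
final class InstrumentSketch {
    // Each concrete instrument interprets measure(...) in its own way.
    interface Instrument {
        void measure(Runnable block);
    }

    // Counter: measuring a block counts one execution.
    static final class Counter implements Instrument {
        private final AtomicLong cell; // owned by the runtime, not exposed here

        Counter(AtomicLong cell) {
            this.cell = cell;
        }

        void inc() {
            cell.incrementAndGet();
        }

        @Override
        public void measure(Runnable block) {
            inc();
            block.run();
        }
    }

    // Gauge: inc before the block and dec after, tracking in-flight work.
    static final class Gauge implements Instrument {
        private final AtomicLong cell;

        Gauge(AtomicLong cell) {
            this.cell = cell;
        }

        void inc() {
            cell.incrementAndGet();
        }

        void dec() {
            cell.decrementAndGet();
        }

        @Override
        public void measure(Runnable block) {
            inc();
            try {
                block.run();
            } finally {
                dec();
            }
        }
    }

    // Timer: reads a clock before and after and accumulates the difference.
    static final class Timer implements Instrument {
        private final AtomicLong totalNanos;
        private final LongSupplier clock;

        Timer(AtomicLong totalNanos, LongSupplier clock) {
            this.totalNanos = totalNanos;
            this.clock = clock;
        }

        @Override
        public void measure(Runnable block) {
            long start = clock.getAsLong();
            try {
                block.run();
            } finally {
                totalNanos.addAndGet(clock.getAsLong() - start);
            }
        }
    }

    public static void main(String[] args) {
        AtomicLong inFlight = new AtomicLong();
        Gauge gauge = new Gauge(inFlight);
        gauge.measure(() -> System.out.println("during: " + inFlight.get())); // during: 1
        System.out.println("after: " + inFlight.get());                       // after: 0
    }
}
```

None of the instruments offers a way to read a value back; in the sketch only the code that created the cells can observe them, mirroring the split between feeding data and reading it.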

Instruments also offer instrument type-specific operations such as inc in Counter, inc/dec/set in Gauge, and stop/start in Timer. Nowhere are the underlying Measure(s) or Metrics accessible — developers feed data.

In a perfectly engineered software world, a developer would never need to select which Instrument to employ; unfortunately, that is not our world.

Newer metric libraries have confused things somewhat in the substitution of Instrument with Meter, while others offer a Meter as a specific extension of Metric. In one notable case, a DistributionSummary is considered a Meter. The very same Pivotal library offers up another impressively named interface, that of LongTaskTimer — sadly a recurring problem in the Observability space.