A Tale of Two Metrics Libraries — Part 1
Metrics, Measures, and Instruments of Measurement
Between 2008 and 2009, I designed and developed two novel Application Performance Monitoring (APM) Open APIs under the OpenCore banner. When both Open APIs were published on the JINSPIRED website, the Java community had nothing in the way of a tracing or metric monitoring library; a year later both Yammer and Netflix released libraries. Nearly ten years on, I have recently created a new Metrics API for a client.
Now seems a good time to outline some of the thinking behind the original OpenCore Metrics library design and how it differs from the newer 2018 version, which I hope will eventually be released under an open source license.
There are a number of critical concerns in the design of a client interface to a Metrics library and its underlying collection runtime:
- Names and Labels (Part 1)
- Measures and Instruments (Part 1)
- Contextualization and Tagging (Part 2)
- Configuration and Extensions (Part 2)
- API and SPI (Part 3)
- DSL and AOP (Part 3)
- Integration (Part 4)
Names and Labels
Many libraries I’ve come across require some number of identifying strings when creating an instrument or registering a measure: one for the domain, another for a category, sometimes a group, and finally the name.
The OpenCore Metrics library instead used a Name interface to allow specification of a namespace with as many name parts as needed. Yes, it is hierarchical, because hierarchy is a natural phenomenon in composed systems.
These days there is a bit of a backlash against hierarchical naming systems for metrics. Instead, metrics are identified by an unordered bag (set) of labels. This is unfortunate, as there is no technical reason why the components in a name(space) cannot be labeled and the resulting trees indexed for efficient searching. Ordering and segmentation within a namespace serve a useful purpose in expressing composition and context, and should not be given up. I consider it a significant design flaw to force clients to choose between one or the other.
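To illustrate that hierarchy and labels need not be mutually exclusive, here is a minimal sketch of a name whose segments each carry their own labels. The `LabeledName` type and its methods are hypothetical and are not the OpenCore `Name` interface; a real implementation would index the tree rather than scan a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a hierarchical name whose segments each carry labels.
final class LabeledName {
    private final LabeledName parent;  // null for a root segment
    private final String part;         // text of this segment only
    private final Set<String> labels;  // labels attached to this segment

    LabeledName(LabeledName parent, String part, String... labels) {
        this.parent = parent;
        this.part = part;
        this.labels = Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(labels)));
    }

    // Extends this name with a child segment, keeping order and hierarchy.
    LabeledName child(String part, String... labels) {
        return new LabeledName(this, part, labels);
    }

    // True if this segment or any enclosing segment carries the label.
    boolean hasLabel(String label) {
        for (LabeledName n = this; n != null; n = n.parent) {
            if (n.labels.contains(label)) return true;
        }
        return false;
    }

    // Renders the full dotted path, root first.
    String path() {
        return parent == null ? part : parent.path() + "." + part;
    }
}

final class LabeledNameDemo {
    // A label-style query answered over hierarchical names.
    static List<String> withLabel(List<LabeledName> names, String label) {
        List<String> out = new ArrayList<>();
        for (LabeledName n : names) {
            if (n.hasLabel(label)) out.add(n.path());
        }
        return out;
    }

    public static void main(String[] args) {
        LabeledName service = new LabeledName(null, "orders", "service");
        LabeledName cart = new LabeledName(null, "com", "java").child("acme").child("Cart");
        List<LabeledName> names = Arrays.asList(service.child("latency"), cart.child("size"));
        System.out.println(withLabel(names, "java")); // prints [com.acme.Cart.size]
    }
}
```

The same identifier answers both kinds of question: where a metric sits in the composition, and which labels apply to it or to anything enclosing it.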
In OpenCore, a Name created from some domain object, say a Class, is automatically labeled with “java” by the underlying engine.
Name labeling is also used to distinguish between a Metric created by an application or service and one created by a plugin extension. For example, the mark extension creates new Gauge metrics for any registered Counter. The mark extension tracks the growth in a Counter since the last mark operation executed: a kind of reset, but far better. The tag extension is similar to the mark extension, but instead of a single globally tracked window there is one per tag, a tag being a string identifier set for a (recurring) window. Both extensions register new metrics with a configurable suffix; for mark it is “mark” and for tag it is “tag”. These can be changed because both plugins add an extension-specific Label to a new Name to indicate a derived Metric. In other Observability libraries this is handled rather crudely as shown below:
The Name interface allows for inspection of the associated Label instances, and since a Name can be part, as a prefix or suffix, of another Name, there can be multiple Label sets for each segment, or path, in such a metric identifier.
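The mark extension described earlier can be pictured with a small sketch. Everything here is an assumption made for illustration: the `MarkGauge` type, its fields, and the `extension=mark` label text are hypothetical, and a plain `String` stands in for the `Name` interface to keep the example short. It shows a derived gauge reporting the growth of a source counter since the last mark, identified by a label rather than by its suffix alone.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch of a "mark"-style derived gauge; not the OpenCore API.
final class MarkGauge {
    private final LongSupplier source;  // reads the source counter
    private final AtomicLong baseline;  // counter value at the last mark
    final String name;                  // derived name: source name plus suffix
    final String label;                 // identifies the deriving extension

    MarkGauge(String sourceName, LongSupplier source, String suffix) {
        this.source = source;
        this.baseline = new AtomicLong(source.getAsLong());
        this.name = sourceName + "." + suffix;
        this.label = "extension=mark";  // stays the same even if the suffix is reconfigured
    }

    // Starts a new window; the source counter itself is never reset.
    void mark() {
        baseline.set(source.getAsLong());
    }

    // Growth of the source counter since the last mark.
    long value() {
        return source.getAsLong() - baseline.get();
    }
}

final class MarkGaugeDemo {
    public static void main(String[] args) {
        AtomicLong requests = new AtomicLong();
        MarkGauge gauge = new MarkGauge("orders.requests", requests::get, "mark");
        requests.addAndGet(5);
        System.out.println(gauge.name + " = " + gauge.value()); // prints orders.requests.mark = 5
        gauge.mark();
        requests.addAndGet(2);
        System.out.println(gauge.name + " = " + gauge.value()); // prints orders.requests.mark = 2
    }
}
```

Because the label, not the suffix, marks the metric as derived, the suffix can be changed in configuration without losing the ability to tell derived metrics from application ones.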
In OpenCore it is not possible to directly add a Label to a Name via the API.
Name as an interface in a library allows for greater standardization, automatic context prefix injection, and Measure lookup optimization. It eliminates much of the costly allocation, concatenation, and parsing exhibited by other libraries that use Strings as names.
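One way such a lookup optimization could work is by interning name segments, so that the same path always resolves to the same object. The sketch below is an assumption about technique, not a description of the OpenCore internals, and `InternedName` is a hypothetical type.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: each segment is created once and then reused.
final class InternedName {
    private final InternedName parent;
    private final String part;
    private final Map<String, InternedName> children = new ConcurrentHashMap<>();

    private InternedName(InternedName parent, String part) {
        this.parent = parent;
        this.part = part;
    }

    static InternedName root(String part) {
        return new InternedName(null, part);
    }

    // Returns the single canonical child for this part, creating it on first use.
    InternedName name(String part) {
        return children.computeIfAbsent(part, p -> new InternedName(this, p));
    }

    // Only rendering builds a string; lookups never need to.
    @Override
    public String toString() {
        return parent == null ? part : parent + "." + part;
    }
}

final class InternedNameDemo {
    public static void main(String[] args) {
        InternedName app = InternedName.root("app");
        InternedName first = app.name("db").name("calls");
        InternedName second = app.name("db").name("calls");
        System.out.println(first == second); // prints true
        System.out.println(first);           // prints app.db.calls
    }
}
```

With canonical instances, a caller can hold on to a name and compare by identity, with no string building or splitting on the hot path.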
Measures and Instruments
The original Metrics Open API was designed primarily to ease the integration of the many measures created by the Probes Open API metering runtime with simple, mainly sample-based monitoring technologies. Because of this, the initial design gave greater importance to the concept of
Instrument, such as
A Measure represents some value already collected and managed by the application or another instrumentation and monitoring runtime. Measures are pull-based and so are registered with the Metrics Open API; in registering a Measure with a Name, a Metric is defined.
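A minimal sketch of this pull-based registration follows. All types here (`PullMeasure`, `NamedMetric`, `MetricSpace`) are hypothetical stand-ins rather than the Metrics Open API, and a plain `String` is used for the name to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical pull-based measure: the runtime reads it, the application owns it.
interface PullMeasure {
    long value();
}

// A measure becomes a metric once it has been given a name.
final class NamedMetric {
    final String name;
    private final PullMeasure measure;

    NamedMetric(String name, PullMeasure measure) {
        this.name = name;
        this.measure = measure;
    }

    // Pulls the current value from the underlying measure.
    long sample() {
        return measure.value();
    }
}

// Stand-in for the management space holding named metrics.
final class MetricSpace {
    private final Map<String, NamedMetric> metrics = new ConcurrentHashMap<>();

    // Registering a measure under a name defines the metric (first registration wins).
    NamedMetric register(String name, PullMeasure measure) {
        return metrics.computeIfAbsent(name, n -> new NamedMetric(n, measure));
    }

    NamedMetric lookup(String name) {
        return metrics.get(name);
    }
}

final class MetricSpaceDemo {
    public static void main(String[] args) {
        AtomicLong queueDepth = new AtomicLong();  // value already managed by the application
        MetricSpace space = new MetricSpace();
        space.register("orders.queue.depth", queueDepth::get);
        queueDepth.set(7);
        System.out.println(space.lookup("orders.queue.depth").sample()); // prints 7
    }
}
```

The application keeps updating its own value as before; the runtime only reads it when it samples.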
A Metric is a Measure that has been named and moved into the management space, which is a subset of the information space holding all value measures. A Measure, when registered as a Metric, is augmented by the metric runtime, and by the various extensions enabled within it, with additional data collection and statistical measures including histograms and standard deviation. These runtime enhancements, in turn, result in derivative Metrics. In practice, it is hard for a library designer to justify exposing a data structure such as a Histogram as something other than a set of
With such under-the-hood derivative metric creation, it becomes readily apparent why it is essential that the namespace is partially hierarchical. You want to group the various augmentations under the (origin) source
One concern I have had with the original OpenCore Metrics API is whether both the Gauge and Counter interfaces should have extended the Measure interface. My thinking at the time was that I wanted to ease the burden of creating scalable and safe Measures. A client of the metrics library should be able to introduce as many Measures as need be with as little overhead as possible and then decide whether to register them as fully-fledged Metrics. Having the Counter and Gauge interfaces extend Measure made registration straightforward and allowed for some internal performance optimizations.
Eventually, a runtime configuration property was added to enable the automatic registration of both as Metrics. Keeping the Measure interface on both Counter and Gauge interfaces allowed access to the underlying measurement value when needed, such as for local reporting.
The coupling of the Counter and Gauge interfaces to the Measure interface is something the new library redesign corrects, while offering an alternative means of such access via a Callback interface and utility methods that produce a Measure from another with a specified Callback. Additionally, updates can now be contextualized.
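The shape of the new Callback interface is not described here, so the following is only one possible reading of "a Measure from another with a specified Callback". The `ValueMeasure`, `ValueCallback`, and `MeasureUtil.observed` names are hypothetical. The sketch wraps an existing measure so that every read is also handed to a callback, which is one way local reporting could see values without the instrument itself being a measure.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical pull-based measure.
interface ValueMeasure {
    long value();
}

// Hypothetical callback notified with each value that is read.
interface ValueCallback {
    void updated(long value);
}

final class MeasureUtil {
    // Produces a new measure from another; each read is also passed to the callback.
    static ValueMeasure observed(ValueMeasure source, ValueCallback callback) {
        return () -> {
            long v = source.value();
            callback.updated(v);
            return v;
        };
    }
}

final class MeasureUtilDemo {
    public static void main(String[] args) {
        AtomicLong hits = new AtomicLong(3);
        AtomicLong lastReported = new AtomicLong(-1);  // stands in for a local reporter
        ValueMeasure measure = MeasureUtil.observed(hits::get, lastReported::set);
        System.out.println(measure.value());    // prints 3
        System.out.println(lastReported.get()); // prints 3
    }
}
```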
Note: The OpenCore Metrics API never offered a Timer interface, as is the case in many subsequent, somewhat derivative libraries, because a Timer does not have a single measurement to be considered a Measure, and the timing of an executed code block is far better handled by the Probes Open API, a metering engine.
An alternative to the pull-based interface design that is Measure is the push-based interface design of an Instrument. An Instrument is not a Measure or a Metric but a device-like interface for updating one or more Measures that are themselves mapped to Metrics. An Instrument has a generic measure operation that is interpreted differently by each of the concrete
measure is an
measure consists of an
dec, before and after the execution of a code block. For Timer it is the timing of the execution period: the reading of some Clock before and after.
Instruments also offer instrument type-specific operations such as
Timer. Nowhere are the underlying Metrics accessible; developers feed data.
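A minimal sketch of the push-based style follows. The types are hypothetical and are not the new library's API. The per-instrument behavior is an assumption made for illustration: a counter increments once per measured block, a gauge increments before and decrements after, and a timer reads a clock before and after. Each instrument writes into a sink it never exposes, so client code can only feed data.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical push-based instrument: one generic operation, many interpretations.
interface PushInstrument {
    void measure(Runnable block);
}

final class CounterInstrument implements PushInstrument {
    private final AtomicLong sink;  // owned by the runtime, never exposed here
    CounterInstrument(AtomicLong sink) { this.sink = sink; }
    void inc() { sink.incrementAndGet(); }
    @Override public void measure(Runnable block) {
        inc();  // assumption: one increment per measured block
        block.run();
    }
}

final class GaugeInstrument implements PushInstrument {
    private final AtomicLong sink;
    GaugeInstrument(AtomicLong sink) { this.sink = sink; }
    void inc() { sink.incrementAndGet(); }
    void dec() { sink.decrementAndGet(); }
    @Override public void measure(Runnable block) {
        inc();  // up before the block runs
        try { block.run(); } finally { dec(); }  // down afterwards, even on failure
    }
}

final class TimerInstrument implements PushInstrument {
    private final AtomicLong total;   // accumulated elapsed clock units
    private final LongSupplier clock; // for example System::nanoTime
    TimerInstrument(AtomicLong total, LongSupplier clock) {
        this.total = total;
        this.clock = clock;
    }
    @Override public void measure(Runnable block) {
        long start = clock.getAsLong();  // read the clock before
        try { block.run(); } finally { total.addAndGet(clock.getAsLong() - start); }  // and after
    }
}

final class InstrumentDemo {
    public static void main(String[] args) {
        AtomicLong inFlight = new AtomicLong();
        GaugeInstrument gauge = new GaugeInstrument(inFlight);
        gauge.measure(() -> System.out.println("during: " + inFlight.get())); // prints during: 1
        System.out.println("after: " + inFlight.get());                       // prints after: 0
    }
}
```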
In a perfectly engineered software world, a developer would never need to select which Instrument to employ; unfortunately, that is not our world.
Newer metric libraries have confused things somewhat in the substitution of
Meter, while others offer a Meter as a specific extension of Metric. In one notable case, a DistributionSummary is considered a Meter. The very same Pivotal library offers up another impressively named interface, that of LongTaskTimer; sadly, this is a recurring problem in the Observability space.