Monitoring on Kubernetes: Metric Collection Agents

dmitrio
Published in Infragravity
4 min read · Jan 19, 2018
Screenshot from WeaveScope application: metric collection agent.

TLDR: This article describes an approach to gathering operational insight from multiple sources on Kubernetes. Each monitoring system supports ingestion of different types of data, but not all of them support both metrics and event signals. For example, Prometheus is great for metrics, but you have to choose other options for tracing or logs. A scenario with the flexibility to choose which metrics to collect and where to send them is shown in this video. A case study of this approach using the Sonar monitoring agent can be found here.

System reliability engineers deal with multiple sources of operational insight. The two most common are metrics and tracing. While metrics provide breadth of operational insight, tracing provides an in-depth view when a problem does occur. Tracing data is often combined with sampling or logs to get more information when metrics indicate high error rates.

Metric collection is an integral part of monitoring, responsible for gathering data series and sending them to a given monitoring system. Monitoring agents (or exporters) are used for this purpose.

For example, Prometheus, which is the de facto standard for operational insights on Kubernetes, has an active developer community and therefore open source exporters for many metric sources. Other vendors who offer time series databases for monitoring scenarios also offer exporters.

However, metric collection is still in its infancy, considering the following points:

  • Difficult to use: for example, some WMI exporters are written in Go and support only a fixed set of metrics out of the box. Thus, if you cannot find the metric you are looking for, you’ll have to write code in Go and end up with a custom version of that exporter.
  • No collection insights: gathering any type of metric comes at a cost. Most exporters I tried did not measure the cost of scraping a given metric. In other words, all you know is that a sample has been scraped when telemetry is coming in. Consider looking at Windows process metrics using Win32_Process or a performance counter: one takes 20 ms or less and the other ten times more.
  • Exporters often support only a single metric system: this is typical for the majority of exporters. Very few exporters support configurable outputs for a given input.
  • Monitoring the Windows platform in a cloud-native way is limited: Microsoft offers the Application Insights agent (or exporter) for collecting metrics and sending them to Azure for analysis. This offering differs from the cloud-native approach: customers on AWS or elsewhere are unlikely to send metrics externally. Instead they have options like Datadog or other commercial offerings for sending metrics to another provider.
  • Enterprise developers still use .NET: some refer to Gartner claims that 70% of enterprises are using Windows. Many .NET developers in the enterprise prefer to use C# as their programming language when possible.
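To make the "no collection insights" point concrete, here is a minimal Python sketch of an agent that records how long each metric takes to collect. It is not taken from any exporter mentioned above: the names timed_scrape, cheap_query and expensive_query are hypothetical, and the sleeps only stand in for a fast and a slow query.

```python
import time
from typing import Callable, Dict, Tuple


def timed_scrape(
    collectors: Dict[str, Callable[[], float]],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, Tuple[float, float]]:
    """Run each collector once and return {name: (value, scrape_seconds)}."""
    results = {}
    for name, collect in collectors.items():
        start = clock()
        value = collect()
        # Record the cost of the scrape next to the sample itself.
        results[name] = (value, clock() - start)
    return results


# Hypothetical collectors: the sleeps simulate a cheap and an expensive query.
def cheap_query() -> float:
    time.sleep(0.002)
    return 42.0


def expensive_query() -> float:
    time.sleep(0.02)
    return 42.0


if __name__ == "__main__":
    scraped = timed_scrape({"cheap": cheap_query, "expensive": expensive_query})
    for name, (value, seconds) in scraped.items():
        print(f"{name}: value={value} scrape_ms={seconds * 1000:.1f}")
```

Exposing the scrape duration as a metric of its own is what lets you spot the query that costs ten times more than its alternative.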

In addition, there is no ideal monitoring system: every choice has pros and cons. For example, I found that sending metrics to Akumuli TSDB is the simplest and most resource-efficient option for collection that requires good time resolution. Using Prometheus to scrape every N seconds is not feasible for all scenarios.
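Why time resolution matters can be shown with a toy example. The Python sketch below uses a made-up series (one value per second, with a three-second spike) and compares sampling every second with sampling every 15 seconds. The 15-second interval is an assumption chosen for illustration, not a figure from any of the systems above.

```python
from typing import List


def sample_every(signal: List[float], interval: int) -> List[float]:
    """Keep one point every `interval` seconds, starting at t=0."""
    return signal[::interval]


# Synthetic data: 60 seconds at 10.0, with a spike to 95.0 at t=20..22.
signal = [10.0] * 60
for t in range(20, 23):
    signal[t] = 95.0

fine = sample_every(signal, 1)     # one sample per second
coarse = sample_every(signal, 15)  # samples at t=0, 15, 30, 45

print(max(fine))    # 95.0 (the spike is visible)
print(max(coarse))  # 10.0 (the spike falls between samples)
```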

Cloud-native monitoring requires a common way of specifying and gathering metrics. This was the reason for writing a new metric collection agent (project “Sonar”, shipped under the free BSD license) in .NET Core:

  • Metrics anywhere, integration and data privacy: support for scrapers and time series databases. (Sonar exposes metrics to Prometheus for scraping and sends them to Akumuli and InfluxDB at the same time. Logs can be sent to the InfluxDB time series database.)
  • Symmetry: Kubernetes can run anywhere, and so can Sonar, as a sidecar for Linux or Windows. As a second option, Sonar can be deployed as a daemon inside a Windows container and run on any orchestrator.
  • Portability: Unix, Windows and Nano Server support .NET Core, which makes Sonar a good fit by definition.
  • Measurable: the process of collecting metrics must itself be measured, to avoid expensive polling when possible. (Sonar collects the scrape time for each metric, so an SRE can optimize queries such as WMI.)
  • Unification: metrics are defined in the same way (per input type) with configurable outputs. For example, a sidecar can keep gathering the same metric while you change the destination (monitoring system or time series database) dynamically at runtime. (See the video.)
  • Supportable: integration systems already use the concept of extensibility with adapters, while the core runtime remains the same. This approach increases the supportability and maintainability of the metric agent. (An example of adding a MySQL adapter to the Sonar sidecar container is on GitHub.)
  • Windows metrics: support for local or remote WMI queries and for gathering any performance counters on containers without any coding, which is not available in other exporters.
  • Easy to extend for .NET developers: collecting metrics is not “magic”. With the many NuGet packages available in the .NET ecosystem, creating a simple input adapter for gathering metrics from MySQL takes an hour or less.
  • Easy deployment: it is nice to have a common way to deploy metric collection agent images on Kubernetes using a Helm chart.
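The unification idea in the list above (one metric definition, several configurable outputs) can be sketched in a few lines of Python. This is not Sonar's code or configuration format: Sample, Router and the formatter names are hypothetical. The two formatters follow the Prometheus text exposition format and the InfluxDB line protocol, but skip the escaping of special characters that a real agent would need.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Sample:
    name: str
    value: float
    labels: Dict[str, str]
    timestamp_ns: int


def to_prometheus(s: Sample) -> str:
    """Prometheus text format: name{label="v"} value timestamp_ms."""
    labels = ",".join(f'{k}="{v}"' for k, v in sorted(s.labels.items()))
    return f"{s.name}{{{labels}}} {s.value} {s.timestamp_ns // 1_000_000}"


def to_influx_line(s: Sample) -> str:
    """InfluxDB line protocol: measurement,tag=v field=value timestamp_ns."""
    tags = "".join(f",{k}={v}" for k, v in sorted(s.labels.items()))
    return f"{s.name}{tags} value={s.value} {s.timestamp_ns}"


class Router:
    """Hypothetical fan-out: outputs can be enabled or disabled at runtime."""

    def __init__(self) -> None:
        self._outputs: Dict[str, Callable[[Sample], str]] = {}

    def enable(self, name: str, formatter: Callable[[Sample], str]) -> None:
        self._outputs[name] = formatter

    def disable(self, name: str) -> None:
        self._outputs.pop(name, None)

    def dispatch(self, sample: Sample) -> Dict[str, str]:
        # The same sample is rendered once per enabled output.
        return {name: fmt(sample) for name, fmt in self._outputs.items()}


router = Router()
router.enable("prometheus", to_prometheus)
router.enable("influxdb", to_influx_line)

sample = Sample("cpu_usage", 12.5, {"host": "node1"}, 1516320000000000000)
for target, line in router.dispatch(sample).items():
    print(f"{target}: {line}")
```

The input side never changes here: switching destinations means calling enable or disable on the router, which is the kind of separation between inputs and outputs the list describes.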

Whether you decide to deploy Sonar as a sidecar for Windows or Unix, or as a daemon on a Windows container, host or virtual machine, it will be running the same runtime. This means you can deploy the container to any orchestrator as long as you have compatible monitoring systems to receive the data.

While the commercial market for operational insights and monitoring has been very successful, enterprise customers are looking for multi-cloud support and data privacy. I certainly hope that the Kubernetes community, which advocates the cloud-native approach, will work to increase enterprise customers’ awareness of the choices available to them. While many large companies like Microsoft contribute to Kubernetes, the cloud-native story is for every platform: Windows or Linux.

I am sure there are many choices for metric collection out there: let the best monitoring agents win :-)

dmitrio
Infragravity

Former Microsoft architect #integration,#architecture,#patterns