Observability & monitoring — Part 02

Fathima Dilhasha
Published in The DevOps Journey
3 min read · Dec 14, 2018

This post is a continuation of the post Observability & monitoring — Part 01.

As promised, I will discuss metrics monitoring in this post.
Metrics monitoring is one of the pillars of Observability.

[Figure: Pillars of Observability]

When it comes to monitoring metrics, we can categorize them based on the following approaches.

  1. Service level vs. Instance level
  2. Black box vs. White box

Service level vs. Instance level

Service level metrics monitoring focuses on the service level objectives (SLOs) of the system. Four signals (latency, traffic, errors and saturation) are identified as the golden signals of service level monitoring, and SLOs can be established based on them. Examples of service level metrics are login results and port availability.
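To make the four golden signals concrete, here is a minimal sketch in Python that derives them from one window of request records. The `Request` record, the `golden_signals` function, and the `in_flight` and `capacity` parameters are hypothetical names chosen for illustration; a real system would read these values from its own telemetry.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # time taken to serve the request
    status: int         # HTTP-style status code (assumption: 5xx means error)


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarize one observation window into the four golden signals."""
    durations = sorted(r.duration_ms for r in requests)
    count = len(durations)
    if count:
        # Nearest-rank 95th percentile, using integer math for the rank.
        rank = (95 * count + 99) // 100
        p95 = durations[rank - 1]
        error_rate = sum(1 for r in requests if r.status >= 500) / count
    else:
        p95, error_rate = 0.0, 0.0
    return {
        "latency_p95_ms": p95,                # Latency
        "traffic_rps": count / window_s,      # Traffic
        "error_rate": error_rate,             # Errors
        "saturation": in_flight / capacity,   # Saturation
    }
```

An SLO could then be phrased against this summary, for example "error_rate stays below 0.01 over every window", with the exact target being a choice for the team that owns the service.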

While service level metrics monitoring keeps track of SLOs, instance level metrics monitoring is needed for diagnosing root causes. Sample instance level metrics are load average, disk usage and JVM heap usage.
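As a rough illustration of instance level metrics, the sketch below reads load average and disk usage with the Python standard library only. The function name `instance_metrics` and the metric names in the returned dictionary are assumptions made for this example. JVM heap usage is left out because it has to be read from the JVM itself (for instance over JMX).

```python
import os
import shutil


def instance_metrics(path="/"):
    """Collect a few instance level metrics for the machine running this code."""
    usage = shutil.disk_usage(path)
    metrics = {
        "disk_used_percent": 100.0 * usage.used / usage.total if usage.total else 0.0,
    }
    try:
        # 1-minute load average; only available on Unix-like systems.
        metrics["load_average_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass  # not supported on this platform, so the metric is omitted
    return metrics
```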

Black box vs. White box

Black box monitoring treats the system as a black box and observes it from the outside. This type of monitoring indicates the availability of a system and is symptom oriented. Leveraging operating system level metrics and network level communication metrics can therefore be considered black box techniques.
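A simple black box check is a port availability probe: it knows nothing about the internals of the service and only reports whether a TCP connection can be opened from outside. The following is a minimal sketch; the function name `port_is_available` and the default timeout are assumptions made for this example.

```python
import socket


def port_is_available(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A probe like this only reports the symptom (the port is unreachable). Finding out why requires the instance level and white box metrics.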

White box monitoring depends on the ability to inspect the internals of a system and allows detection of future problems. In a multi-layered system (e.g. WSO2 Public Cloud), a symptom in one layer can be the cause of an issue in another layer. For example, from the point of view of database monitoring, slow database reads are a symptom. For the application layer, however, that latency in database access is a cause, since it can lead to latency in invocation.
So, white box metrics should be chosen in such a way that the cause of an issue is identifiable across the layers involved. It is advisable to define thresholds for the metrics so that an anomaly is distinguishable.
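The idea of per-metric thresholds across layers can be sketched as follows. The metric names and threshold values are purely hypothetical; suitable numbers depend on the system being monitored.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    warning: float
    critical: float


# Hypothetical thresholds, one metric per layer of the system.
THRESHOLDS = {
    "db.read_latency_ms": Threshold(warning=50, critical=200),
    "app.invocation_latency_ms": Threshold(warning=300, critical=1000),
}


def classify(value, threshold):
    """Map a metric value to 'ok', 'warning' or 'critical'."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


def evaluate(samples):
    """Classify every sample that has a threshold defined for it."""
    return {
        name: classify(value, THRESHOLDS[name])
        for name, value in samples.items()
        if name in THRESHOLDS
    }
```

Evaluating both layers together makes it easier to see that, say, a warning on invocation latency coincides with a critical database read latency, which points at the database as the likely cause.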

Life cycle of a metric

In metrics monitoring, we can identify five main stages in the life cycle of a metric.
Any toolkit used for monitoring should cover these stages in order to provide meaningful insights into the metrics.

[Figure: Life cycle of a metric]
  • Metrics exposure: A mechanism to expose metrics from the system that is being monitored to an external monitoring tool
  • Metrics collection: A mechanism for collecting the exposed metrics
  • Metrics storage: A mechanism to store the collected metrics in order to gain insights into their trends
  • Metrics visualization: A mechanism to visualize, track and identify trends in the metrics over time
  • Alerting: A mechanism to notify the system administrators of any anomalies in the metrics
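The five stages above can be sketched as a toy, in-memory pipeline. Every function name here, the dictionary used as storage, and the `heap_used_percent` metric are assumptions made for illustration. A real toolkit would use dedicated tools for each stage.

```python
def expose(system):
    """Exposure: the monitored system publishes its current metric values."""
    return {"heap_used_percent": 100.0 * system["heap_used"] / system["heap_max"]}


def collect(system, store, timestamp):
    """Collection and storage: scrape the exposed metrics into a time series."""
    for name, value in expose(system).items():
        store.setdefault(name, []).append((timestamp, value))


def visualize(store, name, width=20):
    """Visualization: render each stored sample as a text bar."""
    return [
        f"t={ts} {'#' * round(value / 100 * width)} {value:.0f}%"
        for ts, value in store[name]
    ]


def alerts(store, name, threshold):
    """Alerting: return the samples that exceed the threshold."""
    return [(ts, value) for ts, value in store[name] if value > threshold]


if __name__ == "__main__":
    store = {}
    for ts, used in enumerate([256, 512, 960]):
        collect({"heap_used": used, "heap_max": 1024}, store, ts)
    print("\n".join(visualize(store, "heap_used_percent")))
    print("alerts:", alerts(store, "heap_used_percent", threshold=90))
```

Keeping each stage as a separate function mirrors the way a toolkit can assign each stage to a different tool.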

In my next post, I will discuss a metrics monitoring toolkit whose tools cover the five stages above. Stay tuned!

Update:

Next post at https://medium.com/the-devops-journey/observability-monitoring-part-03-35a4601c0380
