Tackling Observability Head-on!

We at mStakx believe that technology can change the world for the better. We therefore help our partners adopt the latest and greatest technology in their companies, whether that is DevOps processes, serverless and cloud infrastructure, or Observability.

For the past few years, a lot of emphasis has been placed on having DevOps teams take care of Infrastructure as Code (IaC). DevOps teams were seen as the point of contact for automating every aspect of the infrastructure.

However, with the advent of Observability, it is now evident that the DevOps team need not be solely responsible for IaC. The more we explored Observability, the more we found that its three pillars (Logging, Metrics and Tracing) are closer to the development teams than to the DevOps teams.

With these eye-openers in view, we have been exploring how development teams can set up their Observability environments as easily as possible and jump on the efficient side of the engineering world :)

In a series of stories, we will discuss how Observability frameworks can be configured to give a holistic view of the application being developed, so that the engineering and product teams get insightful analytics.

This story is broken into 3 sections:
A) What is Observability? (Introduction to Observability)
B) What tools do I have at my disposal? (Introduction to a few frameworks and tools used in Observability)
C) One boilerplate to rule them all! (Rundown through a boilerplate which anyone can use as a starting point)

What is Observability?

Observability is an engineering philosophy wherein you observe the data flowing through the whole system via a set of tools and practices, and turn the collected data points and contexts into useful insights.

It is rather difficult to define Observability as a single concept. Some definitions consider Observability in terms of system failure, while others talk about Observability with reference to the testing pyramid.

“Observability is a system attribute” — Baron Schwartz 
Corollary — “There are different techniques to achieve observability in a system”

The concept of Observability is built on collecting every possible snapshot of the application. These snapshots can then be used to build intelligent analytics on the collected data, which in turn can drive alerts and possibly self-healing triggers in the system. The final outcome of an Observability stack is a set of visualizations of these snapshots, analytics and alerts that are useful to the engineering team.

Observability — An Overview

Three pillars of Observability

  1. Logging: Logging consists of recording discrete events in the system. These events can be structured (JSON-based application/system logs) or unstructured (text strings).
  2. Metrics: Metrics are aggregatable events such as counters (e.g. HTTP requests), gauges (e.g. HTTP queue depth) and histograms, which can help identify trends.
  3. Tracing: Tracing records events with causal ordering across services and distributed systems, making it possible to identify causes across service boundaries.
Credit: Peter Bourgon
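
To make the three pillars a little more concrete, here is a minimal Python sketch that uses only the standard library. The names used here (`log_event`, `Metrics`, `Span`, `handle_request`, and the metric names) are hypothetical and chosen purely for illustration; they are not part of any framework discussed in this story, and a real system would rely on dedicated libraries instead.

```python
import json
import time
import uuid
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


def log_event(message: str, **context) -> str:
    """Logging: one discrete event, serialised as a structured JSON record."""
    record = {"ts": time.time(), "message": message, **context}
    return json.dumps(record)


class Metrics:
    """Metrics: aggregatable numbers rather than individual events."""

    def __init__(self) -> None:
        self.counters = Counter()  # only incremented, e.g. HTTP requests served
        self.gauges = {}           # can go up or down, e.g. HTTP queue depth

    def inc(self, name: str, amount: int = 1) -> None:
        self.counters[name] += amount

    def set_gauge(self, name: str, value: float) -> None:
        self.gauges[name] = value


@dataclass
class Span:
    """Tracing: an operation linked to the operation that caused it."""
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])


def handle_request(metrics: Metrics) -> list:
    """Simulate one request that triggers a downstream database call."""
    trace_id = uuid.uuid4().hex
    root = Span("GET /orders", trace_id)
    # The child span points at its parent, which preserves causal ordering.
    child = Span("SELECT orders", trace_id, parent_id=root.span_id)
    metrics.inc("http_requests_total")
    metrics.set_gauge("http_queue_depth", 0)
    print(log_event("request handled", trace_id=trace_id, status=200))
    return [root, child]


if __name__ == "__main__":
    metrics = Metrics()
    for span in handle_request(metrics):
        print(span)
    print(dict(metrics.counters), metrics.gauges)
```

Note how the trace ID appears in the log record as well: sharing such an identifier is one simple way to correlate the three kinds of data later on.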

What tools do I have at my disposal?

Mature tools and frameworks are available to realise Observability via Instrumentation (Logging, Metrics and Tracing collection), Stack (Data storage) and Visualisation (Analysis).

These tools and frameworks continue to improve with each passing day. A few of them are:

  1. Elastic Stack
    Formerly known as the “ELK” Stack, where “ELK” stood for three open source projects: Elasticsearch, Logstash, and Kibana.
Credit — Elastic Stack

Elasticsearch is a search and analytics engine. In the context of Observability, Elasticsearch serves as a centralised data storage component into which varied data can be ingested, and from which data can be exported to any UI tooling framework for analytics purposes. Logstash is a server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a “stash” like Elasticsearch. In the context of Observability, Logstash serves as a Logging instrumentation component.

mStakx is a proud Elastic partner and provides consulting services around DevOps and Observability use cases using the Elastic Stack.

Credit — Elastic Stack!

Kibana lets users visualize the data in Elasticsearch with charts and graphs. In the context of Observability, Kibana serves as a Dashboarding/Analytics component. In 2015, Elastic introduced a family of lightweight, single-purpose data shippers, called Beats, into the ELK Stack equation. The community around the Elastic framework continues to grow stronger as the need for Observability takes firm root in current engineering practice.

2. Prometheus: (Metrics Collection and Dashboarding)

Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project and maintained independently of any company. To emphasize this, and to clarify the project’s governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.
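
Prometheus gathers metrics by scraping an HTTP endpoint that serves them as plain text. As a rough illustration of what that text looks like, the sketch below renders a counter by hand in the Prometheus text exposition format. The helper `render_counter` and the metric name `objects_created_total` are hypothetical, label values are not escaped, and in practice you would most likely let an official client library produce this output for you.

```python
def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter in the Prometheus text exposition format.

    `samples` maps a tuple of (label_name, label_value) pairs to the
    current counter value. Label values are assumed to need no escaping.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            # A sample without labels is just the metric name and its value.
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    samples = {
        (("model", "order"),): 3,
        (("model", "user"),): 1,
    }
    print(render_counter("objects_created_total", "Objects created.", samples), end="")
```

Each `HELP`/`TYPE` pair describes a metric once, and every labelled line underneath is a separate time series that Prometheus stores and can graph.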

3. Zipkin: (Tracing Collection and Dashboarding)

Zipkin

Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in microservice architectures. It manages both the collection and lookup of this data.
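
To give a feel for the timing data Zipkin works with, here is a small sketch that builds a two-span trace as plain dictionaries in the shape of Zipkin's v2 JSON span format. The helpers `make_span` and `build_trace`, the service names and the durations are all made up for illustration. Sending the result to a collector (typically an HTTP POST to its `/api/v2/spans` endpoint) is left out, and an instrumentation library would normally handle all of this for you.

```python
import json
import secrets
import time
from typing import Optional


def make_span(trace_id: str, name: str, service: str,
              duration_us: int, parent_id: Optional[str] = None) -> dict:
    """Build one span as a dict in the Zipkin v2 JSON shape."""
    span = {
        "traceId": trace_id,
        "id": secrets.token_hex(8),                 # 16 lowercase hex characters
        "name": name,
        "kind": "SERVER",
        "timestamp": int(time.time() * 1_000_000),  # start time, epoch microseconds
        "duration": duration_us,                    # microseconds
        "localEndpoint": {"serviceName": service},
    }
    if parent_id is not None:
        span["parentId"] = parent_id                # links the span to its caller
    return span


def build_trace() -> list:
    """Two spans sharing one trace ID: app-one calls app-two."""
    trace_id = secrets.token_hex(16)                # 32 lowercase hex characters
    root = make_span(trace_id, "get /", "app-one", duration_us=1500)
    child = make_span(trace_id, "get /api", "app-two", duration_us=900,
                      parent_id=root["id"])
    return [root, child]


if __name__ == "__main__":
    print(json.dumps(build_trace(), indent=2))
```

The shared `traceId` groups the spans into one trace, while `parentId` tells Zipkin which span caused which, so the UI can lay them out as a tree with per-span latency.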

4. Grafana: (Analytics and Dashboarding)

Grafana

Grafana is an open source, feature-rich metrics dashboard and graph editor that allows you to query, visualize, alert on and understand your metrics no matter where they are stored. It gives engineering teams the ability to create, explore, and share dashboards, and fosters a data-driven culture.

One boilerplate to rule them all!

With a fair understanding of Observability and an introduction to the tools/frameworks used to achieve observable systems, we thought it would be a good idea to create a boilerplate that demonstrates the whole system in action. That way, we would end up with one boilerplate to rule them all!

This boilerplate is built as a set of Docker images, so that everything can be brought up with a single click via docker-compose. The key components of this boilerplate are the Docker images of the following:

  1. Application Layer: A simple web application built with Python Django, with Postgres as the database and nginx as the web server
  2. Instrumentation Layer: A set of Logstash, Prometheus and Zipkin libraries for Python, serving as conduits to collect data (logs, metrics and traces)
  3. Stack (Data Storage) Layer: Elasticsearch, Zipkin and Prometheus storage
  4. Visualization (Analysis) Layer: Kibana, Prometheus UI, Grafana, and Zipkin UI
Observability Boilerplate #1

The boilerplate is built to give the following dashboards out-of-the-box:

  1. Application and system logs on Kibana dashboard
Kibana dashboard showing application logs collected via Logstash

2. Application traces on Zipkin and Kibana dashboards

Zipkin dashboard showing a simple 2-span trace of a request moving from one application to another
Kibana dashboard showing the traces collected by Zipkin

3. Application metrics on Prometheus and Grafana dashboards

Prometheus dashboard graphing object creation metrics
Grafana dashboards graphing object creation/update and database execution metrics

You can get started with this boilerplate with a single click, play around with it, and use it in your development/production environments to turn your applications into Observable systems. The boilerplate code is available on GitHub. Pull requests are welcome to make it better and more useful for varied production environments.