How VOI went DARK

Georgy Korev
Published in Voi Engineering
May 22, 2020

Introduction

Like many other tech companies, VOI uses Grafana to monitor infrastructure, services, and applications. All our backend workloads run on Kubernetes. In this blog post, we are going to show how we connect these two technologies to scale monitoring operations with DARK (which stands for Dashboards As Resources in Kubernetes) — a tool we use to define Grafana dashboards as Kubernetes resources, right next to the services they monitor.

Problem

As VOI grew, our platform became more and more complex. The number of services to monitor increased, more environments were introduced and more engineers were involved. With this rapid growth came some challenges. Setting up monitoring for a new service required manual configuration in Grafana, and that work had to be repeated for every environment in which the service was deployed.

At this point in time, we felt the need to take a step back and start thinking about how we could improve the way we configured monitoring for our services.

The main issues we were trying to address:

  • A lot of tedious and error-prone work was needed to set up monitoring for services;
  • Dashboard and graph definitions were inconsistent across squads and services;
  • All of the above was done manually.

Solution

What we wanted to achieve:

Monitoring dashboards as code or configuration;

As engineers, we love to write code and don’t like repetitive, tedious mouse clicks. Instead of going to the Grafana website to create dashboards and set up metrics by hand, we wanted to describe dashboards declaratively in code and deploy them together with the service.

Dashboards living “next to the code”;

Having dashboards reside in the repo of the service they monitor improves modularity and makes dashboard ownership clearer.

Dashboards versioning;

We know that changes are inevitable. Therefore, changes should be safe to make, with the possibility of falling back to the previous version later. Having Git as the canonical source of truth for the state of a dashboard makes rolling back to its previous state as simple as running ‘git revert’.

Reviewable dashboards configuration;

Like peer review of code changes, reviewing dashboards before monitoring and alerting are set up should help the author decide whether the outcome is correct.

Generation of default monitoring dashboards for a new service;

Although all services fulfill different purposes, there are some commonalities as well. For example, they expose endpoints for users of the service and might themselves call endpoints of downstream services. Wouldn’t it be convenient to generate dashboards for those common cases automatically?

Automated deployments and rollbacks.

Once your dashboards are reviewed and approved, they can be rolled out automatically by a CI/CD pipeline.

After evaluating existing OSS tools, we concluded that none of them satisfied our needs. So, DARK was born.

With DARK, dashboards are defined as Kubernetes custom resources and deployed to Grafana by a custom Kubernetes controller.

Using a human-readable DSL and Kubernetes manifests, DARK provides a way to deploy your dashboards automatically, as if they were code. By combining DARK’s YAML format with tools such as Helm or Kustomize, we can generate environment-specific dashboards that are deployed by our CD pipeline. For example, a dashboard can be defined like this:
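The manifest below is a minimal sketch rather than a dashboard taken from our repositories. The apiVersion, kind, and the fields under spec reflect our reading of the DARK README and should be treated as assumptions to check against the version of DARK you run. The dashboard name, datasource, and Prometheus query are hypothetical.

```yaml
# k8s/example-dashboard.yml
# Sketch only: apiVersion, kind and spec field names are assumptions
# based on the DARK README; verify them against your DARK version.
apiVersion: k8s.kevingomez.fr/v1
kind: GrafanaDashboard
metadata:
  name: example-dashboard        # hypothetical resource name
spec:
  title: Example service         # dashboard title shown in Grafana
  tags: [generated, yaml]
  auto_refresh: 10s

  rows:
    - name: HTTP
      panels:
        - graph:
            title: Request rate
            datasource: prometheus-default   # hypothetical datasource name
            targets:
              - prometheus:
                  # hypothetical metric; replace with one your service exposes
                  query: "rate(http_requests_total[5m])"
                  legend: "{{ handler }}"
```

Because the dashboard is an ordinary Kubernetes manifest, it can live in the service repository and go through the same review and deployment steps as the service's other resources.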

And then deployed via kubectl:

kubectl apply -f k8s/example-dashboard.yml

If you want to verify that your dashboard was created, you can also do it via kubectl:

kubectl get dashboards
kubectl get events | grep dark

But that’s not all! Grafana represents each dashboard as a JSON object. To convert existing Grafana dashboards into DARK ones, DARK provides a converter:

dark-converter dashboard.json converted-dashboard.yaml

Here, dashboard.json contains the JSON object that represents your Grafana dashboard.

Conclusion

Since DARK was introduced, service owners across our Technology Platform have adopted it successfully, and we have received very positive feedback from engineers.

Contribution

DARK is an open-source project created by VOI engineer Kévin Gomez. By contributing to OSS, we want to be a good member of the open-source community, as we truly believe that at the end of the day we all benefit from it.
