Hardening Prometheus and OpenEBS using Litmus

Chaos Engineering for Prometheus on Kubernetes using Litmus

Modern applications are tested both in CI pipelines and in production, at least if you follow the principles of chaos engineering. Chaos engineering has become so much a part of DevOps practice that it is now part of the application development process. At MayaData, we use Litmus to practice chaos engineering, validating each commit of OpenEBS against a few Prometheus releases. We also extend the same testing to real-world chaos engineering on our production clusters, where Prometheus monitors our GitLab production system.

In this article, I will describe how OpenEBS is used as persistent storage for Prometheus and discuss how we verify the stability of such a deployment. Before going into the details, let us look at what Litmus is and why OpenEBS volumes are used to store the Prometheus TSDB.

What is Litmus?

Litmus is an open source framework for chaos-engineering-based qualification of Kubernetes environments running stateful applications. For a good introduction to Litmus and how to get started with it, see the Litmus docs (https://docs.litmuschaos.io/).

Litmus books are broadly categorized into four types.

  1. K8S infrastructure books
  2. Stateful applications deployment books
  3. Stateful applications chaos books
  4. Deployers for providers such as OpenEBS

Below, I will focus specifically on the Litmus deployers and chaos jobs that are available for building a CI/CD pipeline to harden the Prometheus application on OpenEBS and Kubernetes.

Introduction to OpenEBS

OpenEBS is the leading open source Container Attached Storage software and has become a frequent part of many Kubernetes deployments since its first release in early 2017. OpenEBS has been accepted into the Cloud Native Computing Foundation as a Sandbox project and is listed among the CNCF Sandbox Projects. You can read much more about OpenEBS in the OpenEBS docs.

Figure: OpenEBS architecture

Because OpenEBS has a pluggable, containerized architecture, it can easily use different storage engines that write data to disk or to underlying cloud volumes; the two primary storage engines are Jiva and cStor. With write-ahead log (WAL) support, the write performance of Prometheus increases significantly.

Why use OpenEBS volumes for the Prometheus TSDB?

One of the challenges with Prometheus is how to set up and manage its storage. By default, each Prometheus server simply stores its data on the local node; this, of course, exposes the user to losing that data when the node goes down.

Here are some issues with Prometheus storage:

1. When Prometheus uses only the pod's local storage, it keeps time series in memory and on the local disk, so metrics are not persisted if the pod restarts.

2. If a local persistent volume is configured and the pod is rescheduled to another node in the cluster, it loses all the data persisted on the previous node.

3. With remote storage, read and write operations are quite slow.

Using OpenEBS volumes instead of local storage for Prometheus on Kubernetes clusters addresses each of the above drawbacks. OpenEBS volumes are replicated synchronously, so the data is protected and remains available through either a node outage or a disk outage.

Using OpenEBS as storage for Prometheus on Kubernetes clusters is an easy and viable solution for production-grade deployments.
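As a minimal sketch of what this looks like in practice, the Python snippet below builds a PersistentVolumeClaim that Prometheus could mount for its TSDB. The StorageClass name `openebs-cstor-sparse`, the `monitoring` namespace, the claim name and the 10Gi size are all assumptions for illustration; use the StorageClass your OpenEBS installation actually provides.

```python
import json

# Assumption: "openebs-cstor-sparse" is a hypothetical default, not taken from
# the Litmus books. Replace it with a StorageClass your OpenEBS install exposes.
DEFAULT_STORAGE_CLASS = "openebs-cstor-sparse"


def prometheus_pvc(name="prometheus-data", namespace="monitoring",
                   size="10Gi", storage_class=DEFAULT_STORAGE_CLASS):
    """Build a PersistentVolumeClaim manifest for the Prometheus TSDB."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # A single Prometheus pod mounts the volume read-write.
            "accessModes": ["ReadWriteOnce"],
            # OpenEBS provisions a replicated volume for this class, so the
            # data does not depend on the node the pod is scheduled to.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g.:
    #   python pvc.py | kubectl apply -f -
    print(json.dumps(prometheus_pvc(), indent=2))
```

The claim would then be mounted into the Prometheus pod at the directory passed to the `--storage.tsdb.path` flag, so the TSDB and its WAL live on the replicated OpenEBS volume rather than on the node's own disk.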

Figure: OpenEBS as highly available TSDB for Prometheus

Elements of a Prometheus CI/CD pipeline

We have implemented GitLab stages for Prometheus and system validation. A full implementation of such a pipeline is shown below as an example.

Figure: GitLab CI pipeline for Prometheus on OpenShift using OpenEBS as persistent storage

Above is a sample GitLab pipeline that runs Litmus on OpenShift EE 3.10 with Prometheus v2.3.0. It has the following stages:

  • CLUSTER-Setup
  • OpenEBS-Setup
  • FUNCTIONAL
  • CHAOS
  • CLEANUP

Litmus provides almost-ready books for every stage except FUNCTIONAL, which is where developers and DevOps admins should spend their time creating tests for their applications. The remaining stages are generic enough that Litmus can do the job for you once the parameters are tuned.
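The stage ordering can be modelled with a small sketch. It assumes, as a design choice rather than something taken from the reference pipeline, that a failing stage causes the later stages to be skipped while CLEANUP still runs, which in GitLab CI you would typically get with `when: always` on the cleanup job. The stage callables are hypothetical placeholders for the real jobs.

```python
from typing import Callable, Dict

# Stage order from the sample pipeline.
STAGES = ["CLUSTER-Setup", "OpenEBS-Setup", "FUNCTIONAL", "CHAOS", "CLEANUP"]


def run_pipeline(jobs: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run stages in order; after a failure skip the rest, but always clean up.

    Each job is a hypothetical callable returning True on success.
    """
    status: Dict[str, str] = {}
    failed = False
    for stage in STAGES:
        # Assumption: CLEANUP behaves like a GitLab job with `when: always`.
        if stage == "CLEANUP" or not failed:
            ok = jobs[stage]()
            status[stage] = "passed" if ok else "failed"
            failed = failed or not ok
        else:
            status[stage] = "skipped"
    return status
```

In a real pipeline each callable would be a GitLab job that applies the relevant Litmus book; here they are plain functions so the control flow can be exercised locally, for example `run_pipeline({s: (lambda: True) for s in STAGES})`.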

Reference implementation:

The Prometheus GitLab pipeline implementation for the OpenShift EE platform and the corresponding Litmus books are all available at the following location in the OpenEBS GitHub repository.


Example Litmus jobs for Prometheus on OpenEBS

App deployers

Litmus job for deploying Prometheus using OpenEBS volumes for storing metrics.

https://raw.githubusercontent.com/litmuschaos/litmus/master/apps/prometheus/deployers/run_litmus_test.yml
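A minimal sketch of launching this book from a script, assuming `kubectl` is on the PATH and points at the target cluster. `kubectl create -f` accepts a URL directly; the helper names are hypothetical, and any prerequisites the book expects (see the Litmus docs) are assumed to be in place already. `dry_run` defaults to `True`, so the command is only built, not executed.

```python
import subprocess
from typing import List

# URL of the Prometheus deployer book in the Litmus repository.
DEPLOYER_BOOK = ("https://raw.githubusercontent.com/litmuschaos/litmus/"
                 "master/apps/prometheus/deployers/run_litmus_test.yml")


def litmus_book_command(book_url: str) -> List[str]:
    """Build the kubectl command that creates the Litmus job from a URL."""
    return ["kubectl", "create", "-f", book_url]


def run_litmus_book(book_url: str, dry_run: bool = True) -> List[str]:
    """Return the command; execute it only when dry_run is False."""
    cmd = litmus_book_command(book_url)
    if not dry_run:
        # Raises subprocess.CalledProcessError if kubectl exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

The same helper works for the loadgen and liveness books by passing their URLs instead.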

Loadgen

Litmus job for generating load on Prometheus using the Avalanche load generator.

https://raw.githubusercontent.com/litmuschaos/litmus/master/apps/prometheus/loadgen/run_litmus_test.yml
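To size a load test, it helps to estimate how many series an Avalanche target adds. The sketch below assumes that Avalanche's `--metric-count` and `--series-count` settings produce `metric_count * series_count` active series and that Prometheus ingests one sample per series per scrape; series churn is ignored. The values the loadgen book actually uses are not shown here, so the numbers in the usage example are hypothetical.

```python
from typing import Tuple


def avalanche_load(metric_count: int, series_count: int,
                   scrape_interval_s: float) -> Tuple[int, float]:
    """Estimate active series and ingested samples/s for one Avalanche target.

    Assumes metric_count metrics with series_count series each, and one
    sample per series per scrape (series churn is ignored).
    """
    if scrape_interval_s <= 0:
        raise ValueError("scrape interval must be positive")
    active_series = metric_count * series_count
    samples_per_second = active_series / scrape_interval_s
    return active_series, samples_per_second
```

For example, `avalanche_load(500, 10, 15)` gives 5000 active series and roughly 333 samples per second, a figure you can compare against what the Prometheus volume has to absorb.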

Liveness

Litmus job to check the liveness of the Prometheus application.

https://raw.githubusercontent.com/litmuschaos/litmus/master/apps/prometheus/liveness/run_litmus_test.yml
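Conceptually, a liveness check for Prometheus can be as simple as polling its management endpoints. The sketch below assumes the `/-/healthy` and `/-/ready` endpoints documented for Prometheus 2.x and uses only the Python standard library; the base URL, attempt count and delay are hypothetical, and this is not necessarily how the Litmus liveness job is implemented.

```python
import time
import urllib.request


def prometheus_is_live(base_url: str, timeout_s: float = 5.0) -> bool:
    """True if Prometheus answers 200 on both /-/healthy and /-/ready."""
    for path in ("/-/healthy", "/-/ready"):
        try:
            with urllib.request.urlopen(base_url.rstrip("/") + path,
                                        timeout=timeout_s) as resp:
                if resp.status != 200:
                    return False
        except OSError:
            # URLError and HTTPError subclass OSError, so this covers refused
            # connections, timeouts and non-2xx replies.
            return False
    return True


def wait_until_live(base_url: str, attempts: int = 10,
                    delay_s: float = 3.0) -> bool:
    """Poll until Prometheus is live or the attempts run out."""
    for attempt in range(attempts):
        if prometheus_is_live(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

A call such as `wait_until_live("http://prometheus.monitoring.svc:9090")` (a hypothetical in-cluster address on the default Prometheus port) would return `True` once both endpoints respond.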

Chaos jobs — Storage

Litmus job that deletes an OpenEBS cStor pool pod and verifies that the application remains available.
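The pass/fail logic of such an experiment can be sketched as follows: delete a pool pod, record a liveness probe result at a fixed interval while the volume recovers, and compare the fraction of successful probes with a threshold. This is a hypothetical reconstruction, not the code of the Litmus chaos job. The `openebs` namespace, the `app=cstor-pool` label and the default threshold of 1.0 (no failed probes tolerated) are assumptions; check the labels in your cluster with `kubectl get pods -n openebs --show-labels`.

```python
from typing import List, Sequence

# Assumptions: where the cStor pool pods run and how they are labelled.
POOL_NAMESPACE = "openebs"
POOL_SELECTOR = "app=cstor-pool"


def list_pool_pods_command() -> List[str]:
    """kubectl command that prints the pool pods as pod/<name>."""
    return ["kubectl", "get", "pods", "-n", POOL_NAMESPACE,
            "-l", POOL_SELECTOR, "-o", "name"]


def pool_pod_delete_command(pod_name: str) -> List[str]:
    """kubectl command that deletes one pool pod (the injected fault)."""
    # Accept both "pod/<name>" (from -o name) and a bare "<name>".
    name = pod_name.split("/", 1)[-1]
    return ["kubectl", "delete", "pod", name, "-n", POOL_NAMESPACE]


def availability(probes: Sequence[bool]) -> float:
    """Fraction of liveness probes that succeeded during the chaos window."""
    if not probes:
        raise ValueError("no liveness probes recorded during the chaos window")
    return sum(probes) / len(probes)


def chaos_verdict(probes: Sequence[bool], min_availability: float = 1.0) -> str:
    """'Pass' if availability meets the (hypothetical) threshold."""
    return "Pass" if availability(probes) >= min_availability else "Fail"
```

With synchronous replication, the expectation is that Prometheus keeps answering while a replica's pool pod is recreated, which is why the sketch defaults to requiring every probe to succeed.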

Summary:

Building CI/CD pipelines for stateful applications like Prometheus on OpenEBS and Kubernetes/OpenShift is quick and easy, because most of the pipeline is readily available through Litmus. Use the Litmus books to build chaos engineering into your GitLab pipelines.