Kubernetes multi-cluster monitoring architecture

Ibraheem Al Saady
Jan 1, 2022 · 4 min read


This article describes an architecture for monitoring multiple Kubernetes clusters. It doesn't go over implementation details or code.

Prometheus with Grafana is a popular monitoring setup for Kubernetes clusters. While this works great for a single cluster, and often comes out of the box, what happens when your fleet of clusters keeps growing?

What we wanted to achieve is the following:

  1. A centralized place to view metrics from all clusters
  2. Centralized storage for metrics from all clusters
  3. A centralized Alertmanager instance
  4. A consistent and scalable setup for any new cluster we add

We decided to go with Thanos.

In simple terms, Thanos is a set of components that aggregates metrics from multiple Prometheus instances into a single query view and ships them to long-term object storage.

Let’s talk architecture

Here is a diagram of the current architecture. I'll go through each component and its responsibility.

Prometheus Instances

Each Prometheus instance will be configured with external_labels to identify the instance (e.g., the cluster name). External labels are labels that get added to any time series or alerts when communicating with external systems.

external_labels:
  cluster: cluster-a
  another_label: somevalue
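
For reference, in a plain prometheus.yml these labels live under the global section; if you deploy Prometheus through the prometheus-operator or a Helm chart, the exact key will differ (for the Prometheus CRD it is typically spec.externalLabels):

# prometheus.yml - plain Prometheus config; exact placement depends on how
# Prometheus is deployed
global:
  external_labels:
    cluster: cluster-a
    another_label: somevalue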

Thanos Sidecar

The sidecar is a container that runs alongside the Prometheus instance. It reads metrics from Prometheus and backs them up to the storage backend, hence the arrows from the sidecars to the storage in the diagram.

Thanos supports several object storage clients, including S3, GCS, and Azure Blob Storage; see the Thanos documentation for the full list.
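
As an example, a minimal S3 objstore configuration passed to the sidecar could look like this (the bucket name, endpoint, and credentials below are placeholders):

# objstore.yml - a minimal S3 sketch; bucket, endpoint, and credentials are placeholders
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.us-east-1.amazonaws.com
  access_key: <access-key>
  secret_key: <secret-key>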

Thanos Query

The Query component is stateless and horizontally scalable and can be deployed with any number of replicas. Once connected to the Sidecars, it automatically detects which Prometheus servers need to be contacted for a given PromQL query.

Query also implements Prometheus's official HTTP API and can thus be used with external tools such as Grafana. It also serves a derivative of Prometheus's UI for ad-hoc querying and for checking the status of the connected stores.
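
Under the hood, Query is simply started with a list of StoreAPI endpoints. A rough sketch of the container args (the addresses are illustrative):

# a rough sketch of the Query flags; addresses are illustrative
args:
  - query
  - --http-address=0.0.0.0:9090
  - --grpc-address=0.0.0.0:10901
  - --store=thanos-sidecar.monitoring:10901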

As you can see in the architectural diagram, each cluster has a Thanos Query component that points to the local Thanos sidecar.

The Thanos Query instance in the observability cluster, in turn, points to the Thanos Query instances of the other clusters.

Using the Thanos Helm chart, here is what our Thanos configuration in the observability cluster looks like:

query:
  stores:
    - cluster-a.thanos-query.domain.internal:10901
    - cluster-b.thanos-query.domain.internal:10901
    - thanos-sidecar.monitoring:10901 # <- local thanos sidecar

The observability Query instance is then used in Grafana as the Prometheus data source, giving us the metrics from all the clusters in one place.
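
A Grafana provisioning file pointing at the central Query instance could look roughly like this (the data source name, URL, and namespace are assumptions based on the setup above):

# grafana data source provisioning - a sketch; URL and namespace are assumptions
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus
    access: proxy
    url: http://thanos-query.monitoring:9090
    isDefault: true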

Thanos Storegateway

As the sidecar backs up data into the object storage of your choice, you can decrease Prometheus retention and store less locally. However, we need a way to query all that historical data again. The store gateway does just that by implementing the same gRPC data API as the sidecars, but backing it with data it finds in your object storage bucket. Just like the sidecars and Query nodes, the store gateway exposes the StoreAPI and needs to be discovered by Thanos Query.

We can add the Storegateway to our centralized Thanos Query instance:

query:
  stores:
    - ... # <- the above configuration
    - thanos-storegateway.monitoring:10901
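
For completeness, the store gateway itself also has to be enabled in the observability cluster and pointed at the same bucket. With the Helm chart, that might look roughly like this (the value keys are assumptions and depend on the chart you use):

# a sketch; value keys depend on the Thanos chart you use
storegateway:
  enabled: true
objstoreConfig: |
  type: S3
  config:
    bucket: thanos-metrics
    endpoint: s3.us-east-1.amazonaws.com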

Thanos Compactor

A local Prometheus installation periodically compacts older data to improve query efficiency. Since the sidecar backs up data as soon as possible, we need a way to apply the same process to data in the object storage.

The compactor component simply scans the object storage and processes compaction where required. At the same time, it is responsible for creating downsampled copies of the data to speed up queries.
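
Enabling the compactor through the chart is usually a matter of a few values. Here is a sketch (the key names and retention periods are assumptions; adjust them to your chart and needs):

# a sketch; key names and retention values are assumptions
compactor:
  enabled: true
  retentionResolutionRaw: 30d
  retentionResolution5m: 90d
  retentionResolution1h: 1y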

Alerting

Based on our architectural diagram, we will go with a centralized Alertmanager and leverage the Thanos Ruler component.

Thanos Ruler

It does rule and alert evaluation on top of a given Thanos Querier endpoint.

According to our architectural diagram, we will set up a Thanos ruler in every cluster. That means the ruler will only evaluate the metrics from the query instance in the same cluster.

Using the Thanos Helm chart, here is a snippet of our Ruler instance configuration:

ruler:
  enabled: true
  alertmanagers:
    - http://alertmanager.domain.internal
  extraFlags:
    - --label=cluster="cluster-a"

The --label flag adds a label to the alerts/metrics to identify the Ruler instance, similar to Prometheus external_labels. We will use the cluster label in our Alertmanager configuration later.

The same setup is repeated for each Kubernetes cluster (with a different cluster name in the label, of course).

Now that we're set up, we need to configure our Alertmanager to route the alerts.

Since this is up to you to configure, I will only link to the example configuration on GitHub by the Prometheus team.
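
Purely as an illustration (the receivers and matchers below are made up), a route that splits alerts by the cluster label added by the Ruler could look like this:

# alertmanager.yml - an illustrative sketch; receivers and matchers are made up
route:
  receiver: default
  group_by: ['alertname', 'cluster']
  routes:
    - matchers:
        - cluster="cluster-a"
      receiver: team-cluster-a
receivers:
  - name: default
  - name: team-cluster-a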

Closing Notes

I've been using Thanos in production for over a year, and it ticked all the boxes on the goals I wanted to achieve.

Thanos also supports multi-tenancy, but I haven't tried that setup yet.

