Kubernetes multi-cluster monitoring architecture
This article describes the architecture for monitoring multiple Kubernetes clusters. It doesn't cover the implementation or code.
Prometheus with Grafana is a popular monitoring setup for Kubernetes clusters. While this works great for a single cluster, and often comes out of the box, what happens when your fleet of clusters keeps growing?
What we wanted to achieve is the following:
- Centralized place to view the metrics from all the clusters
- Centralized storage for all the metrics from all the clusters
- Centralized alert manager instance
- Consistent and scalable setup for any new cluster we’re adding
We decided to go with Thanos.
In simple words, Thanos is a tool that aggregates metrics from multiple Prometheus instances.
Let’s talk architecture
Here is a diagram of the current architecture. I'll go through the components and the responsibility of each one.
Prometheus Instances
Each Prometheus instance will be configured with external labels to identify the instance (e.g., the cluster name). External labels are labels added to any time series or alerts when communicating with external systems.
```yaml
external_labels:
  cluster: cluster-a
  another_label: somevalue
```
Thanos Sidecar
The sidecar is a container that runs next to the Prometheus instance. It reads metrics from the Prometheus instance and backs them up to the storage backend, hence the arrows from the sidecars to the storage in the diagram.
Thanos supports a range of object storage clients (S3, GCS, Azure Blob Storage, and more).
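To illustrate, here is a minimal sketch of the object-storage config the sidecar is pointed at (S3 shown; the bucket name, endpoint, and credentials are placeholders, not values from this setup):

```yaml
# objstore.yml — passed to the sidecar via --objstore.config-file
type: S3
config:
  bucket: thanos-metrics          # placeholder bucket name
  endpoint: s3.us-east-1.amazonaws.com
  access_key: <access-key>        # prefer IAM roles/secrets over inline keys
  secret_key: <secret-key>
```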
Thanos Query
The Query component is stateless and horizontally scalable and can be deployed with any number of replicas. Once connected to the Sidecars, it automatically detects which Prometheus servers need to be contacted for a given PromQL query.
Query also implements Prometheus's official HTTP API and can thus be used with external tools such as Grafana. It also serves a derivative of Prometheus's UI for ad-hoc querying and for checking the status of the connected stores.
As you can see in the architectural diagram, each cluster has a Thanos Query component that points to the local Thanos sidecar.
Eventually, the Thanos query instance in the observability cluster will point to the Thanos query instances of the other clusters
Using the Thanos Helm chart, here is what our Thanos configuration in the observability cluster looks like:
```yaml
query:
  stores:
    - cluster-a.thanos-query.domain.internal:10901
    - cluster-b.thanos-query.domain.internal:10901
    - thanos-sidecar.monitoring:10901 # <- local Thanos sidecar
```
The observability Query instance will be used in Grafana as the Prometheus data source. We will then be able to see the metrics from all the clusters.
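Wiring Grafana to Thanos Query can be done with standard datasource provisioning. A sketch, assuming the Query service is named `thanos-query` in the `monitoring` namespace and serves HTTP on port 9090:

```yaml
# Grafana datasource provisioning file (sketch; service name,
# namespace, and port are assumptions about this setup)
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus           # Thanos Query speaks the Prometheus HTTP API
    url: http://thanos-query.monitoring:9090
    access: proxy
    isDefault: true
```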
Thanos Storegateway
As the sidecar backs up data into the object storage of your choice, you can decrease Prometheus retention and store less locally. However, we need a way to query all that historical data again. The store gateway does just that by implementing the same gRPC data API as the sidecars but backing it with data it finds in your object storage bucket. Just like sidecars and Query nodes, the store gateway exposes the StoreAPI and needs to be discovered by the Thanos Querier.
We can add the Storegateway to our centralized Thanos Query configuration:
```yaml
query:
  stores:
    - ... # <- the above configuration
    - thanos-storegateway.monitoring:10901
```
Thanos Compactor
A local Prometheus installation periodically compacts older data to improve query efficiency. Since the sidecar backs up data as soon as possible, we need a way to apply the same process to data in the object storage.
The compactor component simply scans the object storage and runs compaction where required. It is also responsible for creating downsampled copies of the data to speed up queries.
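As a sketch, enabling the compactor in the Helm chart might look like the following (key names follow the Bitnami Thanos chart; the retention values are illustrative, not recommendations):

```yaml
# Helm values sketch for the compactor (Bitnami-style keys assumed)
compactor:
  enabled: true
  retentionResolutionRaw: 30d   # keep raw samples for 30 days
  retentionResolution5m: 90d    # keep 5-minute downsampled data for 90 days
  retentionResolution1h: 1y     # keep 1-hour downsampled data for 1 year
```

Note that exactly one compactor should run against a given bucket; it is not safe to run concurrent compactors over the same data.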
Alerting
Based on our architectural diagram, we will go with a centralized Alertmanager and leverage the Thanos Ruler component.
Thanos Ruler
It does rule and alert evaluation on top of a given Thanos Querier endpoint.
According to our architectural diagram, we will set up a Thanos ruler in every cluster. That means the ruler will only evaluate the metrics from the query instance in the same cluster.
Using the Thanos helm chart, here is a snippet of our ruler instance configuration
```yaml
ruler:
  enabled: true
  alertmanagers:
    - http://alertmanager.domain.internal
  extraFlags:
    - --label=cluster="cluster-a"
```
The `--label` flag adds a label to the alerts/metrics to identify the Ruler instance, similar to Prometheus external labels. We will use the `cluster` label in our Alertmanager configuration later.
The same setup applies to each Kubernetes cluster (with a different cluster name in the label, of course).
Now that we're set up, we need to configure our Alertmanager to route the alerts.
Since this is up to you to configure, I will only link to the example on GitHub by the Prometheus team.
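As a starting point, here is a minimal routing sketch that matches on the `cluster` label added by each Ruler (the receiver names are placeholders, not part of the original setup):

```yaml
# Alertmanager routing sketch — routes alerts by the `cluster` label
route:
  receiver: default
  routes:
    - matchers:
        - cluster = "cluster-a"
      receiver: team-a
    - matchers:
        - cluster = "cluster-b"
      receiver: team-b
receivers:
  - name: default
  - name: team-a
  - name: team-b
```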
Closing Notes
I've been using Thanos in production for over a year, and it has ticked all the boxes I set out to achieve.
Thanos also supports multi-tenancy, but I haven't tried that setup yet.