Monitoring a multi-cluster Kubernetes Deployment

Deploying a highly resilient monitoring and observability platform for Kubernetes multi-cluster solutions

Dascalescudanstefan · Globant · Dec 20, 2023


In this article, we will do a Terraform deployment of the Prometheus Operator, Thanos, and Grafana for multi-cluster monitoring. This setup is easily scalable, resilient to node failures, and can provide monitoring for multiple Kubernetes clusters from one Grafana instance. The main focus will be Thanos and how to aggregate data from various sources to achieve centralized real-time monitoring across multiple highly resilient clusters.

Prerequisites

You need three different Kubernetes clusters running on the same cloud provider:

  • Two data clusters that you want to monitor; they will host the Prometheus deployments and the applications that expose metrics for Prometheus to scrape.
  • One DevOps (monitoring) cluster that will host Thanos and Grafana for data aggregation and visualization.

You also need the following tools:

  • kubectl CLI.
  • Helm v3.x package manager.
  • Terraform 1.6.x, installed and configured to work with your clusters.
  • Access to a cloud provider object store (such as AWS S3, Azure Blob Storage, etc.) and permission to create and configure a bucket for long-term storage; a Terraform sketch for this follows the list.
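If you do not already have a bucket for Thanos' long-term storage, it can also be created with Terraform. The following is a minimal sketch assuming AWS S3 and the hashicorp/aws provider; the region and bucket name are placeholders, and the bucket name must match the one referenced later in the Thanos objstoreConfig.

provider "aws" {
  region = "us-east-1" # use your own region
}

# Bucket that Thanos will use for long-term metric storage
resource "aws_s3_bucket" "thanos_metrics" {
  bucket = "my-bucket" # must match the bucket in the Thanos objstoreConfig
}

# Keep the bucket private; Thanos authenticates with its own credentials
resource "aws_s3_bucket_public_access_block" "thanos_metrics" {
  bucket                  = aws_s3_bucket.thanos_metrics.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}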

Why Prometheus Operator and Not Just Prometheus?

Deployments created with the operator are easier to manage thanks to its predefined CRDs: they are easier to scale, can be updated dynamically, and can automatically discover labeled resources in the Kubernetes cluster. The deployment also includes an Alertmanager, which we can use later to send notifications when metrics change. The downside of all this added convenience is that it consumes noticeably more resources than a plain Prometheus deployment. If resources are a concern, the same tutorial can be followed with Prometheus deployed in agent mode, but it will be up to the reader to adapt the Terraform scripts, CRDs, and scrape configs.
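As a quick illustration of that label-based discovery, the operator's ServiceMonitor CRD tells Prometheus which Services to scrape. The sketch below uses Terraform's kubernetes_manifest resource; the application name, namespace, and port name are placeholders, and the release label assumes the chart's default selector, which matches the Helm release name.

provider "kubernetes" {
  config_path = "~/.kube/config"
}

# Hypothetical ServiceMonitor: Prometheus will scrape every Service labeled
# app=example-app on its "http-metrics" port every 30 seconds.
resource "kubernetes_manifest" "example_servicemonitor" {
  manifest = {
    apiVersion = "monitoring.coreos.com/v1"
    kind       = "ServiceMonitor"
    metadata = {
      name      = "example-app"
      namespace = "monitoring"
      labels = {
        # kube-prometheus-stack's default selector matches the Helm release name
        release = "prometheus-operator"
      }
    }
    spec = {
      selector = {
        matchLabels = {
          app = "example-app"
        }
      }
      endpoints = [
        {
          port     = "http-metrics"
          interval = "30s"
        }
      ]
    }
  }
}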

How Does Thanos Work, and Why Do We Need It?

Thanos is a clustered system of components with distinct purposes that can be composed into a highly available monitoring setup with long-term storage capabilities. The main issue with using Prometheus and querying that data from Grafana is that in case of an incident where a node or cluster goes down, the data is inaccessible for the duration of the downtime. Furthermore, there is the problem of storage, which is expensive and is consumed fast in production environments with millions of time series.

Thanos solves this problem by decoupling querying from the Prometheus metrics collectors and relying on cheap object storage for historical data. That means that if a node goes down along with the application, we still have access to that node’s historical data. What is more, we can examine monthly and yearly trends at very low cost and see the impact that changes have made on resource usage over a long period.

We will use the Prometheus Operator with a few standard metrics configured for our particular example project. This will send the monitored data via remote-write to a Thanos Receiver that will save the data in object storage. Next, we will have a Thanos Query instance that will deal with the queries, either by accessing the Thanos Receiver or by using the Thanos Store and the StoreAPI. Grafana will access Thanos Query to retrieve the required data for populating the dashboards.

Proposed architecture for Prometheus-Thanos-Grafana monitoring using Thanos Receive

For future projects, we can take high availability a step further since Thanos Receiver supports multi-tenancy, load balancing, and data replication by running multiple instances as part of a single hashring. A hashring allows for sharding, where each shard can be kept on another Thanos Receiver pod for redundancy and load sharing. But that is a topic for another article.

Deploying Prometheus Operator using Terraform

To deploy the Prometheus Operator in our data clusters, we must select our provider, initialize the Helm repository, and customize the values.yaml file found in the chart sources. Most of the changes will be up to the reader; I will only provide a baseline from which you can start tinkering and experimenting.

The following code snippet installs the Prometheus Operator using the kube-prometheus-stack Helm chart from the prometheus-community repository:

provider "helm" {
kubernetes {
config_path = "~/.kube/config"
}
}

resource "helm_release" "prometheus_operator" {
name = "prometheus-operator"
repository = "https://charts.helm.sh/stable"
chart = "kube-prometheus-stack"
version = "54.1.0"
namespace = "monitoring"
create_namespace = true
cleanup_on_fail = true

values = [
file("${path.module}/values.yaml")
]
}
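With the provider and release defined, a standard terraform init followed by terraform apply against each data cluster's kubeconfig installs the stack, and kubectl get pods -n monitoring should then show the operator, Prometheus, kube-state-metrics, and node-exporter pods.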

The values in the values.yaml file override the chart defaults. We will disable several chart components since they are not needed for this deployment, but be aware that we also lose the corresponding metrics. Feel free to experiment with the storage settings, since your needs might be entirely different. The following file is just an example:

# values.yaml
alertmanager:
  enabled: false

grafana:
  enabled: false

prometheusOperator:
  enabled: true

kubeApiServer:
  enabled: false

kubelet:
  enabled: true

kubeControllerManager:
  enabled: false

coreDns:
  enabled: false

kubeDns:
  enabled: false

kubeEtcd:
  enabled: false

kubeScheduler:
  enabled: false

kubeProxy:
  enabled: false

kubeStateMetrics:
  enabled: true

nodeExporter:
  enabled: true

prometheus-node-exporter:
  enabled: true

prometheus:
  enabled: true

  prometheusSpec:
    externalLabels:
      cluster: "data-cluster-1"  # update accordingly
      environment: "test"        # update accordingly
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    remoteWrite:
      - url: "http://remote-write-thanos/api/v1/receive" # Thanos Receive remote-write endpoint; replace the host with your receiver's address
        queueConfig:
          batchSendDeadline: 30s
          maxShards: 10

So far, we have configured a Prometheus operator deployment to write data to a Thanos Receiver endpoint. We can deploy this configuration on all of our data clusters after updating the corresponding external labels to tell which instance is which.
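To avoid maintaining a separate values.yaml for each data cluster, one option is to keep a single values file and override only the external labels from Terraform. The sketch below assumes a hypothetical cluster_name variable and uses the helm_release set block; everything else matches the release defined above.

variable "cluster_name" {
  description = "Value of the cluster external label for this data cluster"
  type        = string
  default     = "data-cluster-1"
}

resource "helm_release" "prometheus_operator" {
  name             = "prometheus-operator"
  repository       = "https://prometheus-community.github.io/helm-charts"
  chart            = "kube-prometheus-stack"
  version          = "54.1.0"
  namespace        = "monitoring"
  create_namespace = true
  cleanup_on_fail  = true

  values = [
    file("${path.module}/values.yaml")
  ]

  # Override the external label per cluster at deploy time
  set {
    name  = "prometheus.prometheusSpec.externalLabels.cluster"
    value = var.cluster_name
  }
}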

Deploying Thanos and Grafana in the Monitoring Cluster

One consideration here is that the monitoring cluster is a single point of failure for the example project, although there are ways around that. For this example, we will deploy a Thanos Helm release and configure the receiver to accept remote-write requests, upload blocks to object storage, and keep only 15 days of data locally so that local storage stays manageable. Ingress configuration is left to the reader’s discretion, but be mindful that it is also needed for cluster-to-cluster communication. We will use Bitnami’s Thanos and Grafana charts since they are widely used and well maintained.

The following Terraform script installs the minimal Thanos Helm release:

provider "helm" {
kubernetes {
config_path = "~/.kube/config"
}
}

resource "helm_release" "thanos" {
name = "thanos"
repository = "https://charts.bitnami.com/bitnami"
chart = "thanos"
version = "3.4.2"
namespace = "monitoring"
create_namespace = true
cleanup_on_fail = true

values = [
file("${path.module}/values.yaml")
]
}

And as we did before, the magic happens in the values.yaml file. Here, we will override many of the components, using just the minimum needed to see the desired metrics in Grafana. There are other Thanos components that we have not discussed since we will not be using them, such as the Thanos Compactor, which compacts and downsamples the metric blocks in object storage, and the Thanos Ruler, which evaluates alerting and recording rules on top of Thanos Query and sends alerts when a rule fires.

Below is an example of a values file that uses our object storage and enables and configures the Thanos Receiver component:

objstoreConfig: |
  type: S3
  config:
    bucket: "my-bucket"         # use your own bucket name
    endpoint: "s3.amazonaws.com"
    access_key: "my-access-key" # use your own access key
    secret_key: "my-secret-key" # use your own secret key
    insecure: false

receive:
  enabled: true
  tsdbRetention: 15d        # keep only 15 days of blocks on the receiver's local disk
  replicationFactor: 1      # a single receiver instance for this example
  replicaLabel: receive_replica

ruler:
  enabled: false

compactor:
  enabled: false
To deploy Grafana, we will again use Bitnami’s Helm chart and configure our Thanos Query endpoint as the primary Prometheus data source through its own values.yaml file:

provider "helm" {
kubernetes {
config_path = "~/.kube/config"
}
}

resource "helm_release" "grafana" {
name = "grafana"
repository = "https://charts.bitnami.com/bitnami"
chart = "grafana"
version = "5.3.6"
namespace = "observability"
create_namespace = true
cleanup_on_fail = true

values = [
file("${path.module}/values.yaml")
]
}

Now, we need to add the data source in the values file:

adminUser: "admin" #use something more secure
adminPassword: "medium" #use something more secure
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Thanos
type: prometheus
url: http://thanos-query.monitoring.svc:10901
isDefault: true

Keep in mind that for production-ready environments, you will need to use ingress configuration, TLS certificates, service accounts, cluster roles, safelists, load balancers, and many other network and security configurations.

Now, you can go to https://mydomain:3000 and log in using the username and password specified in the values file. From here, you can download/import your favorite dashboard and start configuring the metrics you want to see.
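If you have not set up ingress yet, a quick way to reach the UI while testing is kubectl port-forward (for example, kubectl port-forward svc/grafana 3000:3000 -n observability, assuming the chart's default service name and port) and then browsing to http://localhost:3000.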

Conclusions

Now, we have a highly resilient and available monitoring solution that can withstand node failures, is not limited by storage capacity at the node level, and offers access to historical data at a very low cost using object storage. Thanos is the key piece that links multiple Prometheus instances to a single Grafana instance, and it brings many benefits to the overall solution. I believe that, in the future, this combination will be as emblematic for metrics monitoring as the ELK stack (Elasticsearch, Logstash, and Kibana) is today for application logs.
