Deep Dive into Thanos - Part II
Monitoring Kubernetes Workloads with Thanos and Prometheus Operator
In Part I of this series, we covered the various components of Thanos and its use cases. In this Part II, we will configure Thanos with GCS (Google Cloud Storage) and understand how metrics can be retained for longer periods using Thanos. We will also configure Grafana to use the Thanos Query Frontend to visualize graphs from various clusters (Thanos Queriers).
What is the entire story all about? (TLDR)
- Install Thanos using the Bitnami Helm chart.
- Configure Thanos to use GCS as its Object store.
Prerequisites
- A Kubernetes cluster (can be On-Prem, AKS, EKS, GKE, or Kind).
- A GCP account (to push the TSDB blocks to GCS).
Story Resources
- GitHub Link: https://github.com/pavan-kumar-99/medium-manifests
- GitHub Branch: thanos
Installing Thanos in the Kubernetes cluster
Well, there are multiple ways to install Thanos in your Kubernetes cluster. You might choose to install it using kube-thanos, write your own Kubernetes manifests / Helm charts by referring to the Thanos commands for the corresponding components, or install it via Bitnami’s Thanos Helm chart (the scope of this article). Before we install the Thanos cluster, let us install Prometheus with the Thanos sidecar enabled. The Thanos sidecar is responsible for pushing the TSDB blocks to object storage such as GCS or AWS S3.
Installing Prometheus and Grafana with Prometheus Operator
Before we install Prometheus Operator, we will have to create a secret with the GCP service account, so that our Thanos sidecar will be able to communicate with the bucket.
$ git clone https://github.com/pavan-kumar-99/medium-manifests.git -b thanos
$ cd medium-manifests/
Let us create a file with the name thanos-sidecar-secret.yaml. This will contain the type of object storage, the name of the GCS bucket, and the GCP service account to be used to communicate with GCP. Once the values in the file are substituted, let's create a secret from it.
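For reference, a GCS objstore configuration typically looks like the sketch below; the bucket name and the inlined service-account JSON are placeholders you must replace with your own values.

# thanos-sidecar-secret.yaml (illustrative sketch, replace the placeholders)
type: GCS
config:
  bucket: "<your-gcs-bucket-name>"
  service_account: |-
    <paste the GCP service account JSON here>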
kubectl create secret generic thanos-gcp-config --from-file=thanos.yaml=thanos-sidecar-secret.yaml
By default, Thanos is not enabled in the Prometheus Operator installation. Let us write an override file to override the default values and enable the Thanos sidecar.
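A minimal sketch of what such an override (prometheus-operator-thanos.yaml) can look like is below. It assumes the secret created above is named thanos-gcp-config; the exact layout of objectStorageConfig differs slightly between chart versions, so cross-check with the chart's values.yaml.

prometheus:
  prometheusSpec:
    thanos:
      # Enables the Thanos sidecar and points it at the objstore config stored in our secret
      objectStorageConfig:
        name: thanos-gcp-config   # secret created earlier
        key: thanos.yaml          # key inside that secret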
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install prom-thaons prometheus-community/kube-prometheus-stack -f prometheus-operator-thanos.yaml
You should now see Prometheus, Grafana, and the other components being installed. Let us check the logs of the thanos-sidecar container now.
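For instance (the pod name below is a placeholder; list the pods first to find the exact one created by the operator):

$ kubectl get pods
$ kubectl logs <prometheus-pod-name> -c thanos-sidecar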
Let us now install the Thanos cluster using Bitnami’s Helm chart. I have created a custom values.yaml file for the installation of the Thanos cluster; it makes the configuration easier. Feel free to add any other required values by referring to the full values.yaml file here.
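As a rough illustration, the override file can look like the sketch below. The objstoreConfig mirrors the secret we created earlier, the sidecar discovery service name is a placeholder (it depends on the kube-prometheus-stack release), and key names may vary between chart versions, so cross-check with the full values.yaml.

# thanos-values.yaml (illustrative sketch)
objstoreConfig: |-
  type: GCS
  config:
    bucket: "<your-gcs-bucket-name>"
    service_account: |-
      <paste the GCP service account JSON here>
query:
  dnsDiscovery:
    sidecarsService: "<thanos-sidecar-discovery-service>"   # placeholder: service exposing the sidecar gRPC port
    sidecarsNamespace: default
queryFrontend:
  enabled: true
bucketweb:
  enabled: true
compactor:
  enabled: true
storegateway:
  enabled: true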
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm repo update
$ helm install thanos bitnami/thanos -f thanos-values.yaml
You should now see the Thanos cluster components being installed, and all of them should be ready in some time.
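You can watch them come up with a label selector; the app.kubernetes.io/instance label value matches the Helm release name (thanos in our case):

$ kubectl get pods -l app.kubernetes.io/instance=thanos -w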
Without further ado, let us access the thanos-query-frontend by port-forwarding it. (Please note that this type of access is only suitable for demo / local purposes. In production, one might want to expose this via an Ingress controller with TLS enabled.)
kubectl port-forward svc/thanos-query-frontend 9091:9090 --address=0.0.0.0
I will now pause the blog here, wait for another 4 days so that my TSDB blocks are pushed into the GCS bucket, and then resume with the data available.
I now have my data for the past 4 days available. Let us now connect to Grafana and add our Thanos Querier as a Prometheus data source (this can be automated by adding the data source URL while deploying Grafana itself; I am showing it manually only for the sake of better understanding). Let me access Grafana from the browser. (Credentials: admin / prom-operator).
kubectl port-forward svc/prom-thaons-grafana 8080:80 --address=0.0.0.0
The URL here is the URL of the Thanos Query Frontend. The IP is the ClusterIP of the thanos-query-frontend service. The IP can be obtained by
$ kubectl get svc thanos-query-frontend
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
thanos-query-frontend ClusterIP 10.108.238.31 <none> 9090/TCP 4d23h
Once you add the IP with the corresponding port, you should get a pop-up saying that the data source is working.
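If you would rather automate this step (as mentioned earlier) instead of adding the data source by hand, one option is to declare it in the kube-prometheus-stack override file. A minimal sketch, assuming the query frontend runs in the default namespace under the service name we port-forwarded above, would be:

grafana:
  additionalDataSources:
    - name: Thanos
      type: prometheus
      access: proxy
      url: http://thanos-query-frontend.default.svc.cluster.local:9090
      isDefault: false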
When you install Grafana with the Prometheus Operator, you get some dashboards out of the box. These dashboards monitor your control plane components and resources like pods and network usage in your cluster. One such dashboard is Compute Resources / Node (Pods).
Let us now kill the Prometheus pods to verify whether the metrics for the past 4 days are still shown. Before we do that, let us check the metrics for the past 5 minutes; we can see those as well.
I will now turn off my Kubernetes cluster (the VM on which the Kubernetes cluster is hosted) and return after 5 minutes. When I turn the VM back on, all the pods will be restarted (rescheduled) by default.
I am back. I will now turn my Kubernetes cluster back on and verify the metrics for the past 30 minutes.
But what happened to the metrics when the pod was restarted? Exactly, this is what I wanted to show. Though we have set up Thanos for HA, the TSDB blocks are pushed to object storage only every 2 hours. In the meantime, if the Prometheus pod is restarted, the data within that 2-hour window is gone. So it is always recommended to also have a PV created for Prometheus itself. Let us now upgrade the Helm chart, this time also adding persistent volumes to our Prometheus replicas, as sketched below.
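A sketch of the relevant addition to the override file is below; the storage size (and the commented-out storage class) are assumptions you should adapt to your cluster.

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          # storageClassName: standard   # assumption: set to a storage class available in your cluster
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi              # assumption: size this for your retention needs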
Once I perform a helm upgrade, I can now find my volume attached to the Pod.
helm upgrade -i prom-thaons prometheus-community/kube-prometheus-stack -f prom-operator.yaml
Let me now restart the Prometheus pods.
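One way to do that is to delete the operator-managed pods with a label selector; the label below is typical for operator-managed Prometheus pods, but verify it with kubectl get pods --show-labels first.

$ kubectl delete pod -l app.kubernetes.io/name=prometheus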
Let us now check our GCS for the blocks stored there.
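Besides the Cloud Console, you can list the uploaded block ULIDs with gsutil (the bucket name is a placeholder):

$ gsutil ls gs://<your-gcs-bucket-name>/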
We can also use the Bucket Web component, installed as part of the Thanos installation, to inspect the bucket blocks from a web UI.
kubectl port-forward svc/thanos-bucketweb 8080:8080 --address=0.0.0.0
GKE Workload Identity
If you’d like to use GKE Workload Identity, please follow the steps specified here, and use the Helm values.yaml files from here.
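For context, Workload Identity essentially boils down to binding a Kubernetes ServiceAccount to a GCP service account and annotating it accordingly; a minimal sketch (the account names and project ID are placeholders) looks like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: thanos-sa   # placeholder: the KSA used by the Thanos / sidecar pods
  annotations:
    iam.gke.io/gcp-service-account: thanos-gcs@<project-id>.iam.gserviceaccount.com   # placeholder GSA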
Conclusion
Thus, with the help of Thanos and Prometheus, one can set up a highly available metrics aggregation solution. While there are many other ways to install Thanos, exploring them will also help you understand how to configure each of the components. Initially, I had a hard time understanding each and every component of Thanos, but exploring them individually has given me greater insights. Feel free to share your experiences with Thanos in the comments section.
Until next time…..