Deep Dive into Thanos-Part II

Pavan Kumar
May 8 · 7 min read

Monitoring Kubernetes Workloads with Thanos and Prometheus Operator

In Part I of this article, we looked at the various components of Thanos and its use cases. In Part II, we will configure Thanos with GCS (Google Cloud Storage) and see how metrics can be retained for longer periods using Thanos. We will also configure Grafana to use the Thanos Query Frontend to visualize graphs from various clusters (Thanos Queriers).

Image Credits: Thanos website

What is the entire story all about? (TLDR)

  1. Install Thanos using the Bitnami Helm chart.
  2. Configure Thanos to use GCS as its Object store.

Prerequisites

  1. A Kubernetes cluster (on-prem, AKS, EKS, GKE, or Kind).
  2. A GCP account (to push the blocks to GCS).

Story Resources

  1. GitHub Link: https://github.com/pavan-kumar-99/medium-manifests
  2. GitHub Branch: thanos
Image Credits: Thanos

Installing Thanos in the Kubernetes cluster

There are multiple ways to install Thanos in your Kubernetes cluster. You can install it using kube-thanos, write your own Kubernetes manifests or Helm charts by referring to the Thanos commands for the corresponding components, or install it via Bitnami’s Thanos Helm chart (the approach used in this article). Before we install the Thanos cluster, let us install Prometheus with the Thanos sidecar enabled. The Thanos sidecar is responsible for pushing the TSDB blocks to object storage such as GCS or AWS S3.

Installing Prometheus and Grafana with Prometheus Operator.

Before we install Prometheus Operator, we will have to create a secret with the GCP service account, so that our Thanos sidecar will be able to communicate with the bucket.

$ git clone https://github.com/pavan-kumar-99/medium-manifests.git \
-b thanos
$ cd medium-manifests/

Let us create a file named thanos-sidecar-secret.yaml. It holds the type of object storage, the name of the GCS bucket, and the GCP service account used to communicate with GCP. Once the values in the file are substituted, let's create a secret from it.

kubectl create secret generic thanos-gcp-config --from-file=thanos.yaml=thanos-sidecar-secret.yaml
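For reference, the file consumed by the command above follows the Thanos object storage configuration format for GCS. The sketch below writes an example to a separate file so that it does not overwrite the one in the repository; the bucket name and the service account placeholder are assumptions to be replaced with your own values.

```shell
# Example object storage config for the Thanos sidecar (GCS).
# "my-thanos-bucket" and the service account body are placeholders.
cat > thanos-sidecar-secret.example.yaml <<'EOF'
type: GCS
config:
  bucket: "my-thanos-bucket"
  service_account: |-
    <contents of the GCP service account JSON key>
EOF

# Show what was written.
cat thanos-sidecar-secret.example.yaml
```

The key name thanos.yaml used in --from-file is what the Prometheus override file later refers to, so the two have to match.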

By default, Thanos is not enabled in the Prometheus Operator installation. Let us write an override file to override the default values and install Thanos sidecar.

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install prom-thaons prometheus-community/kube-prometheus-stack -f prometheus-operator-thanos.yaml
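The override file itself lives in the repository and is not reproduced in this article. As a rough sketch of what such an override might contain, assuming the secret created earlier (thanos-gcp-config with key thanos.yaml): the value keys below match the chart layout I am aware of for this period, but they have changed between kube-prometheus-stack versions, so check them against the values.yaml of the chart version you install.

```shell
# Hypothetical override enabling the Thanos sidecar in kube-prometheus-stack.
# Written to an .example file so the repository's own file is left untouched.
cat > prometheus-operator-thanos.example.yaml <<'EOF'
prometheus:
  # Expose the sidecar's gRPC endpoint through a Service so that
  # Thanos Query can discover it.
  thanosService:
    enabled: true
  prometheusSpec:
    thanos:
      # Reference to the secret created with kubectl above.
      objectStorageConfig:
        name: thanos-gcp-config
        key: thanos.yaml
EOF

cat prometheus-operator-thanos.example.yaml
```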

You should now see Prometheus, Grafana, and the other components being installed. Let us check the logs of the thanos-sidecar container now.

kubectl logs prometheus-prom-thaons-kube-prometheus-prometheus-0 -c thanos-sidecar

Let us now install the Thanos cluster using Bitnami’s Helm chart. I have created a custom values.yaml file (thanos-values.yaml) for this installation to make configuration easier. Feel free to add any other required values by referring to the chart's full values.yaml file.

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm repo update
$ helm install thanos bitnami/thanos -f thanos-values.yaml
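The author's thanos-values.yaml is in the repository. The following is only a sketch of the kind of values such a file might set for the Bitnami chart, under several assumptions: everything runs in the default namespace, the sidecar discovery service created by the Prometheus release is named prom-thaons-kube-prometheus-thanos-discovery (confirm with kubectl get svc), and the bucket name is a placeholder.

```shell
# Hypothetical values for the Bitnami Thanos chart; verify key names against
# the values.yaml of the chart version you install.
cat > thanos-values.example.yaml <<'EOF'
# Same object storage settings the sidecar uses.
objstoreConfig: |-
  type: GCS
  config:
    bucket: my-thanos-bucket
    service_account: |-
      <contents of the GCP service account JSON key>

query:
  dnsDiscovery:
    # Assumed name of the sidecar discovery Service; check your cluster.
    sidecarsService: prom-thaons-kube-prometheus-thanos-discovery
    sidecarsNamespace: default

queryFrontend:
  enabled: true
bucketweb:
  enabled: true
compactor:
  enabled: true
storegateway:
  enabled: true
EOF

cat thanos-values.example.yaml
```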

You should now see the Thanos cluster components being installed; all of them should become ready after some time.

Thanos components

Without further ado, let us access the thanos-query-frontend by port-forwarding it. (Please note that this type of access is only suitable for demo or local purposes. In production, one might want to expose it via an Ingress controller with TLS enabled.)

kubectl port-forward svc/thanos-query-frontend 9091:9090 --address=0.0.0.0
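With the port-forward running, one quick way to confirm that the Query Frontend answers PromQL is to call its Prometheus-compatible HTTP API. This is a minimal sketch, assuming local port 9091 from the command above; the helper name thanos_query_url is hypothetical and only handles simple, URL-safe expressions.

```shell
# Hypothetical helper: build an instant-query URL for a Prometheus-compatible API.
thanos_query_url() {
  # $1 = base URL, $2 = PromQL expression (no URL encoding is performed)
  printf '%s/api/v1/query?query=%s\n' "$1" "$2"
}

url="$(thanos_query_url http://localhost:9091 up)"
echo "$url"

# Needs the port-forward above to be active; otherwise a hint is printed.
curl -s --max-time 5 "$url" || echo "query frontend not reachable on localhost:9091"
```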

Thanos Frontend Query

I will now pause my blog here and wait for another 4 days so that my TSDB blocks are pushed into the GCS bucket, and then resume with the data available.

Waiting for 4 days so that the data could be accumulated.

I now have my data for the past 4 days available. Let us now connect to Grafana and add our Thanos querier as a Prometheus data source. (This can be automated by adding the data source URL while creating Grafana itself. I am showing it manually only for the sake of better understanding.) Let me access Grafana from the browser. (Credentials: admin / prom-operator.)

kubectl port-forward svc/prom-thaons-grafana 8080:80 --address=0.0.0.0

Grafana Datasource

The URL here is the URL of the Thanos Query Frontend. The IP is the ClusterIP of the thanos-query-frontend service, which can be obtained with:

$ kubectl get svc thanos-query-frontend
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
thanos-query-frontend   ClusterIP   10.108.238.31   <none>        9090/TCP   4d23h

Once you add the IP with the corresponding port, you should get a pop-up saying that the Data-source is working.
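The manual step can also be expressed in the Helm values of the kube-prometheus-stack release, which accepts extra Grafana data sources. The sketch below assumes Thanos runs in the default namespace and uses the service's DNS name instead of the ClusterIP, since the IP changes if the service is recreated; the data source name "Thanos" is arbitrary.

```shell
# Hypothetical values fragment that provisions the Thanos Query Frontend
# as an additional Prometheus-type data source in Grafana.
cat > grafana-thanos-datasource.example.yaml <<'EOF'
grafana:
  additionalDataSources:
    - name: Thanos
      type: prometheus
      access: proxy
      # Service DNS name; the "default" namespace is an assumption.
      url: http://thanos-query-frontend.default.svc.cluster.local:9090
      isDefault: false
EOF

cat grafana-thanos-datasource.example.yaml
```

Such a fragment could be passed as an extra -f file to helm install or helm upgrade alongside the main override file.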

Grafana Datasource added

When you install Grafana with Prometheus Operator, you get some dashboards by default. These dashboards monitor your control plane components and resources such as pods and network usage in your cluster. One such dashboard is Compute Resources / Node (Pods).

Grafana dashboard showing the metrics for the past 4 days

Let us now kill the Prometheus pods to verify whether the metrics for the past 4 days are still shown. Before we do that, let us verify the metrics for the past 5 minutes; they are visible too.

Metrics for the Past 5 minutes.

I will now turn off my Kubernetes cluster (the VM on which the Kubernetes cluster is hosted) and return after 5 minutes. When I turn the VM back on, all the pods will be restarted (rescheduled) by default.

I am back. I will now turn on my Kubernetes cluster to verify the metrics for the past 30 minutes.

Data lost when the Prometheus Pods are restarted.

But what happened to the metrics when the pod was restarted? This is exactly what I wanted to show. Though we have set up Thanos for HA, the TSDB blocks are pushed only every 2 hours. If the Prometheus pod is restarted in the meantime, the data within that 2-hour window that has not yet been uploaded is gone. So it is always recommended to also have a PV for Prometheus itself. Let us now upgrade the Helm chart, this time adding persistent volumes to our Prometheus replicas.

Once I perform a helm upgrade, I can now find my volume attached to the Pod.

helm upgrade -i prom-thaons prometheus-community/kube-prometheus-stack -f prom-operator.yaml
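The prom-operator.yaml used above is in the repository. As an illustration of the storage part only, a values fragment along these lines asks the operator to create a PersistentVolumeClaim per Prometheus replica. The 10Gi size is an arbitrary example, and leaving out storageClassName assumes the cluster has a default StorageClass; the author's file presumably also carries the Thanos sidecar settings, which are not repeated here.

```shell
# Hypothetical storage fragment for kube-prometheus-stack.
cat > prom-operator.example.yaml <<'EOF'
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi   # example size, adjust to your retention needs
EOF

cat prom-operator.example.yaml
```

After the upgrade, kubectl get pvc should list a claim bound to the Prometheus pod.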

Prometheus Pod having PVC

Let me now restart the Prometheus pods.

Restarted the Pods
Data remains in the Persistent Volume

Let us now check our GCS for the blocks stored there.

TSDB blocks uploaded by Thanos sidecar

We can also utilize the bucketweb tool installed as a part of Thanos installation to inspect bucket blocks from a Web UI.

kubectl port-forward svc/thanos-bucketweb 8080:8080 --address=0.0.0.0

Thanos Bucket web

Conclusion

Thus, with the help of Thanos and Prometheus, one can set up an HA metrics aggregation solution. There are many other ways to install Thanos, and exploring them would also help you understand how to configure each of the components. Initially, I had a hard time understanding every component of Thanos, but exploring each of them individually has given me greater insight into it. Feel free to share your experiences with Thanos in the comments section.

Until next time…..

Pavan Kumar

Written by

Cloud DevOps Engineer at Informatica || CKA | CSA | CRO | AWS | ISTIO | AZURE | GCP | DEVOPS Linkedin:https://www.linkedin.com/in/pavankumar1999/
