Deep Dive into Thanos-Part I

Pavan Kumar
May 8 · 6 min read

Monitoring Kubernetes Workloads with Thanos and Prometheus Operator

Are your applications running on Kubernetes? Are they highly scalable, and are you happy with the way they work? Wait a minute, how are you monitoring them? Ah, Prometheus, right? Cool. Did you ever wonder how scalable and highly available your Prometheus cluster is? Before that, here is a mail from your boss asking you to find out the number of http_requests that your website received last Xmas. Or let's make this the Indian style: your boss wants to know the number of customers who visited your website (total number of http_requests) last Sankranthi (a year ago). You try accessing your Prometheus / Grafana servers and realize that the metrics are not found. What do you tell your boss now?

Well, before this situation actually arises, let us try to fix it by using Thanos. Thanos is a tool to set up a highly available Prometheus with long-term storage capabilities. Thanos is open source and is a CNCF Incubating project. The features of Thanos are:

  1. Unlimited retention of Prometheus metrics in supported object stores such as GCS, S3, Azure Blob, Swift, and Tencent COS.
  2. A global query view that lets us view metrics from multiple Prometheus instances spread across various namespaces and clusters.
  3. It is compatible with your existing monitoring tools like Prometheus and Grafana.
  4. Downsample historical data for massive query speedup when querying large time ranges or configure complex retention policies.

What is the entire story all about? (TLDR)

  1. Getting to know Thanos Components.
  2. Implementing HA-Prometheus with Thanos, Prometheus Operator, and GCS ( Object Store ).

Prerequisites

  1. Basic understanding of Prometheus.

Story Resources

  1. GitHub Link: https://github.com/pavan-kumar-99/medium-manifests
  2. GitHub Branch: thanos

Understanding Thanos Components

Before we start, let us first understand the components of Thanos in detail. When I first studied Thanos, I had a hard time understanding how it works, which components it needs to be fully functional, and the role of each of them. So let us demystify each component and understand its usage with the help of a very useful architecture diagram from the official Thanos website.

Thanos Architecture

a) Thanos Sidecar

Thanos Sidecar is deployed as a sidecar container in the Prometheus Pod. [Sidecar containers are containers that run alongside the main container in the pod.] It is one of the components that interact with your object storage (e.g. S3, GCS, Azure Blob, etc.). It is responsible for uploading TSDB blocks to the object storage: the blocks that Prometheus produces every two hours are uploaded (once every two hours) by the Thanos Sidecar. Let me show you the logs of a sample Thanos Sidecar container uploading TSDB blocks to GCS.

b) Thanos Querier / Query

The Thanos Querier / Query is a stateless component that implements the Prometheus HTTP v1 API to query data in a Thanos cluster. It gathers the data needed to evaluate a PromQL query from the underlying Store APIs via the gRPC protocol. A store can be any data source that implements the gRPC Store API, for example:

  • Prometheus ( Thanos sidecar enabled via headless service discovery ).
  • Object storage like S3 or GCS, via the Store Gateway.
  • Another Thanos Querier ( can be from a different cluster ).

Thanos Querier UI

The Thanos Querier UI showing the various stores (discovered through the Prometheus sidecar and via another store from a different cluster).
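
Because the Querier speaks the Prometheus HTTP v1 API, the boss's question from the introduction becomes an ordinary instant query. Below is a minimal sketch that only builds the request URL. The host name, the metric name http_requests_total, and the timestamp are assumptions made for illustration; /api/v1/query with the query and time parameters is the standard Prometheus endpoint, and dedup is the parameter the Querier accepts to toggle deduplication of replicated series.

```python
from urllib.parse import urlencode

# Hypothetical Querier address; replace with your own service name and port.
QUERIER_URL = "http://thanos-query.example:10902"


def build_instant_query_url(base_url: str, promql: str, at_unix: int, dedup: bool = True) -> str:
    """Build a Prometheus-style instant query URL for a Thanos Querier."""
    params = {
        "query": promql,                        # the PromQL expression
        "time": str(int(at_unix)),              # evaluation time, Unix seconds
        "dedup": "true" if dedup else "false",  # merge series from HA replicas
    }
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


if __name__ == "__main__":
    # Total requests over the 24 hours before an example timestamp
    # (2021-01-15T00:00:00Z). The metric name is an assumption.
    print(build_instant_query_url(
        QUERIER_URL,
        "sum(increase(http_requests_total[1d]))",
        1610668800,
    ))
```

The resulting URL can be fetched with any HTTP client, or the same expression can be pasted into the Querier UI or a Grafana panel.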

c) Thanos Query Frontend

The Thanos Query Frontend is a service that is put in front of the Thanos Querier to improve the read path. It splits a long query into multiple shorter queries based on a configured split interval. This allows the sub-queries to be run in parallel and balanced better across Queriers. It also caches query results, which improves the efficiency of longer queries. Currently, an in-memory cache (FIFO cache) and Memcached are supported.

Thanos Query Frontend

The Query Frontend UI is similar to the Thanos Querier UI, but it enables features such as query splitting and caching of queries.
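
To make the idea of query splitting concrete, here is a minimal sketch of cutting one long time range into sub-ranges aligned to a split interval. It illustrates the concept only and is not the Query Frontend's actual implementation; the function name and the half-open range convention are assumptions made for this example.

```python
from typing import List, Tuple


def split_range(start: int, end: int, interval: int) -> List[Tuple[int, int]]:
    """Split the half-open range [start, end) into interval-aligned sub-ranges.

    All values are Unix timestamps or durations in seconds.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if end < start:
        raise ValueError("end must not be before start")

    ranges = []
    cur = start
    while cur < end:
        # Next multiple of the interval after cur, capped at the query end.
        nxt = min((cur // interval + 1) * interval, end)
        ranges.append((cur, nxt))
        cur = nxt
    return ranges


if __name__ == "__main__":
    day = 24 * 60 * 60
    # A query from noon on day 0 to the end of day 1, split per day.
    for sub_start, sub_end in split_range(day // 2, 2 * day, day):
        print(sub_start, sub_end)
```

Each sub-range can then be sent to a Querier independently, and aligned sub-ranges are also natural cache keys, since two different long queries often share the same whole days.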

d) Thanos Store Gateway ( Thanos Store )

Thanos Store Gateway acts as an API Gateway between your Thanos cluster and the Object store. This is one of the components that require access to your Object storage. It implements the Store API on top of historical data in an object storage bucket. It keeps a small amount of information about all remote blocks on the local disk and keeps it in sync with the bucket.

Sample Thanos Store Gateway logs

e) Thanos Compactor ( Compactor )

As we know, Prometheus periodically compacts its blocks of data to improve query efficiency. In the same way, the compactor scans the objects stored in object storage ( like AWS S3, GCS, Azure Blob, etc ) and applies compaction wherever necessary. This component also downsamples the data to increase query efficiency over larger time ranges.

By default, the compactor runs to completion: it exits once it has compacted the objects. To keep it running indefinitely, add the --wait flag when starting the compactor. It must be deployed as a singleton against a bucket. The compactor usually needs 100–300GB of local disk space for processing the data. In ideal cases, 50–70GB would suffice unless your metrics are really huge.
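
Downsampling is easier to picture with a small example. The sketch below groups raw samples into fixed windows and keeps a few aggregates per window instead of every sample. It is only an illustration of the idea under simplifying assumptions (a five-minute window, four aggregates, hypothetical names); the on-disk format and the exact aggregates that the compactor writes are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Aggregate:
    window_start: int  # start of the window, in seconds
    count: int         # number of raw samples in the window
    total: float       # sum of the raw values
    minimum: float
    maximum: float


def downsample(samples: Iterable[Tuple[int, float]], window: int) -> List[Aggregate]:
    """Reduce (timestamp, value) samples to one Aggregate per window."""
    if window <= 0:
        raise ValueError("window must be positive")

    buckets: Dict[int, Aggregate] = {}
    for ts, value in samples:
        start = ts - ts % window  # align the sample to its window
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = Aggregate(start, 1, value, value, value)
        else:
            agg.count += 1
            agg.total += value
            agg.minimum = min(agg.minimum, value)
            agg.maximum = max(agg.maximum, value)
    return [buckets[start] for start in sorted(buckets)]


if __name__ == "__main__":
    raw = [(0, 1.0), (60, 3.0), (299, 2.0), (300, 10.0)]
    for agg in downsample(raw, 300):  # assumed five-minute window
        print(agg)
```

Keeping count, sum, min, and max rather than a single average means that functions such as an average or a maximum over time can still be answered from the reduced data, which is why queries over long time ranges become much cheaper.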

Sample Thanos Compact Logs

f) Thanos Ruler

The Ruler evaluates Prometheus recording and alerting rules against a chosen query API. You can think of the Ruler as a simplified Prometheus that does not require a sidecar, does not scrape targets, and does not evaluate PromQL itself (no Query API); it delegates evaluation to the query API it is pointed at.

g) Thanos Receive

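Thanos Receive implements the Prometheus Remote Write API, so Prometheus instances can push their samples to it. When the Querier cannot reach a Prometheus instance directly, the Thanos Sidecar alone is not sufficient, because the data it uploads to object storage always lags behind by one block length (typically 2 hours). Read more about Thanos Receive in the official Thanos documentation.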
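
To see where the two-hour lag comes from, consider which block a sample falls into. The sketch below assumes that blocks cover two-hour windows aligned to multiples of two hours since the Unix epoch (my understanding of the Prometheus default), and it computes only a lower bound: a block cannot be uploaded before its window has closed, and in practice the block is written and uploaded somewhat later. The function names are hypothetical.

```python
from typing import Tuple

BLOCK_RANGE_MS = 2 * 60 * 60 * 1000  # assumed two-hour block length, in milliseconds


def block_window(ts_ms: int, block_range_ms: int = BLOCK_RANGE_MS) -> Tuple[int, int]:
    """Return the [start, end) window of the block that contains ts_ms."""
    start = ts_ms - ts_ms % block_range_ms
    return start, start + block_range_ms


def min_upload_lag_ms(ts_ms: int, block_range_ms: int = BLOCK_RANGE_MS) -> int:
    """Lower bound on how long a sample waits before its block can be uploaded."""
    _, end = block_window(ts_ms, block_range_ms)
    return end - ts_ms


if __name__ == "__main__":
    one_hour_ms = 60 * 60 * 1000
    print(block_window(one_hour_ms))       # window containing a sample at 01:00
    print(min_upload_lag_ms(one_hour_ms))  # at least one more hour of waiting
```

A sample written right after a window opens waits close to the full two hours, which is the gap that pushing samples through Remote Write avoids.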

h) Thanos Tools

Thanos tools are additional utilities that complement the other Thanos components. A few of them are:

1) thanos tools bucket web: inspects bucket blocks from a web UI.

2) thanos tools bucket ls: lists all blocks in the specified bucket.

3) thanos tools bucket replicate: replicates blocks from one object storage bucket to another.

Thanos Web Bucket viewer

Well, these are the various components of Thanos in detail. In Part 2 of this article, we will do a hands-on exercise to explore the various components of Thanos and integrate Thanos with Prometheus and Grafana.

Conclusion

These are the various components that are present in Thanos. In Part 2 of this article, I explain how to set up an HA Prometheus with the Thanos Sidecar pushing TSDB blocks to a GCS bucket, and how to install a Thanos cluster using Bitnami’s Thanos Helm chart.

Until next time…


Nerd For Tech

NFT is an Educational Media House. Our mission is to bring the invaluable knowledge and experiences of experts from all over the world to the novice. To know more about us, visit https://www.nerdfortech.org/. Don’t forget to check out Ask-NFT, a mentorship ecosystem we’ve started

Pavan Kumar

Written by

Cloud DevOps Engineer at Informatica || CKA | CSA | CRO | AWS | ISTIO | AZURE | GCP | DEVOPS Linkedin:https://www.linkedin.com/in/pavankumar1999/
