No, it doesn’t delete half your metrics.
Standalone Prometheus is excellent: it provides a powerful query language along with a simple, unified way of collecting and exposing metrics. Making Prometheus highly available and scalable, however, can be a real challenge.
The key features we needed were:
- Highly available Prometheus
- Single place to query all of your metrics
- Easily back up and archive data
This is where Improbable’s Thanos comes in.
Making Prometheus HA
Thanos, at its most basic, lets you query multiple Prometheus instances at once and can deduplicate the same metric collected by multiple instances. This allows you to run multiple replicas of the same Prometheus setup without worrying about duplicated metrics.
One of Thanos’ components is a sidecar that runs alongside each Prometheus container; together these form a cluster. Instead of querying the Prometheis directly (this is the official plural, according to the Prometheus team), you query the Thanos Query component. The picture below helps to illustrate the relationship between Prometheus and Thanos.
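As a concrete sketch, the sidecar is typically deployed as a second container in each Prometheus pod; the image versions, paths, and names below are illustrative rather than our exact manifests:

```yaml
# Sketch: Prometheus pod with a Thanos sidecar (names/versions illustrative).
containers:
  - name: prometheus
    image: prom/prometheus:v2.11.0
    args:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      # Equal min/max block durations disable local compaction,
      # which the sidecar's block uploader expects.
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
  - name: thanos-sidecar
    image: quay.io/thanos/thanos:v0.8.1
    args:
      - sidecar
      - --prometheus.url=http://localhost:9090
      - --tsdb.path=/prometheus
    ports:
      - name: grpc
        containerPort: 10901   # Store API that Thanos Query connects to
```

Deduplication works because each Prometheus replica carries a distinguishing external label (e.g. `replica: "0"`), which Thanos Query is told to strip at query time via its `--query.replica-label` flag.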
This alone is great, as it allows you to easily build an HA Prometheus setup, but there’s even more you can do with these building blocks.
A single place to view metrics
The next problem we looked at was how to get all your metrics into one place.
We run multiple Kubernetes clusters, each with its own Prometheus. Historically we aggregated the metrics by having a special Prometheus that scraped the federate endpoint of each of the others. This did the job, but it was wasteful, as we were simply duplicating all of our metrics, and this special Prometheus was a single point of failure.
A Thanos Query node can use another Query node as a source of data; if we expose the gRPC endpoint of our Thanos Query nodes in each cluster we can create a Thanos Query that aggregates them together by using them as stores. The picture below helps to illustrate this.
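The aggregating Query node just needs the per-cluster Query gRPC endpoints passed as stores; the hostnames and replica label below are hypothetical placeholders:

```yaml
# Sketch: args for the global, aggregating Thanos Query node.
args:
  - query
  - --query.replica-label=replica          # label to strip when deduplicating
  - --store=thanos-query.red.example.com:10901    # per-cluster Query gRPC endpoints
  - --store=thanos-query.black.example.com:10901
  - --store=thanos-query.blue.example.com:10901
```

Because a Query node speaks the same Store API it consumes, this layering can be repeated as deeply as needed.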
This allows us to go to one Thanos Query and get all our metrics across all our clusters. In the following screenshot I’m querying the number of replicas of our fluentd DaemonSet across our three clusters (red, black, blue) with a single query.
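A query of that shape might look like the following, assuming kube-state-metrics is scraped and each Prometheus attaches a `cluster` external label (the metric and label names here are illustrative):

```promql
sum by (cluster) (kube_daemonset_status_number_ready{daemonset="fluentd"})
```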
However, there is still a problem with this setup: we have one special Thanos Query in a special cluster, and if it goes down it takes out our single metrics view. Instead, we’d like to run multiple Thanos Query nodes, one in each cluster, for users to query against via some kind of load balancing. Using our AWS multi-cluster load-balancing tool Yggdrasil (read more about that here), we can spread this out across multiple Kubernetes clusters. The result is that users can perform queries against any cluster and receive all metrics.
When you put this all together it looks something like this:
Note that each Thanos Query layer will be a ReplicaSet of Thanos Query nodes for added resilience.
This gives us an incredibly resilient Prometheus setup, spread across multiple clusters with multiple replicas, giving us many layers of redundancy. You have one place to go to see your Prometheus metrics, with no need to worry about which cluster or namespace your application is running in.
Easily back up and archive data
Another common problem with Prometheus is backing up and retaining all your metrics: keeping long-term data on your Prometheus instances is expensive in terms of storage and hurts performance. Thanos solves this by having the sidecar continuously back up your data to a cloud storage provider such as S3, then exposing that data via a store node. The store node acts much like another Prometheus instance in your Thanos cluster, but with all of its data coming from the S3 bucket.
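Both the sidecar and the store node consume the same object-storage configuration, usually passed via `--objstore.config-file`; here is a minimal sketch, with the bucket name and endpoint as placeholders:

```yaml
# Sketch: object-storage config shared by the sidecar (uploads TSDB blocks)
# and the store node (serves those blocks back over the Store API).
type: S3
config:
  bucket: thanos-metrics          # placeholder bucket name
  endpoint: s3.eu-west-1.amazonaws.com
```

The sidecar uploads blocks with `thanos sidecar --objstore.config-file=...`, and the store node serves them with `thanos store --objstore.config-file=...`; the store node’s gRPC endpoint is then added to Thanos Query like any other store.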
The store node also gives you some welcome resilience in the event of a cluster outage: if we lose an entire cluster, we can no longer query its most recent data, as those Prometheis are gone, but we can still query a store node in another cluster, which has access to the historical data from S3.
One of the things we’ve wanted to do for a while is split up our Prometheis per team or namespace, so that individual Prometheus instances don’t grow too large and so that one team generating a very large volume of metrics can’t take out Prometheus for everyone else. This was always considered too much effort, as a separate endpoint for each team or namespace would have meant a lot of overhead. With Thanos, we can simply add the team-based Prometheis to our Thanos cluster and keep the same single source for all metrics. So we’d like to switch to many small Prometheis, using the Prometheus Operator to make creating them nice and simple.
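With the Prometheus Operator, each per-team Prometheus would be declared as a custom resource with the Thanos sidecar enabled; the names, selectors, and versions below are hypothetical:

```yaml
# Sketch: a per-team Prometheus via the Prometheus Operator CRD,
# with the Thanos sidecar enabled (names/labels illustrative).
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: team-red
  namespace: team-red
spec:
  replicas: 2                     # two replicas for HA, deduplicated by Thanos Query
  serviceMonitorSelector:
    matchLabels:
      team: red                   # only scrape this team's ServiceMonitors
  thanos:
    baseImage: quay.io/thanos/thanos
    version: v0.8.1
```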
To conclude, Thanos has given us:
- Highly available Prometheus — deduplication with the Thanos sidecar + Thanos Query
- Single place to query all of your metrics — Thanos Query aggregating across all clusters
- Easily back up and archive data — Thanos sidecar and Thanos store for S3 data storage
Overall, Thanos has really taken our metrics setup to the next level: it has allowed us to simplify how our users find their metrics and given us a great deal of resilience in the process. It’s now incredibly simple to add metrics from a new Kubernetes cluster, or even from things not running in Kubernetes, as it all just works via the building blocks that Thanos provides. One caveat: Thanos is still a relatively new project, so it’s not without the odd bug; thankfully, this is countered by a very active team and community who are continuously improving it. Please check out their GitHub project or their Slack channel if you want to join in.
Update: I did a talk on this topic at Prometheus London, where I talk about the setup described here as well as some of the further work we did with the Prometheus Operator, VPA and more. You can watch the talk here.