Today I want to talk about Thanos, a hero that will help us with an impossible mission: a production-grade Prometheus deployment. Prometheus is an amazing tool that can do a lot of things, from metrics to alerting. But there is one problem that is a bit harder to solve: longer-term storage for Prometheus metrics. After all, having metrics for only a day or two is not that useful. This is where Thanos fits in.
At Soluto, where I work, we have multiple production Kubernetes clusters. Each of those clusters runs a Prometheus instance that is responsible for collecting metrics and monitoring everything on that cluster. …
Istio is one of the most popular service meshes. It can help solve many of the issues that surface when running a lot of microservices: things like authentication, authorization, observability, and traffic routing. It all sounds really promising, so we decided to give it a try at Soluto. While deploying it on an existing cluster and enabling it on existing workloads, I ran into a lot of interesting issues. Let me share some of them with you.
Istio is a really complex product. It has a few moving parts that are required for a functional deployment. …