Automating Managed Prometheus and Grafana with Terraform for scalable observability on Azure Kubernetes Service and Istio

Saverio Proto
Published in Microsoft Azure
Apr 14, 2023

In my role at Microsoft I help customers running Istio on AKS. I maintain a repository with the necessary Terraform code to deploy AKS and install Istio.

In the past I used the Prometheus Community Kubernetes Helm Charts to demonstrate how Istio improves the observability of your workload. However, running Prometheus at scale is a challenge. For a small team, being able to use a managed Prometheus and a managed Grafana installation adds a lot of value, because the engineers can focus on the product rather than on the observability platform.

Istio Workload Grafana Dashboard version 1.17.2

Inspired by Heyko Oelrichs’s article, I published a variant of my work that uses Terraform to automate the following components:

Automation challenges

The Terraform code is organized in three distinct projects in the folders aks-tf, istio-tf and grafana-dashboards-tf. This means you have to perform three terraform apply operations, as explained in the Terraform documentation of the Kubernetes provider. The reason is that you can’t configure the Terraform Grafana provider until the Grafana instance is deployed. In the same way, you cannot configure the Helm and Kubernetes providers until the AKS cluster is deployed. If you use Terraform interpolation to configure the providers from resources created in the same apply, intermittent and unpredictable errors will occur because of the order in which Terraform itself evaluates the provider blocks and resources.
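As a minimal sketch of this staged approach (not the actual code from the repository), a second-stage project such as istio-tf could look up the already deployed AKS cluster with a data source and use it to configure the Kubernetes provider. The variable names and the file name are assumptions made for illustration, and the sketch assumes the cluster exposes certificate-based credentials in kube_config, which may not hold for clusters with local accounts disabled.

```hcl
# istio-tf/providers.tf (hypothetical file name)
# Assumes the cluster was created by an earlier, separate
# "terraform apply" in the aks-tf project.

variable "resource_group_name" {
  type        = string
  description = "Resource group of the existing AKS cluster (assumed input)"
}

variable "cluster_name" {
  type        = string
  description = "Name of the existing AKS cluster (assumed input)"
}

provider "azurerm" {
  features {}
}

# Read the existing cluster instead of referencing a resource
# that is created in the same apply.
data "azurerm_kubernetes_cluster" "aks" {
  name                = var.cluster_name
  resource_group_name = var.resource_group_name
}

# The provider is configured from values that are already known
# when this project is planned.
provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks.kube_config[0].host
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config[0].client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config[0].client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config[0].cluster_ca_certificate)
}
```

The same pattern would apply to the third stage: the Grafana provider takes its url and auth arguments from a Grafana instance that an earlier apply has already created.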

The challenges in writing this Terraform code were the following:

About mTLS encryption and Observability

Istio makes it easy to enforce mTLS to encrypt traffic in transit between your workloads. With STRICT mTLS, Prometheus must be configured to scrape using Istio certificates. This is documented on the Istio website, and it applies when you run Prometheus in the same cluster. With Azure Monitor Managed service for Prometheus, the Istio control plane, gateway, and Envoy sidecar metrics are scraped over plaintext. To keep scraping working, you can write a specific PeerAuthentication with a portLevelMtls field that disables mTLS on the scraping port. The following example allows plaintext scraping of the sidecar of the echoserver application:

---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: allow-scraping-echoserver-sidecar
  namespace: default
spec:
  selector:
    matchLabels:
      run: echoserver
  mtls:
    mode: STRICT
  portLevelMtls:
    15020:
      mode: DISABLE
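
Since the rest of the setup is automated with Terraform, the same policy could also be managed from Terraform. The following is a sketch only, assuming the Kubernetes provider's kubernetes_manifest resource is used; the repository may apply the manifest differently. The resource name is hypothetical. Note that kubernetes_manifest needs access to the cluster, and the Istio CRDs must already be installed, at plan time, which is another reason to keep this in a stage that runs after Istio is deployed.

```hcl
# Hypothetical resource name; the manifest mirrors the YAML above.
resource "kubernetes_manifest" "allow_scraping_echoserver_sidecar" {
  manifest = {
    apiVersion = "security.istio.io/v1beta1"
    kind       = "PeerAuthentication"
    metadata = {
      name      = "allow-scraping-echoserver-sidecar"
      namespace = "default"
    }
    spec = {
      # Select only the echoserver workload.
      selector = {
        matchLabels = {
          run = "echoserver"
        }
      }
      # Keep STRICT mTLS for all other ports of the workload.
      mtls = {
        mode = "STRICT"
      }
      # Disable mTLS only on port 15020, the scraping port.
      portLevelMtls = {
        "15020" = {
          mode = "DISABLE"
        }
      }
    }
  }
}
```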

Conclusion

I have shared my experience using Azure Managed Grafana and Azure Monitor Managed service for Prometheus with Istio to improve observability. The Terraform code that I have shared automates the deployment, and it is distributed under the MIT License, allowing customers to fork and modify the code according to their specific needs. I strongly recommend testing these managed observability offerings, especially when working with small platform teams. The amount of work required to keep these tools updated and secure is not negligible. Unless significant customization is needed, the managed services offer a good deal.


Customer Experience Engineer @ Microsoft - Opinions and observations expressed in these blog posts are my own.