Implementing Logging with Promtail, Loki and Grafana on a Multi-Cluster Azure Kubernetes Service Setup
In this article, I would like to show how to use the popular PLG (Promtail/Loki/Grafana) stack as a logging solution for applications hosted on Azure Kubernetes Service (AKS).
Prerequisites: Azure, Kubernetes (AKS), Helm, Promtail/Loki/Grafana
Promtail: An agent that collects logs and ships them to Loki, reusing Prometheus-style service discovery and relabeling under the hood. (In Kubernetes terms, it runs as a DaemonSet on each node, collecting logs, relabeling them, and pushing them to the Loki server.)
Loki: Grafana Loki is a log aggregation tool and the core of a fully featured logging stack. Loki is a datastore optimized for efficiently holding log data.
Grafana: Visualizes the log data using customizable dashboards.
Problem Statement: We can easily install the Loki-Stack Helm chart on the AKS cluster hosting an application and visualize its logs in Grafana. But a typical production environment has many AKS clusters set up together, each dedicated to a different use case and lying in the same or a different resource group; I have personally seen up to 30 clusters running in a single production environment. Here we are talking about monitoring the logs of all the applications hosted on every cluster.
Solution: There are three possible solutions to this problem.
- Install the PLG stack on each cluster and visualize the logs from the Grafana instance running on those clusters.
- Install the PL (Promtail/Loki) stack on each log producer cluster, but Grafana on just one cluster, say a monitoring cluster. In this case, logs are aggregated by the Loki instance running on each log producer cluster, and each of those instances is configured as a data source for the single Grafana instance.
- Install only Promtail on each cluster and have it push logs to a monitoring cluster where a single Loki/Grafana instance is running. Loki aggregates all the logs and labels them by origin, and the Grafana instance visualizes the logs based on these labels.
It is clear that the third solution has an advantage over the first two. Let's try to understand this from a maintenance, cost, and security point of view.
- Maintenance: Solution 3 has the least to maintain: just the monitoring cluster with all three PLG components, and every other cluster with only the Promtail component. Solutions 1 and 2 have many more components to maintain.
- Cost: Solution 3 is also the most cost-effective, since it consumes the least compute resources.
- Security: Solution 1 exposes Grafana on every cluster, which is a security risk compared to Solutions 2 and 3, where only the monitoring cluster hosts Grafana. But Solution 2 still exposes Loki on every cluster so that it can be configured as a data source for Grafana. Solution 3 therefore clearly wins: Grafana and Loki are exposed only on the monitoring cluster.
Implementation
Now that we understand how to approach multi-cluster logging, let us dive in and implement it.

- Promtail scrapes logs from applications running in the log producer AKS clusters
- Promtail discovers Kubernetes apps through its configuration, stored in the promtail secret; the same config contains the Loki address to push logs to
- Grafana is configured with Loki as a data source
- Grafana and Loki are exposed for external access via an Ingress controller such as Istio or NGINX
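To make the second point concrete, here is a minimal sketch of what the Promtail config held in the promtail secret might look like. This is not the full chart-generated config, and the Loki IP is a placeholder:

```yaml
# Minimal Promtail config sketch: the clients section points at the
# central Loki, and kubernetes_sd_configs discovers pods on the local cluster.
server:
  http_listen_port: 3101
clients:
  - url: http://<LOKI PUBLIC IP>:3100/loki/api/v1/push
positions:
  filename: /run/promtail/positions.yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Carry the pod name over as a Loki label.
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

The Helm chart generates a much richer relabeling pipeline than this; the sketch only shows where the Loki address and the discovery mechanism live.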
Steps To Setup
Monitoring Cluster
Install the Loki-Stack Helm chart with both Grafana and Loki exposed as LoadBalancer services. Note the public IPs of both Grafana and Loki.
Note: Exposing services using LoadBalancer IP is highly insecure and should be used only for demonstration or testing purposes. In production environments, it is preferable to deploy an NGINX Ingress Controller to control access from outside the cluster and further limit access using whitelisting and other security-related configuration.
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install loki grafana/loki-stack --set grafana.enabled=true,grafana.service.type=LoadBalancer,loki.service.type=LoadBalancer
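Once the chart is installed, the public IPs can be read off the LoadBalancer services. The service names below assume the release name loki used above:

```shell
# List the Grafana and Loki services with their external (public) IPs.
kubectl get svc loki-grafana loki \
  -o custom-columns=NAME:.metadata.name,EXTERNAL-IP:.status.loadBalancer.ingress[0].ip
```

The external IPs may take a minute or two to be provisioned by Azure; re-run the command until they appear.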
Log Producer Cluster
Just install Promtail on each cluster, pointing it to the Loki external IP obtained in the step above.
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install promtail grafana/promtail --set "config.lokiAddress=http://<LOKI PUBLIC IP>:3100/loki/api/v1/push"
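Before (or after) installing Promtail, it can be worth confirming that the central Loki endpoint is reachable from outside the monitoring cluster; the IP is a placeholder:

```shell
# Loki's readiness endpoint returns "ready" once the server is up.
curl http://<LOKI PUBLIC IP>:3100/ready
```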
Monitoring Cluster
- Access Grafana using the public IP obtained from the first step
- Fetch the Grafana login credentials from the Kubernetes secret named loki-grafana.
- Explore Loki using Grafana's Explore option. Select Loki as the data source at the top; by default it points to http://loki:3100.
- Check the Log Browser for logs by selecting from the available labels. Try to select a label that exists on more than one log producer cluster.
- Observe that the data for that label from all the log producer clusters is shown together.
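A sketch of how those credentials can be decoded from the secret; the key names admin-user and admin-password are the defaults used by the Grafana chart:

```shell
# Decode the Grafana admin username and password from the loki-grafana secret.
kubectl get secret loki-grafana -o jsonpath='{.data.admin-user}' | base64 -d; echo
kubectl get secret loki-grafana -o jsonpath='{.data.admin-password}' | base64 -d; echo
```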
Label logs from each log producer Cluster
- Now that logs from each log producer cluster are aggregated and viewable in the monitoring cluster, the challenge is identifying which logs correspond to which cluster in Grafana.
- Each log stream in Loki has a unique set of labels, and Promtail has a concept of external labels that can be used to identify logs from each log producer cluster. Refer to the Promtail documentation and look for external labels.
- Promtail helm chart provides an extraArgs property that can be used to specify the external labels. Refer: https://github.com/grafana/helm-charts/blob/main/charts/promtail/values.yaml
- Based on the values.yaml file above, we can create a local values.yaml file that specifies the external label, like below, say for log cluster 1:
extraArgs:
- -client.external-labels=cluster=log-cluster-1
- Then upgrade the Helm release on log cluster 1 using this values file:
helm upgrade --install promtail grafana/promtail -f values.yaml
- Now open Grafana's Explore view and, in the Log Browser, look for the cluster label with value log-cluster-1. Observe that selecting this label (the LogQL query {cluster="log-cluster-1"}) shows logs from only the log-cluster-1 cluster.
- The same can be applied to the other log clusters with unique label values.
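Putting the last few steps together, the per-cluster Promtail installation can be sketched as a loop. The kubectl context names (log-cluster-1, log-cluster-2) and the Loki IP are placeholders that should match your own environment:

```shell
# Install Promtail on each log producer cluster with a unique external label.
for ctx in log-cluster-1 log-cluster-2; do
  kubectl config use-context "$ctx"
  # Generate a per-cluster values file with the cluster name as external label.
  cat > values.yaml <<EOF
config:
  lokiAddress: http://<LOKI PUBLIC IP>:3100/loki/api/v1/push
extraArgs:
  - -client.external-labels=cluster=$ctx
EOF
  helm upgrade --install promtail grafana/promtail -f values.yaml
done
```

Using a generated values file rather than --set avoids having to escape the equals signs inside the external-labels argument.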
Conclusion
I hope people find this article useful for the multi-cluster logging use case and that it helps in understanding and exploring the PLG stack as a logging solution. Please leave your inputs or questions, if any, in the comments below.