Grafana Dashboards for Azure Kubernetes Pod Level Metrics with Azure Monitor and Application Insights

Kshitij Sharma
Published in Microsoft Azure
Jul 5, 2021 · 4 min read

The Problem Statement

Azure Monitor integrates with Grafana through the Azure Monitor data source plugin. After adding the plugin, we can query Logs, Application Insights, and Metrics from Azure Monitor in Grafana.

Azure Monitor for Containers collects a set of metrics as part of monitoring an AKS cluster. Out of the box, we get Grafana dashboards for node-level and namespace-level metrics, but the same is not available for pod metrics.

This article will cover visualizing pod metrics like CPU and memory in Grafana using Kusto queries.

Ready-to-use Grafana dashboard for AKS pod metrics: https://grafana.com/grafana/dashboards/14891

The Solution

Prerequisites:

  • An AKS cluster with Azure Monitor for Containers enabled and connected to a Log Analytics workspace
  • A Grafana instance with the Azure Monitor data source plugin installed and configured

For this example, let’s try to visualize CPU usage at the pod level in a Grafana dashboard for a cluster running a service with 2 pods.

We will start by building the query in the Log Analytics workspace in the Azure portal and then move over to Grafana.

On selecting the Logs blade in the Log Analytics workspace, we see a set of tables in the Container Insights section.

On evaluating each of these tables, we find that the KubePodInventory table contains contextual information about the state of pods and services.

This is a good starting point for understanding the cluster state from a pod and container perspective.

Columns in the KubePodInventory table
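
These columns can also be listed straight from the workspace; the getschema operator returns every column along with its data type:

KubePodInventory
| getschema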

Now we need to look for performance metrics related to pods. Under the Log Management section in the Log Analytics workspace, we find a table named Perf.

Inside the Perf table, we have fields in the form of counters with performance metrics for all kinds of Azure resources (VMs, containers, etc.). The CounterName column identifies the performance metric, and the corresponding value is in the CounterValue column.
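
To see which counters are reported for Kubernetes containers, we can list the distinct counter names; K8SContainer is the ObjectName that Container Insights uses for container-level records:

Perf
| where ObjectName == "K8SContainer"
| distinct CounterName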

The possible values for CounterName are as follows:

  • cpuRequestNanoCores
  • memoryRequestBytes
  • cpuLimitNanoCores
  • memoryWorkingSetBytes
  • restartTimeEpoch
  • cpuUsageNanoCores
  • memoryRssBytes

But the Perf table doesn't have contextual information about Kubernetes cluster state, such as pods, containers, services, etc.

Let's try to apply a JOIN between the Perf and KubePodInventory tables and build a query. But the question arises, on which column should we build the JOIN?

On deeper inspection of the data in Perf and KubePodInventory, we realize that the InstanceName column value in the Perf table has the following string format:

ClusterId / PodUid / ContainerName
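
We can confirm this by sampling a few InstanceName values from the Perf table:

Perf
| where ObjectName == "K8SContainer"
| distinct InstanceName
| take 5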

Luckily for us, all the above fields are available in the KubePodInventory table. Thus we can JOIN on the InstanceName column.

In the KubePodInventory table, we have the following columns which are useful to us:

  • ClusterName
  • ContainerName
  • ClusterId
  • PodStatus
  • PodLabel
  • Namespace

In the Perf table, we have the following columns which are useful to us:

  • CounterName
  • CounterValue
  • InstanceName
  • ObjectName

Thus our query in Log Analytics looks like this:
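
Here is a minimal sketch of such a query, assuming a cluster named myCluster, pods running in the default namespace, and a pod label containing nginx; these are placeholders to replace with your own values:

let startDateTime = ago(1h);
KubePodInventory
| where TimeGenerated > startDateTime
| where ClusterName == "myCluster"        // placeholder cluster name
| where Namespace == "default"            // placeholder namespace
| where PodStatus == "Running"
| where isnotempty(Computer)              // drop pods not scheduled on a node
| where PodLabel contains "nginx"         // placeholder pod label match
| extend ActualContainerName = tostring(split(ContainerName, "/")[1])  // strip the pod GUID prefix
| extend InstanceName = strcat(ClusterId, "/", PodUid, "/", ActualContainerName)
| distinct Name, InstanceName
| join kind=inner (
    Perf
    | where TimeGenerated > startDateTime
    | where ObjectName == "K8SContainer"  // Kubernetes container metrics only
    | where CounterName == "cpuUsageNanoCores"
) on InstanceName
| summarize AvgCPUUsageNanoCores = avg(CounterValue) by bin(TimeGenerated, 5m), Name
| render timechart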

In the query above, the following is happening:

  • We filter based on ClusterName, PodStatus, and Namespace, and eliminate unscheduled pods.
  • The ContainerName field in the KubePodInventory table is prefixed with a pod GUID; we split it out to get the actual container name of the application.
  • We identify our application's pods based on the pod labels that are part of our pod definition.
  • We form the InstanceName column by concatenating ClusterId, PodUid, and ActualContainerName.
  • In the Perf table, we filter for Kubernetes container metrics only.
  • We apply the join on InstanceName between the Perf and KubePodInventory tables.

Result in Log Analytics Workspace

Unweaving the Magic in Grafana

In the query above, the render timechart operator helps us visualize the data as a chart in Azure. However, the same does not work directly in Grafana, as the plugin renders data differently.

Thus we need to make some additional tweaks to the query to get it working in Grafana. Here is the updated query:
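
A sketch of the Grafana version, under the same placeholder assumptions as before; $__timeFilter() is the Grafana macro that the Azure Monitor data source expands into the dashboard's current time range:

KubePodInventory
| where $__timeFilter(TimeGenerated)
| where ClusterName == "myCluster"        // placeholder cluster name
| where Namespace == "default"            // placeholder namespace
| where PodStatus == "Running"
| where isnotempty(Computer)
| where PodLabel contains "nginx"         // placeholder pod label match
| extend ActualContainerName = tostring(split(ContainerName, "/")[1])
| extend InstanceName = strcat(ClusterId, "/", PodUid, "/", ActualContainerName)
| distinct Name, InstanceName
| join kind=inner (
    Perf
    | where $__timeFilter(TimeGenerated)
    | where ObjectName == "K8SContainer"
    | where CounterName == "cpuUsageNanoCores"
) on InstanceName
| summarize CPUUsageMilliCores = avg(CounterValue) / 1000000 by bin(TimeGenerated, 5m), Name
| order by TimeGenerated asc              // Grafana expects rows ordered by time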

We summarize all the values in time-series form, since Grafana expects data in that format before rendering it. Granularity is set to 5 minutes in the query.

We also convert the CPU value from nanoCores to milliCores, which is how Azure Monitor for Containers dashboards report CPU metrics by default.
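
Since 1 milliCore equals 1,000,000 nanoCores, the conversion is a simple division: for example, a reading of 250,000,000 nanoCores becomes 250 milliCores, i.e. a quarter of a core.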

Add the above query to a panel in Grafana, select Logs in the Service field, and save the panel.

Thus we can now plot pod CPU usage in Grafana.

Final Result

Similarly, the query for plotting the memory usage metric for pods in an AKS cluster is as follows:
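
A sketch under the same placeholder assumptions, using the memoryWorkingSetBytes counter (memoryRssBytes works the same way) and converting bytes to MiB:

KubePodInventory
| where $__timeFilter(TimeGenerated)
| where ClusterName == "myCluster"        // placeholder cluster name
| where Namespace == "default"            // placeholder namespace
| where PodStatus == "Running"
| where isnotempty(Computer)
| where PodLabel contains "nginx"         // placeholder pod label match
| extend ActualContainerName = tostring(split(ContainerName, "/")[1])
| extend InstanceName = strcat(ClusterId, "/", PodUid, "/", ActualContainerName)
| distinct Name, InstanceName
| join kind=inner (
    Perf
    | where $__timeFilter(TimeGenerated)
    | where ObjectName == "K8SContainer"
    | where CounterName == "memoryWorkingSetBytes"
) on InstanceName
| summarize MemoryWorkingSetMiB = avg(CounterValue) / 1048576 by bin(TimeGenerated, 5m), Name
| order by TimeGenerated asc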

Conclusion

We can now plot pod- and container-level performance metrics for an AKS cluster in Grafana using Log Analytics and Azure Monitor.

Grafana Dashboard for AKS Pod Metrics
