MONITORING AZURE VMs USING GRAFANA

Shubham Jain
Published in Globant
10 min read · Oct 29, 2020

INTRODUCTION

Grafana is a visualization tool that uses data sources such as Azure Monitor, Elasticsearch, and CloudWatch to display data graphically. This article concentrates on using the Azure Monitor plugin to monitor Azure VMs, and on the Grafana features that help monitor them actively.

As a prerequisite, the VMs should be connected to an Azure Log Analytics workspace, where all the metric data is captured in the Perf table.

The first part of this article describes a typical installation, with a basic configuration and Azure Monitor as the data source.

1- Installation
2- Configuration
3- Datasource

The second part will include key dashboards and how to share them.

4- Dashboards
5- Dashboard Sharing

Last but not least, we will describe the advantages and limitations of this tool for this use case.

6- Advantages of Grafana
7- Limitations

SECTION 1

1. Installation

Grafana provides two editions of the tool:

  1. OSS (open source): Functionally identical to the Enterprise edition, but without the licensed Enterprise features. To use those, you need to download the Enterprise edition.
  2. Enterprise: The recommended download. Functionally identical to the open source edition, but includes features you can unlock with a license if you so choose.

This article shows the installation of the latest Grafana Enterprise edition on a Linux VM (Ubuntu 18.04). For installation on Windows, Mac, and other flavors of Linux, please refer to the official Grafana documentation:

https://grafana.com/docs/grafana/latest/installation/

Steps:

1. sudo apt-get install -y apt-transport-https
2. sudo apt-get install -y software-properties-common wget
3. wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
4. echo "deb https://packages.grafana.com/enterprise/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
5. sudo apt-get update
6. sudo apt-get install grafana-enterprise
7. sudo service grafana-server start
8. sudo service grafana-server status
9. sudo update-rc.d grafana-server defaults

2. Configuration

Once the installation is complete, open Grafana in a browser on the default port, 3000:

http://YOUR-IP:3000

The default username and password are both admin. The password should be changed at first login.

NOTE: Replace YOUR-IP with the public IP address of the VM where Grafana is installed.
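Before logging in, it can be useful to confirm that the server is reachable on port 3000. The sketch below polls Grafana's health endpoint, /api/health, which needs no authentication. The function names are illustrative, and the host passed to check_grafana is a placeholder that you would replace with your own address.

```python
import json
import urllib.request


def health_url(host: str, port: int = 3000) -> str:
    """Build the URL of Grafana's health endpoint for the given host."""
    return f"http://{host}:{port}/api/health"


def database_ok(body: str) -> bool:
    """Return True when the health payload reports a working database."""
    return json.loads(body).get("database") == "ok"


def check_grafana(host: str, port: int = 3000, timeout: float = 5.0) -> bool:
    """Fetch /api/health and report whether Grafana answered healthily."""
    with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
        return resp.status == 200 and database_ok(resp.read().decode("utf-8"))


# Example call (replace the placeholder with the VM's public IP first):
# check_grafana("YOUR-IP")
```

If the call raises a connection error, the usual suspects are the grafana-server service not running or port 3000 being blocked by the VM's network security rules.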

Grafana uses the Azure Monitor data source to fetch data in real time from Azure services such as Azure Monitor, Azure Log Analytics, and Application Insights. For this, we need to create a service principal (SPN) and assign it the following permissions:

1. Reader permission at the subscription level (if there are multiple subscriptions, Reader permission is needed on each of them). This is needed to read all the resources present in the subscription via Azure Monitor.

2. Log Analytics Reader permission on the subscription, to read data from Azure Log Analytics.

3. Log Analytics API permission (Active Directory -> API Permission -> Add a permission).
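One way to sanity-check the SPN credentials before entering them in Grafana is to request a token with the OAuth 2.0 client credentials flow. This is only a sketch: it assumes the Azure AD v1 token endpoint and the Log Analytics API resource URI, and the function names are illustrative. A successful token confirms that the tenant ID, client ID, and secret are valid; it does not prove that the role assignments listed above are in place.

```python
import json
import urllib.parse
import urllib.request

# Resource URI of the Log Analytics query API (assumed for this sketch).
LOG_ANALYTICS_RESOURCE = "https://api.loganalytics.io"


def token_request(tenant_id, client_id, client_secret,
                  resource=LOG_ANALYTICS_RESOURCE):
    """Build a client-credentials token request for the service principal."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource,
    }).encode("ascii")
    return urllib.request.Request(url, data=form, method="POST")


def fetch_token(tenant_id, client_id, client_secret):
    """Send the request and return the access token string."""
    req = token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())["access_token"]
```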

Once the SPN has the required permissions, configure the Azure Monitor data source using the steps below:

3. Datasource

1. Open the Grafana instance in a browser and go to Data Sources.

2. Click Add data source and search for Azure Monitor.

3. Fill in the SPN details in the data source settings.

Once the data source is successfully configured, we can start creating dashboards.
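The same data source can also be created without the UI, through Grafana's HTTP API (POST /api/datasources). The sketch below builds such a request. It assumes an admin API key and the jsonData field names used by the Azure Monitor plugin around the Grafana 7 releases; field names may differ in other versions, so treat them as assumptions and check them against your installation. The function names and argument values are illustrative.

```python
import json
import urllib.request


def azure_monitor_datasource(name, tenant_id, client_id, client_secret,
                             subscription_id):
    """Build the JSON body describing an Azure Monitor data source."""
    return {
        "name": name,
        "type": "grafana-azure-monitor-datasource",
        "access": "proxy",
        "jsonData": {
            "cloudName": "azuremonitor",
            "tenantId": tenant_id,
            "clientId": client_id,
            "subscriptionId": subscription_id,
        },
        # Secrets go in secureJsonData so Grafana stores them encrypted.
        "secureJsonData": {"clientSecret": client_secret},
    }


def create_request(grafana_url, api_key, payload):
    """Build the POST request that would create the data source."""
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it would look like this (not executed here):
# urllib.request.urlopen(create_request("http://YOUR-IP:3000", "<api-key>", payload))
```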

SECTION 2

4. Dashboards

In this section, I create a dashboard using the data source configured in the previous section. It uses Log Analytics queries to fetch data from the Perf table and display it in Grafana. To list the resource groups and the VMs in those resource groups, we use a Grafana feature called variables, which we can then reference in our Log Analytics queries.

VARIABLES: Create the variables as follows:

1. Create a variable ResourceGroup to list all the RGs in the subscription:

Query: ResourceGroups()

2. Create a variable VM to list the VMs in the selected RG:

Query: ResourceNames($ResourceGroup,Microsoft.Compute/virtualMachines)

NOTE: If you have multiple subscriptions, you can also create a variable for the subscription, using the type Datasource, and then use that variable in the ResourceGroup and VM variables.

Here I had only one data source (GrafanaTest), so I did not create a subscription variable.

METRICS: Once the variables are ready, we can use them in our Log Analytics queries to monitor various metrics.

Below are a few of the metrics I used for monitoring Azure VMs:

  • CPU Percentage (average): Displays the average CPU percentage, averaged over 5-minute intervals.

The variable VM is used in the Log Analytics query to display data for all the VMs in the selected RG.

Query:

Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| extend vm = iif(Computer has ".", substring(Computer, 0, indexof(Computer, '.')), Computer)
| where (tolower(strcat(""",$VM,""")) has tolower(vm)) and (Computer has vm)
| where $__timeFilter(TimeGenerated)
| summarize avg(CounterValue) by vm, bin(TimeGenerated, 5m)
| sort by vm asc, TimeGenerated asc
| order by TimeGenerated asc
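To make the shape of this query easier to follow, here is a small Python sketch of its two central steps: the extend step, which strips any domain suffix from Computer, and the summarize step, which averages the counter per VM in 5-minute bins. It is an illustration of the logic only, not of how Log Analytics runs the query, and it leaves out the $VM and time range filters. The host names and sample values are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def short_name(computer: str) -> str:
    """Mirror the extend step: keep the text before the first dot, if any."""
    return computer.split(".", 1)[0] if "." in computer else computer


def bin_start(ts: datetime, width: timedelta) -> datetime:
    """Mirror bin(TimeGenerated, width): round a timestamp down to its bin."""
    return ts - (ts - EPOCH) % width


def average_by_vm(samples, width=timedelta(minutes=5)):
    """Average counter values per (vm, time bin).

    samples is an iterable of (computer, timestamp, counter_value) tuples
    with timezone-aware timestamps.
    """
    groups = defaultdict(list)
    for computer, ts, value in samples:
        groups[(short_name(computer), bin_start(ts, width))].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}
```

Because the short name is used as the grouping key, samples reported as web01 and web01.contoso.com (hypothetical names) end up in the same series.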

The scale of the X and Y axes can be changed from the Axes panel.

  • Disk Space Used (%): Displays the percentage of disk space used on the VM, averaged over 5-minute intervals.

Query:

Perf
| where ObjectName == "LogicalDisk" or ObjectName == "Logical Disk"
| where CounterName == "% Used Space"
| where InstanceName != "_Total"
| extend vm = iif(Computer has ".", substring(Computer, 0, indexof(Computer, '.')), Computer)
| where (tolower(strcat(""",$VM,""")) has tolower(vm)) and (Computer has vm)
| where $__timeFilter(TimeGenerated)
| summarize avg(CounterValue) by vm, bin(TimeGenerated, 5m)
| sort by vm asc, TimeGenerated asc
| order by TimeGenerated asc

  • Disk Space Free (%): Displays the percentage of disk space free on the VM, averaged over 15-minute intervals.

Query:

Perf
| where ObjectName == "LogicalDisk" or ObjectName == "Logical Disk"
| where CounterName == "% Used Space"
| where InstanceName != "_Total"
| extend vm = iif(Computer has ".", substring(Computer, 0, indexof(Computer, '.')), Computer)
| where (tolower(strcat(""",$VM,""")) has tolower(vm)) and (Computer has vm)
| where $__timeFilter(TimeGenerated)
| summarize avg(100 - CounterValue) by vm, bin(TimeGenerated, 15m)
| sort by vm asc, TimeGenerated asc
| order by TimeGenerated asc

  • Memory Free (%): Displays the percentage of free memory (RAM) on the VM, averaged over 15-minute intervals.

Query:

Perf
| where ObjectName == "Memory"
| where CounterName == "% Committed Bytes in Use" or CounterName == "% Used Memory"
| extend vm = iif(Computer has ".", substring(Computer, 0, indexof(Computer, '.')), Computer)
| where (tolower(strcat(""",$VM,""")) has tolower(vm)) and (Computer has vm)
| where $__timeFilter(TimeGenerated)
| summarize avg(100 - CounterValue) by vm, bin(TimeGenerated, 15m)
| sort by vm asc, TimeGenerated asc
| order by TimeGenerated asc

  • Memory Used (%): Displays the percentage of used memory (RAM) on the VM, averaged over 15-minute intervals.

Query:

Perf
| where ObjectName == "Memory"
| where CounterName == "% Committed Bytes in Use" or CounterName == "% Used Memory"
| extend vm = iif(Computer has ".", substring(Computer, 0, indexof(Computer, '.')), Computer)
| where (tolower(strcat(""",$VM,""")) has tolower(vm)) and (Computer has vm)
| where $__timeFilter(TimeGenerated)
| summarize avg(CounterValue) by vm, bin(TimeGenerated, 15m)
| sort by vm asc, TimeGenerated asc
| order by TimeGenerated asc

  • Disk Writes/sec: Displays the disk write operations per second, averaged over 15-minute intervals.

Query:

Perf
| where ObjectName == "LogicalDisk" or ObjectName == "Logical Disk"
| where CounterName == "Disk Writes/sec"
| extend vm = iif(Computer has ".", substring(Computer, 0, indexof(Computer, '.')), Computer)
| where (tolower(strcat(""",$VM,""")) has tolower(vm)) and (Computer has vm)
| where $__timeFilter(TimeGenerated)
| summarize avg(CounterValue) by vm, bin(TimeGenerated, 15m)
| sort by vm asc, TimeGenerated asc
| order by TimeGenerated asc

  • Disk Reads/sec: Displays the disk read operations per second, averaged over 15-minute intervals.

Query:

Perf
| where ObjectName == "LogicalDisk" or ObjectName == "Logical Disk"
| where CounterName == "Disk Reads/sec"
| extend vm = iif(Computer has ".", substring(Computer, 0, indexof(Computer, '.')), Computer)
| where (tolower(strcat(""",$VM,""")) has tolower(vm)) and (Computer has vm)
| where $__timeFilter(TimeGenerated)
| summarize avg(CounterValue) by vm, bin(TimeGenerated, 15m)
| sort by vm asc, TimeGenerated asc
| order by TimeGenerated asc

  • Logical Disk IOPS: Displays the raw number of input/output disk operations performed per second, averaged over 15-minute intervals.

Query:

Perf
| where ObjectName == "LogicalDisk" or ObjectName == "Logical Disk"
| where CounterName == "Disk Transfers/sec"
| where InstanceName != "_Total"
| extend vm = iif(Computer has ".", substring(Computer, 0, indexof(Computer, '.')), Computer)
| where (tolower(strcat(""",$VM,""")) has tolower(vm)) and (Computer has vm)
| where $__timeFilter(TimeGenerated)
| summarize avg(CounterValue) by vm, bin(TimeGenerated, 15m)
| sort by vm asc, TimeGenerated asc
| order by TimeGenerated asc

  • Network In: Displays the bytes received per second on the network interfaces of the virtual machine, averaged over 15-minute intervals (incoming traffic).

Query:

Perf
| where ObjectName == "Network Adapter"
| where CounterName == "Bytes Received/sec"
| extend vm = iif(Computer has ".", substring(Computer, 0, indexof(Computer, '.')), Computer)
| where (tolower(strcat(""",$VM,""")) has tolower(vm)) and (Computer has vm)
| where $__timeFilter(TimeGenerated)
| extend vm_new = strcat(vm, ".Bytes Received/sec")
| summarize avg(CounterValue) by vm_new, bin(TimeGenerated, 15m)
| sort by vm_new asc, TimeGenerated asc
| order by TimeGenerated asc

  • Network Out: Displays the bytes sent per second on the network interfaces of the virtual machine, averaged over 15-minute intervals (outgoing traffic).

Query:

Perf
| where ObjectName == "Network Adapter"
| where CounterName == "Bytes Sent/sec"
| extend vm = iif(Computer has ".", substring(Computer, 0, indexof(Computer, '.')), Computer)
| where (tolower(strcat(""",$VM,""")) has tolower(vm)) and (Computer has vm)
| where $__timeFilter(TimeGenerated)
| extend vm_new = strcat(vm, ".Bytes Sent/sec")
| summarize avg(CounterValue) by vm_new, bin(TimeGenerated, 15m)
| sort by vm_new asc, TimeGenerated asc
| order by TimeGenerated asc

5. Dashboard Sharing

Grafana provides three ways of sharing the dashboard:

Link: You can generate a link to the dashboard and share it with users who have access to Grafana.

Snapshot: You can generate a local snapshot, which produces a link that does not require the user to have access to Grafana. Through this link, the user cannot change the variable values and sees the data as it was at the moment the snapshot was shared.


Export: Dashboards can be saved to JSON files and shared with others, and Grafana provides an option to import a dashboard from a JSON file. This also means Grafana dashboards can be kept as JSON files in a source control repository.

The saved JSON dashboards can then be imported into Grafana using its import option.
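The export and import round trip can also be scripted against Grafana's HTTP API, which is convenient when the JSON files live in source control. The sketch below assumes an API key with sufficient rights and uses the dashboard endpoints GET /api/dashboards/uid/<uid> and POST /api/dashboards/db. The function names, URL, and key are illustrative placeholders.

```python
import json
import urllib.request


def to_import_payload(dashboard: dict, overwrite: bool = True) -> dict:
    """Wrap a saved dashboard model in the body expected by the import call."""
    model = dict(dashboard)
    model["id"] = None  # let the target Grafana assign its own numeric id
    return {"dashboard": model, "overwrite": overwrite}


def build_request(url: str, api_key: str, payload=None):
    """Build a GET request, or a POST request when a JSON payload is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="GET" if data is None else "POST",
    )


def export_dashboard(base_url: str, api_key: str, uid: str) -> dict:
    """Download a dashboard model by its uid."""
    req = build_request(f"{base_url}/api/dashboards/uid/{uid}", api_key)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["dashboard"]


def import_dashboard(base_url: str, api_key: str, dashboard: dict) -> dict:
    """Upload a dashboard model, overwriting one with the same uid."""
    req = build_request(f"{base_url}/api/dashboards/db", api_key,
                        to_import_payload(dashboard))
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Clearing the numeric id while keeping the uid is a deliberate choice in this sketch: the id is local to one Grafana instance, whereas the uid lets the same dashboard be recognized and updated across instances.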

SECTION 3

6. Advantages of Grafana

1. In comparison to Azure native monitoring, Grafana provides an option to plot metrics from multiple VMs on a single dashboard with the help of variables.

2. Grafana supports multiple clouds such as AWS, Azure, and GCP, so resources from different clouds can be monitored on a single dashboard.

3. Grafana provides better visualization of metrics. It has various types of panels, such as graph, bar gauge, table, and text.

4. Grafana provides features such as dashboard linking, which can be used to link dashboards together for deeper monitoring.

5. Sharing of dashboards is easy.

6. Grafana supports alerting.

7. Grafana does not store any data; it fetches real-time data using REST API calls.

7. Limitations

1. No time series storage support. Grafana is only a visualization solution; time series storage is not part of its core functionality. Tools like Prometheus, by contrast, store time series with key-value labels that help organize the data and offer strong query capabilities.

2. No data collection support. Gathering time series is not part of its core functionality either.

3. Alert management is not very mature and is not part of its core functionality.

CONCLUSION

Grafana provides a large set of plugins for integration with multiple cloud providers and with other tools such as Elasticsearch, Graphite, and Prometheus. It also provides reporting capabilities, but these require an Enterprise license. Grafana does not store any data, which is a limitation but also an advantage from a security point of view, since it fetches real-time data using REST API calls. So if you are looking for a centralized tool for monitoring resources in multi-cloud environments, or for integration with other tools to improve visualization, Grafana is a good choice.
