Using the Azure Monitor Metrics API to create custom metrics

Dylan Morley
ASOS Tech Blog · May 9, 2019

Azure Monitor is the unified experience in Microsoft Azure for the collection, analysis and monitoring of logs and metrics from a number of sources. It's a managed service for working with large-scale telemetry data, giving you the capability to create alerts and automated responses based on metric thresholds and other calculations.

The teams at Microsoft work closely with Azure Monitor and make metrics from their respective products available as platform metrics. Need to know how many request units a Cosmos DB instance is using? Or how many dead letters you have on a Service Bus topic? No problem: there are metrics available that you can visualise and alert on, with minimal effort from your engineering team.

Platform metrics for Azure Service Bus

More metrics are being made available within Azure on a regular basis; a list of those currently available can be seen here. With the release of the Azure Monitor plugin for Grafana, you can now achieve a best-in-class graphing experience over near-real-time, managed metrics.

But what if a metric you require isn't available as part of platform metrics? Perhaps it's something specific to your business domain, or something that the product team hasn't (yet) made available for your use-case. The Metrics API exposes a REST endpoint that allows you to store custom metrics alongside platform metrics, which means your data is stored at the resource level and can be managed as per the standard metrics for the resource.

At ASOS, monitoring and alerting is a vital part of our operation. Any downtime that impacts our ability to take customer orders must be avoided, so we require metrics and logs from event-producing systems with low end-to-end latency. This is so we can understand when systems are about to go wrong and take proactive action. Gathering telemetry from a large number of distributed micro-services, running at high scale, and turning that into something that can be acted upon quickly is a big-data problem. We aim to take advantage of the products Microsoft make available to manage the collection and storage of this data, allowing us to perform the analysis and focus on our business requirements. The Monitor Metrics API helps us achieve this, as it gives us the ability to store time-series data at high precision and low latency.

This article will show you how to create an Azure function that will make use of the Azure Monitor API to create a custom metric against a Service Bus namespace. The source code for this article is available on GitHub.

Defining the custom metric

Consider an application that makes heavy use of Azure Service Bus (ASB), with a number of queues, topics and dependent systems subscribing to information. We want to set up alerts to make sure that our capacity limits aren't being reached on queues and topics. The platform metrics made available from ASB give us a view of Size, which is the total size of items currently in the entity. Without knowing the maximum size allowed, we can't raise a meaningful alert from this unless thresholds are configured manually for each entity.

A better metric for this requirement is a calculation that gives 'Percentage of Capacity Used' at the entity level. With a percentage-based metric, an alert can be triggered when any entity is reaching a threshold, e.g. 85% capacity, without requiring any knowledge about the underlying maximum storage size.
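The calculation itself is straightforward. As a sketch (in Python for brevity; the project itself is C#), normalising the units is the only subtlety, since Service Bus reports maximum size in megabytes but current size in bytes:

```python
def percentage_capacity_used(current_size_bytes: int, max_size_mb: int) -> float:
    """Percentage of an entity's storage quota currently in use.

    Service Bus reports the current size in bytes and the maximum
    size in megabytes, so the units need normalising first.
    """
    max_size_bytes = max_size_mb * 1024 * 1024
    return (current_size_bytes / max_size_bytes) * 100

# A 1 GB queue holding 256 MB of messages is at 25% capacity
print(percentage_capacity_used(256 * 1024 * 1024, 1024))  # 25.0
```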

We want to alert when something is running out of space so we can be proactive. Using Action Groups and Logic Apps, there are opportunities to run self-healing operations on certain signals before a capacity problem becomes critical.

The solution

We’ll deploy a function application into a subscription that will make use of the Azure Management API (Microsoft.Azure.Management.Fluent) to retrieve details about service bus namespaces in that subscription. The function will enumerate the topics and queues to retrieve the maximum sizes allowed for each entity and what the current size is, then calculate the percentage of capacity used, before finally storing the metric data alongside the service bus namespace by making a REST call to the Monitor API.

Security requirements

To enumerate a subscription and create custom metrics, you’ll need a service principal (SPN) within your tenant that has both Reader and Monitoring Metrics Publisher permissions to the subscription. At run-time, the function will authenticate as the SPN and use both the Azure Management API and the Monitoring API in the context of the principal.

We’ll therefore need to obtain two access tokens for working with the different Azure resources (one for each API) with the resource/audience correctly scoped.
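As a rough sketch of what that scoping means in practice (Python here for illustration; the project uses ADAL in C#), the two client-credentials token requests differ only in the resource/audience requested:

```python
# The management token is used to enumerate namespaces;
# the monitoring token is used to publish custom metrics.
MANAGEMENT_RESOURCE = "https://management.azure.com/"
MONITORING_RESOURCE = "https://monitoring.azure.com/"


def token_request(tenant_id: str, client_id: str, client_secret: str, resource: str) -> dict:
    """Build the endpoint and form body for an AAD client-credentials token request."""
    return {
        "url": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "body": {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "resource": resource,
        },
    }
```

In the real project a token-management library handles caching and refresh; this just shows where the two scopes diverge.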

To run the project locally, make sure you have set the values in local.settings.json for the SPN, your Azure tenant and subscription details.
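A minimal local.settings.json might look like the following (the key names here are illustrative; check the repository for the exact settings the project expects):

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "TenantId": "<your-aad-tenant-id>",
    "SubscriptionId": "<your-subscription-id>",
    "ClientId": "<spn-application-id>",
    "ClientSecret": "<spn-client-secret>"
  }
}
```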

Azure functions project

The solution is a V2 functions project in .NET Standard 2.0 that makes use of the Willezone NuGet package for dependency injection. It does the job nicely while the Functions team finishes making DI a native experience.

The startup classes in the Application directory wire up the required dependencies, which are injected into the function using the [Inject] attribute.

The function

There’s a timer triggered function (ServiceBusPercentageUsed) that will execute every five minutes. An instance of ServiceBusPercentageUsedMetricsGenerator is injected and the single method that class exposes is executed – that’s the only thing done in the function entry point itself.

As the signature for Run is static, I find it preferable to delegate all the work the function will perform to a container created instance, allowing us to take advantage of DI and instance level variables within the class that does the work. This should lead to a cleaner overall design.

The metrics generator class takes all required dependencies from the container and begins enumerating the service bus namespaces in the subscription, kicking off a number of tasks that process each namespace.

The work done in ProcessServiceBusNamespace enumerates the queues and topics, reading the MaxSizeInMB and CurrentSizeInBytes properties on the entities and using those values to calculate the current percentage of capacity consumed. Each calculation is then added to the series for the payload we will transmit to the Azure Monitor metrics API.
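That enumeration step can be sketched as follows (Python for brevity; the entity dictionaries are hypothetical stand-ins for the SDK's queue and topic objects):

```python
def build_series(entities: list) -> list:
    """One series entry per queue/topic, keyed by the EntityName dimension.

    Each entity exposes name, max_size_mb and current_size_bytes,
    mirroring MaxSizeInMB and CurrentSizeInBytes on the SDK objects.
    """
    series = []
    for entity in entities:
        percent_used = entity["current_size_bytes"] / (entity["max_size_mb"] * 1024 * 1024) * 100
        series.append({
            "dimValues": [entity["name"]],
            "min": percent_used,
            "max": percent_used,
            "sum": percent_used,
            "count": 1,
        })
    return series
```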

Azure Monitor schema

The Monitor API schema supports dimension names and series values sections. The payload being generated will look like the following.
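For a single queue, the body sent to the metrics endpoint follows the custom metrics schema, along these lines (the namespace is the one chosen for this project; the metric name and values are illustrative):

```json
{
  "time": "2019-05-09T12:00:00Z",
  "data": {
    "baseData": {
      "metric": "Percentage Used",
      "namespace": "ASOS Custom Metrics",
      "dimNames": ["EntityName"],
      "series": [
        {
          "dimValues": ["orders-queue"],
          "min": 42.5,
          "max": 42.5,
          "sum": 42.5,
          "count": 1
        }
      ]
    }
  }
}
```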

As the queues and topics are enumerated, the payload is built using the Model objects that represent the schema.

There will be one dimension name (EntityName), and many series for the topics and queues – processing all the items in a service bus namespace should result in one call to the Monitor API.

Transmitting the data

To send a custom metric to the API, a URL in the format https://{azureRegion}.monitoring.azure.com/{azureResourceId}/metrics must be built. The metrics transmitted for a resource are sent to the monitoring ingress endpoint in the same region as the resource. A list of the supported regions can be found here.

azureRegion – this will be the region of the service bus namespace, which can be retrieved from the Region.Name property of IServiceBusNamespace.

azureResourceId – the unique resource identifier, again available on the IServiceBusNamespace object as Id.
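Building the endpoint from those two values is simple string composition (sketched in Python; the project does the equivalent in C#, and the resource identifier shown is a hypothetical example):

```python
def build_metrics_url(azure_region: str, azure_resource_id: str) -> str:
    """Ingestion endpoint for custom metrics, in the resource's own region.

    The resource Id from the SDK starts with a leading slash,
    so it's stripped before joining to avoid a double slash.
    """
    return f"https://{azure_region}.monitoring.azure.com/{azure_resource_id.lstrip('/')}/metrics"


url = build_metrics_url(
    "westeurope",
    "/subscriptions/abc/resourceGroups/rg/providers/Microsoft.ServiceBus/namespaces/ns",
)
```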

A class named AzureMonitorMetricsClient exposes one method, which takes the region, resource Id and the payload. It’s responsible for constructing the request to send to the API, which requires the correct endpoint to transmit to and appropriate authentication context.

To authenticate with the Monitor API, we need an access token that has been scoped for https://monitoring.azure.com/. MetricsApiAccessTokenProvider provides this functionality as a simple wrapper around the ADAL library, which takes care of token management for us.

We can then send the data and the Azure Monitor API will ingest the information and take care of creating the metric name for the resource.

Viewing the data

Once you have transmitted some data, you can view the metrics in the portal alongside the platform metrics for a resource.

  • In the resource blade, choose the Metrics option under the Monitoring section
  • In the main screen, choose the namespace for your custom metrics and the metric you want to view
  • As the metric created is multi-dimensional, there’s an option to split the data by the dimension name, which was set to EntityName

Finally, you can view the results, which will be the percentage of capacity used, split by queue and topic name.

When viewing the graph over smaller time windows, e.g. one hour, solid lines indicate times where data points have been received, dotted lines represent periods where no data exists. As the function is on a five-minute interval, you’ll only expect to see data points at that frequency, but could obviously lower this value if required. Storing metrics at one-minute precision is supported in Azure Monitor.

Alerting on the metric

Now we're capturing the metric, alerting on it is as simple as setting up a standard Azure Monitor alert – a full walk-through is available here.

When choosing the Signal Source, the metric displays against the namespace we gave it when creating the payload, which was 'ASOS Custom Metrics'.

If you end up automating alert creation using an ARM template, you must specify the namespace for custom metric alerts, like so.
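In the alert rule's criteria, that means setting metricNamespace alongside the metric name, roughly like this fragment of a Microsoft.Insights/metricAlerts resource (property values illustrative):

```json
"criteria": {
  "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
  "allOf": [
    {
      "name": "CapacityThreshold",
      "metricNamespace": "ASOS Custom Metrics",
      "metricName": "Percentage Used",
      "operator": "GreaterThan",
      "threshold": 85,
      "timeAggregation": "Average"
    }
  ]
}
```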

Deploying the function

The last thing to do is deploy the function into the subscription it will be calculating metrics for. It will then execute as per the timer interval, calculate metrics for all service bus namespaces in the subscription and store them in Azure as custom metrics for the resource.

About me

I’m Dylan Morley, one of the Principal Software Engineers at ASOS. I primarily work on the back-end commerce APIs that enable our shopping experience.
