Deploying Azure Managed Prometheus with AzAPI

Heyko Oelrichs · Microsoft Azure · Dec 22, 2022

Azure Managed Prometheus is a brand-new managed service offering on Microsoft’s cloud platform Azure. It is currently in preview and expected to become generally available later this year. It allows you to collect and analyze metrics at scale from Azure Kubernetes Service (AKS) or any other Kubernetes cluster running self-managed Prometheus, and it is based on the well-known open-source project Prometheus.

Prometheus is an open-source monitoring solution that enables you to collect metrics from various systems, applications, and services. It allows you to query and visualize your data, alert on critical conditions, and understand the performance and behavior of your infrastructure. It is built with a focus on scalability, reliability, and simplicity, making it a popular choice for monitoring large and complex systems.

In this article, we’re taking a closer look at how to deploy the new service via the Terraform AzAPI provider.

Terraform’s AzureRM provider is a powerful solution for deploying and managing resources on Microsoft’s cloud platform Azure. But power users in particular run into situations from time to time where the AzureRM provider does not (yet) support specific functionality, properties or services. This is especially the case with preview features and services, and it is where the AzAPI provider comes into play to fill these gaps.

Setup
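Throughout this article we’ll use the AzureRM and AzAPI providers side by side. Here is a minimal sketch of the provider configuration the following snippets assume (the version constraints are purely illustrative):

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.35"
    }
    azapi = {
      source  = "Azure/azapi"
      version = ">= 1.1"
    }
  }
}

provider "azurerm" {
  # azurerm requires an (empty) features block
  features {}
}

provider "azapi" {}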

To start with, we need an Azure Monitor workspace, a container for data collected by Azure Monitor:

# resource group for our prometheus resources
resource "azurerm_resource_group" "stamp" {
  name     = "promtest"
  location = "uksouth"
}

# azure monitor workspace for prometheus
resource "azapi_resource" "prometheus" {
  type      = "microsoft.monitor/accounts@2021-06-03-preview"
  name      = "promtest"
  parent_id = azurerm_resource_group.stamp.id
  location  = azurerm_resource_group.stamp.location

  response_export_values = ["*"]
}
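The response_export_values = ["*"] argument exposes the full API response on the resource’s output attribute. As an example of how that can be consumed later (the property path below is an assumption based on the current microsoft.monitor/accounts preview API and may change), the PromQL query endpoint of the workspace could be surfaced as an output:

# expose the prometheus query endpoint of the azure monitor workspace
# (property path assumes the preview API response shape)
output "prometheus_query_endpoint" {
  value = jsondecode(azapi_resource.prometheus.output).properties.metrics.prometheusQueryEndpoint
}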

This results in a new Azure Monitor workspace:

Azure Monitor workspace in the Azure portal

The next resource we need is a Data Collection Endpoint:

# data collection endpoint
resource "azapi_resource" "dataCollectionEndpoint" {
  type      = "Microsoft.Insights/dataCollectionEndpoints@2021-09-01-preview"
  name      = "MSProm-SUK-${azurerm_kubernetes_cluster.stamp.name}"
  parent_id = azurerm_resource_group.stamp.id
  location  = azurerm_resource_group.stamp.location

  body = jsonencode({
    kind       = "Linux"
    properties = {}
  })
}

The naming schema the portal uses for all supporting resources like the Data Collection Endpoint and Rules is MSProm-<region>-<cluster>: MSProm is static, <region> is replaced with the short code of the Azure region and <cluster> with the name of the AKS cluster. The resource itself is deployed into the same resource group and location.

The naming schema used by the Azure portal experience is not fixed: when deploying via ARM (for example with Terraform or other tools), you can modify these names to fit your needs and follow your own naming schema. Keep in mind, though, that when you delete the service via the Azure portal, it will look for these specific names.
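If you prefer your own naming convention over the portal default, one option is to centralize it in a locals block. The following sketch is purely illustrative (the values are placeholders); the rule group example further down assumes locals named prefix and location_short like these:

# illustrative naming locals - pick values that match your own schema
locals {
  prefix         = "promtest"
  location_short = "suk"

  # same pattern the portal uses: MSProm-<region short code>-<cluster name>
  prom_resource_name = "MSProm-${upper(local.location_short)}-${azurerm_kubernetes_cluster.stamp.name}"
}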

Then we need a Data Collection Rule (DCR) for our previously created Data Collection Endpoint (DCE):

resource "azapi_resource" "dataCollectionRule" {
schema_validation_enabled = false

type = "Microsoft.Insights/dataCollectionRules@2021-09-01-preview"
name = "MSProm-SUK-${azurerm_kubernetes_cluster.stamp.name}"
parent_id = azurerm_resource_group.stamp.id
location = azurerm_resource_group.stamp.location

body = jsonencode({
kind = "Linux"
properties = {
dataCollectionEndpointId = azapi_resource.dataCollectionEndpoint.id
dataFlows = [
{
destinations = ["MonitoringAccount1"]
streams = ["Microsoft-PrometheusMetrics"]
}
]
dataSources = {
prometheusForwarder = [
{
name = "PrometheusDataSource"
streams = ["Microsoft-PrometheusMetrics"]
labelIncludeFilter = {}
}
]
}
destinations = {
monitoringAccounts = [
{
accountResourceId = data.azapi_resource.prometheus.id
name = "MonitoringAccount1"
}
]
}
}
})
}

Here we’re referring to multiple other resources. At the top we’re again constructing the name following the portal naming schema MSProm-<region>-<cluster>, followed by dataCollectionEndpointId, which is the resource id of the DCE. At the bottom we’re adding a monitoring account by referring to the resource id of our Azure Monitor workspace.

So far we’ve deployed the following three resources:

  • Azure Monitor workspace
  • Data Collection Endpoint
  • Data Collection Rule

The next step is to associate our AKS cluster with this Azure Monitor workspace using the Data Collection Endpoint and Rule:

resource "azapi_resource" "dataCollectionRuleAssociation" {
schema_validation_enabled = false
type = "Microsoft.Insights/dataCollectionRuleAssociations@2021-09-01-preview"
name = "MSProm-SUK-${azurerm_kubernetes_cluster.stamp.name}"
parent_id = azurerm_kubernetes_cluster.stamp.id

body = jsonencode({
scope = azurerm_kubernetes_cluster.stamp.id
properties = {
dataCollectionRuleId = azapi_resource.dataCollectionRule.id
}
})
}

This associates our AKS cluster with our DCR. One prerequisite to make that work is to have monitor_metrics {} set in our azurerm_kubernetes_cluster definition, as sketched below.
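For completeness, here is a trimmed-down sketch of what that can look like on the cluster resource (all other required arguments are omitted for brevity):

resource "azurerm_kubernetes_cluster" "stamp" {
  # ... name, location, resource_group_name, default_node_pool, identity etc. omitted ...

  # enables the managed prometheus metrics add-on on the cluster;
  # annotations_allowed and labels_allowed can optionally be set inside this block
  monitor_metrics {}
}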

To sum up the various components we’ve used so far:

  • The Azure Monitor workspace is where the data collected by Azure Monitor is stored.
  • Data collection rules determine what data to collect.
  • Data collection endpoints are the endpoints the collected data is sent to for ingestion.
  • Data collection rule associations are the link between data collection rules and the resources they apply to.

Verify

To verify that your deployment is working as expected, go to Azure Monitor workspaces in the Azure portal and click on “Monitored clusters” under “Managed Prometheus.” The cluster defined in our previously created data collection rule association should show up here.

Monitored clusters in Azure Monitor workspace

Once the data collection rule is associated with the resources, Azure Monitor starts collecting the specified metrics, logs, and events from those resources and sending them to the configured data collection endpoints.

The dashboard should show you all clusters associated with this Azure Monitor workspace. The other way around is via the AKS cluster: go to “Kubernetes services” in the Azure portal, select your cluster and click on “Insights” in the “Monitoring” section. There you’ll find a “Monitor settings” button that also shows you whether this cluster is onboarded to Azure Monitor managed service for Prometheus:

“Monitor Settings” in AKS Container Insights

That’s already it. You now have an Azure Monitor workspace up and running that is collecting Prometheus metrics and providing a query API. If you’re looking for a more convenient way to access the data, you can link a Grafana instance to your Azure Monitor workspace. This can be a managed or a self-managed version of Grafana; the Azure portal experience, though, only allows you to link to a managed Grafana instance.

Querying Prometheus data via Grafana
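If you want to stay in the AzAPI world, a managed Grafana instance with a workspace integration can be sketched roughly like this. Treat the resource name, API version and property names as assumptions based on the Microsoft.Dashboard/grafana resource type:

# managed grafana instance linked to the azure monitor workspace
resource "azapi_resource" "grafana" {
  type      = "Microsoft.Dashboard/grafana@2022-08-01"
  name      = "promtest-grafana" # placeholder name
  parent_id = azurerm_resource_group.stamp.id
  location  = azurerm_resource_group.stamp.location

  identity {
    type = "SystemAssigned"
  }

  body = jsonencode({
    sku = { name = "Standard" }
    properties = {
      grafanaIntegrations = {
        azureMonitorWorkspaceIntegrations = [
          { azureMonitorWorkspaceResourceId = azapi_resource.prometheus.id }
        ]
      }
    }
  })
}

Keep in mind that the Grafana instance’s managed identity also needs read access to the Azure Monitor workspace (for example via the Monitoring Data Reader role) to be able to query the data.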

Rules and rule groups

The last remaining piece we haven’t covered yet is “Rule groups”. Rules in Prometheus act on data as it’s collected, and they run sequentially in the order they’re defined in the group. Prometheus knows two kinds of rules: alert rules, which let you create an Azure Monitor alert based on the results of a Prometheus (PromQL) query, and recording rules, which allow you to pre-compute frequently needed or compute-intensive expressions and store their result as a new set of time series. Some of the dashboards use such pre-computed metrics. These rules and rule groups can be deployed via AzAPI as well. Here’s an example with two recording rules:

resource "azapi_resource" "prometheusRuleGroup" {
type = "Microsoft.AlertsManagement/prometheusRuleGroups@2021-07-22-preview"
name = "${local.prefix}-${local.location_short}-ruleGroup"
parent_id = azurerm_resource_group.stamp.id
location = azurerm_resource_group.stamp.location

body = jsonencode({
properties = {
description = "Prometheus Rule Group"
scopes = [data.azapi_resource.prometheus.id]
enabled = true
clusterName = azurerm_kubernetes_cluster.stamp.name
interval = "PT1M"

rules = [
{
record = "instance:node_cpu_utilisation:rate5m"
expression = "1 - avg without (cpu) (sum without (mode)(rate(node_cpu_seconds_total{job=\"node\", mode=~\"idle|iowait|steal\"}[5m])))"
labels = {
workload_type = "job"
}
enabled = true
},
{
record = "node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate"
expression = "sum by (cluster, namespace, pod, container) ( irate(container_cpu_usage_seconds_total{job=\"cadvisor\", image!=\"\"}[5m])) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) ( 1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=\"\"}))"
labels = {
workload_type = "job"
}
enabled = true
}
]
}
})
}
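Alert rules live in the same resource type. Here is a sketch of what a rule group with a single alert rule could look like; the alert name, threshold, duration and labels are placeholders, and the property names follow the prometheusRuleGroups preview API:

# prometheus rule group containing a single alert rule
resource "azapi_resource" "prometheusAlertRuleGroup" {
  type      = "Microsoft.AlertsManagement/prometheusRuleGroups@2021-07-22-preview"
  name      = "${local.prefix}-${local.location_short}-alertRuleGroup"
  parent_id = azurerm_resource_group.stamp.id
  location  = azurerm_resource_group.stamp.location

  body = jsonencode({
    properties = {
      description = "Prometheus Alert Rule Group"
      scopes      = [azapi_resource.prometheus.id]
      enabled     = true
      clusterName = azurerm_kubernetes_cluster.stamp.name
      interval    = "PT1M"

      rules = [
        {
          # fires when the recording rule defined above stays over 90% for 5 minutes
          alert      = "HighNodeCpuUtilisation"
          expression = "instance:node_cpu_utilisation:rate5m > 0.9"
          for        = "PT5M"
          severity   = 3
          labels = {
            team = "platform"
          }
          enabled = true
        }
      ]
    }
  })
}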

What’s left? I of course hope that these resources will find their way into the AzureRM provider at some point to make onboarding a little smoother.

Besides that, this is a great solution for using Prometheus in your Azure environment without the struggle of deploying, operating and maintaining your own self-hosted Prometheus infrastructure.
