Azure Monitor managed service for Prometheus — Overview

Vishwanath · Published in Microsoft Azure · Dec 5, 2022 · 17 min read

Microsoft recently announced ‘Azure Monitor managed service for Prometheus’. This article will help readers understand the full offering and how it integrates with AKS (Azure Kubernetes Service). Throughout this article, there are pointers to Azure documentation which will help you understand different parts of the service.

All the links used in this article are also listed in the references section at the bottom, so they are easy to find without searching through the article.

Azure Monitor managed service for Prometheus (also known as Managed Prometheus in Azure) is a fully managed Prometheus service in Azure: it includes a fully managed data store, query service, ruler service, Prometheus-compatible query APIs, and a managed add-on for data collection from AKS. As a first step, this service is fully integrated with Azure Kubernetes Service (AKS) through an add-on that scrapes Prometheus metrics.

This service is in preview. We greatly appreciate any feedback on improvements and usability. Please see the end of this article for ways to provide feedback.

Managed Prometheus in Azure

Below is a brief overview of Managed Prometheus in Azure (with the green/bottom part of the diagram below showing its different components).

Azure Monitor managed service for Prometheus — Overview
  • Store — Prometheus data in Azure is stored in an ‘Azure Monitor Workspace’ (referred to as AMW in the rest of this article for brevity), which is a managed store. Prometheus data stored in an AMW is queryable from Grafana through PromQL queries, using a Prometheus data source that points at that AMW. You can also create and run recording rules and PromQL-based alert rules against an AMW, both of which are evaluated in the cloud for any given AKS cluster ingesting Prometheus metrics into that AMW.

AMW is in preview, so data ingested into it & queried from it are metered, but not billed yet (as of writing this article).

  • Data collection agent — In addition to offering a managed store, Microsoft has also announced a new managed Kubernetes add-on (azuremonitormetrics) that acts as the managed data collection agent for scraping Prometheus metrics from your AKS cluster(s), which saves you from running a Prometheus server. This add-on agent runs on your AKS cluster’s agent nodes and is managed by Microsoft once you enable the add-on.
  • Alerts & Recording rules — PromQL based rules (recording & alerting rules) can be configured targeting AMW, and they are evaluated in the cloud. Fired alerts also integrate with the existing Azure alerting pipeline and leverage existing options to notify/alert on top of the new Prometheus metrics in Azure.
  • Managed Grafana — There is also the Azure Managed Grafana service, which can be used to query/chart Prometheus metrics stored in AMW.

All of the above gives you a fully managed Prometheus monitoring stack in Azure (AKS — agent/data collector add-on — ingestion — store — query — recording rules — alerts — Grafana) for monitoring AKS clusters.

The metrics store backing AMW is a highly scalable time-series store that is already widely used in Azure. Azure made a choice to use the same store for storing & querying Prometheus time-series, as part of its managed Prometheus service.

Azure Monitor container insights

If you already use Azure Monitor for monitoring your AKS clusters, you might be familiar with Container Insights. It is a log-based monitoring solution that gets enabled when you enable the ‘monitoring’ add-on on AKS clusters. Now, in addition to that, you can also enable the new add-on (azure-monitor-metrics), which acts as the managed add-on/data collector for scraping and ingesting Prometheus time-series from your AKS cluster(s) as part of its integration with Azure Monitor managed service for Prometheus. In other words, you can now choose Managed Prometheus monitoring in addition to log-based monitoring.

Azure Monitor Container Insights (logs) is the primary log collection solution for AKS, collecting stdout/stderr logs. It also has the capability to scrape and collect Prometheus metrics: it will continue to have a Prometheus ‘integration’ that scrapes and ships Prometheus metrics as ‘logs’ to the configured Azure Log Analytics workspace (queryable with the KQL language, not PromQL). The scope of Container Insights will grow to enable collecting logs and/or Prometheus metrics and to leverage the new Managed Prometheus offering. Also note that these two add-ons (logs, metrics) do not interfere with or share scrape configurations between them. They are independent, and you can choose to enable either or both as part of the ‘Insights’ experience in the Azure Portal Ux.

Ingesting Prometheus metrics into AMW

There are 2 ways to ingest Prometheus metrics into AMW.

  1. Agent based — fully managed by Microsoft for customers: using the azure-monitor-metrics add-on (available only in AKS at the time of writing this article) — See more about this in documentation
  2. Remote-write based — self-managed data collection by customers using OSS Prometheus: using a sidecar (which handles authentication for ingesting into AMW) by configuring Prometheus remote-write (PRW) on a Prometheus server. This can be done from any Kubernetes environment (AKS or non-AKS). It is currently possible only in Kubernetes environments (and not on virtual machines), because the sidecar container must run next to the Prometheus container — See more about this in documentation, and the sketch after this list
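For illustration, here is a minimal sketch of option 2 from the point of view of the self-managed Prometheus server. The sidecar handles the Azure AD authentication and forwards samples to the AMW ingestion endpoint; the listen address, port, and path below are illustrative placeholders, not the documented values (see the remote-write documentation for the exact sidecar image and endpoint).

# prometheus.yml fragment on the self-managed Prometheus server
remote_write:
  - url: "http://localhost:8081/api/v1/write"   # placeholder: the sidecar's local listen endpoint
    queue_config:
      max_shards: 50                            # standard Prometheus remote-write tuning knobs
      max_samples_per_send: 500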

We will now go into more detail about metrics add-on data collection, configuration, recording rules, alerts, etc.

Agent based data collection using the azure-monitor-metrics add-on

We will dive deeper into agent based fully managed Prometheus data collection in this section.

When you enable Prometheus monitoring through the azure-monitor-metrics add-on (from now on called the metrics add-on) for an AKS cluster (through the Azure portal Ux, CLI, or ARM templates) [Documentation], by default the following are auto-configured for you -

  1. Default Prometheus targets to scrape inside the AKS cluster — Prometheus scrape config for discovering and scraping 4 default targets is automatically configured for you by the metrics add-on (more on this later in this article)
  2. Default dashboards — 9 default dashboards from Kubernetes & node mixins are configured for you in the Azure Managed Grafana instance you chose during metrics add-on enablement
  3. Default recording rules — 30 default recording rules are configured for you for the specific cluster in the specified AMW chosen during monitoring enablement. These rules are consumed by the default dashboard queries
  4. Default alerts — 17 default alerts from the Kubernetes mixin — At present, this step is manual, and alerts are not auto-configured. There is a published ARM template that you can import to enable these alerts for each of your clusters sending data to AMW — Link to template in GitHub

Enabling azure-monitor-metrics add-on

There are several ways to enable metrics add-on in your AKS cluster.

  1. Azure CLI (az aks …) — See documentation here
  2. Azure Portal Ux — You can enable the metrics add-on through the ‘Insights’ menu for your AKS cluster (see screen-1 below), or from the AMW Ux by selecting a cluster from the ‘Monitored clusters’ menu (see screen-2 below)
Screen-1 — AKS Cluster’s Insights menu
Screen-2 — AMW Monitored clusters menu

3. ARM templates — See here for templates that you can use readily

At present, metrics-add-on can only be enabled after provisioning your AKS cluster (and not during AKS cluster provisioning/creation).

What monitoring artifacts are auto-provisioned as part of metrics add-on enablement?

When you enable metrics add-on for your AKS cluster(s), you will be asked to provide an existing AMW and also an existing Azure Managed Grafana instance.

  • If you don’t provide an existing AMW, metrics-add-on will create a default AMW (named defaultazuremonitorworkspace-<region> in the same Azure resource group as your AKS cluster) in the closest available AMW Azure region to your AKS cluster. Metrics add-on also deploys & auto-configures default recording rules (30) (used by default dashboards) for your cluster in that specific AMW.
  • The metrics add-on uses the provided Azure Managed Grafana instance to ‘link’ your AKS cluster to the specified AMW by provisioning the dashboard artifacts below in it —

a) Kubernetes mixin default dashboards (9) for visualizing the metrics collected by default — You will see these dashboards under a folder named `Managed Prometheus` in your linked Grafana instance

b) Prometheus data source (with name Managed_Prometheus_<AMWName>) that you can use with the default dashboards to query/see your data.

In addition to the above resources, which relate to the Azure Managed Grafana instance you selected, the components below are auto-provisioned by the metrics add-on for the selected AKS cluster -

Metrics add-on agent topology

When you enable the metrics add-on for your AKS cluster, it will deploy the Prometheus data collection agent on the cluster, which primarily consists of the pods described below -

daemon set — This gets deployed on every node in the cluster. This agent scrapes some of the node-local targets (like kubelet, cAdvisor, etc.) on every node in the cluster. We will go through more details on this later in this article.

replica set — This is a ‘cluster’ agent (meaning a singleton replica that runs on any one node in the cluster). This agent scrapes some of the cluster-wide targets (like the coredns service, kube-state-metrics, etc.) in the cluster. We will go through more details on this later in this article as well.

kube-state-metrics replica — This is an open-source component that derives metrics from the state of Kubernetes objects in a Kubernetes cluster. This component is also auto-installed by the metrics add-on and managed (upgrades, patching, etc.) by Microsoft as part of managing the add-on.

node-exporter (Linux) — This is an open-source component that exposes node metrics for every node in an AKS cluster. This component runs on every AKS node and is also managed by Microsoft (as part of the AKS agent VM image, since it runs on AKS agent nodes). The metrics add-on doesn’t provision it on the node, but it scrapes this target that is already available on every AKS (Linux) node.

All the above components run in the `kube-system` namespace in the cluster.

Default scraped Prometheus ‘targets’

By default (meaning without requiring any additional configuration from users), metrics add-on discovers and scrapes the following targets from the cluster. Scraping frequency for all default targets is 30s. See here on how to change the scrape frequency for default target scrapes.

  1. cadvisor (in every node)
  2. kubelet (in every node)
  3. node exporter (in every node)
  4. kube-state-metrics

In Prometheus, any endpoint that hosts/exposes Prometheus metrics is called a ‘target’ for Prometheus to scrape and collect metrics from it.
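For readers new to Prometheus, here is what a minimal scrape job defining a single target looks like in standard prometheus.yml syntax (the job name and endpoint below are made up for illustration):

scrape_configs:
  - job_name: "my-app"                       # every series scraped by this job gets the label job="my-app"
    scrape_interval: 30s                     # same frequency as the add-on's default targets
    static_configs:
      - targets: ["my-app-service:8080"]     # this endpoint (exposing /metrics) is the 'target'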

Metrics/time-series ingestion volume

One of the primary things to consider when collecting Prometheus metrics is how to control the ingestion volume to collect just the metrics that you use.

By default, the metrics add-on only collects metrics from the 4 default targets explained above in the default targets section. In doing so, it also allow-lists/filters in only the metrics that are used by the default dashboards, recording rules (and alerts) configured by the add-on. This default ingestion profile used by the add-on is called `minimalingestionprofile` and it is turned ON by default when the add-on is enabled. You can turn OFF the minimalingestionprofile, which will cause the add-on to collect all the metrics from the 4 default targets. This will cause a significant increase in ingestion volume (and hence ingestion/storage cost). You can read more about controlling the default metric allow/keep lists & targets using a config map in the add-on settings section below in this article.

Metrics collected by the metrics add-on by default

By default, (with minimalingestionprofile=true) the following metrics are allow-listed for the below 4 default targets.

cadvisor (job=cadvisor)

container_memory_rss
container_network_receive_bytes_total
container_network_transmit_bytes_total
container_network_receive_packets_total
container_network_transmit_packets_total
container_network_receive_packets_dropped_total
container_network_transmit_packets_dropped_total
container_fs_reads_total
container_fs_writes_total
container_fs_reads_bytes_total
container_fs_writes_bytes_total
container_cpu_usage_seconds_total

kubelet (job=kubelet)

kubelet_node_name
kubelet_running_pods
kubelet_running_pod_count
kubelet_running_sum_containers
kubelet_running_container_count
volume_manager_total_volumes
kubelet_node_config_error
kubelet_runtime_operations_total
kubelet_runtime_operations_errors_total
kubelet_runtime_operations_duration_seconds_bucket
kubelet_runtime_operations_duration_seconds_sum
kubelet_runtime_operations_duration_seconds_count
kubelet_pod_start_duration_seconds_bucket
kubelet_pod_start_duration_seconds_sum
kubelet_pod_start_duration_seconds_count
kubelet_pod_worker_duration_seconds_bucket
kubelet_pod_worker_duration_seconds_sum
kubelet_pod_worker_duration_seconds_count
storage_operation_duration_seconds_bucket
storage_operation_duration_seconds_sum
storage_operation_duration_seconds_count
storage_operation_errors_total
kubelet_cgroup_manager_duration_seconds_bucket
kubelet_cgroup_manager_duration_seconds_sum
kubelet_cgroup_manager_duration_seconds_count
kubelet_pleg_relist_interval_seconds_bucket
kubelet_pleg_relist_interval_seconds_count
kubelet_pleg_relist_interval_seconds_sum
kubelet_pleg_relist_duration_seconds_bucket
kubelet_pleg_relist_duration_seconds_count
kubelet_pleg_relist_duration_seconds_sum
rest_client_requests_total
rest_client_request_duration_seconds_bucket
rest_client_request_duration_seconds_sum
rest_client_request_duration_seconds_count
process_resident_memory_bytes
process_cpu_seconds_total
go_goroutines
kubernetes_build_info

node-exporter (job=node)

node_memory_MemTotal_bytes
node_cpu_seconds_total
node_memory_MemAvailable_bytes
node_memory_Buffers_bytes
node_memory_Cached_bytes
node_memory_MemFree_bytes
node_memory_Slab_bytes
node_filesystem_avail_bytes
node_filesystem_size_bytes
node_time_seconds
node_exporter_build_info
node_load1
node_vmstat_pgmajfault
node_network_receive_bytes_total
node_network_transmit_bytes_total
node_network_receive_drop_total
node_network_transmit_drop_total
node_disk_io_time_seconds_total
node_disk_io_time_weighted_seconds_total
node_load5
node_load15
node_disk_read_bytes_total
node_disk_written_bytes_total
node_uname_info

kube-state-metrics (job=kube-state-metrics)

kube_node_status_allocatable
kube_pod_owner
kube_pod_container_resource_requests
kube_pod_status_phase
kube_pod_container_resource_limits
kube_replicaset_owner
kube_resourcequota
kube_namespace_status_phase
kube_node_status_capacity
kube_node_info
kube_pod_info
kube_deployment_spec_replicas
kube_deployment_status_replicas_available
kube_deployment_status_replicas_updated
kube_statefulset_status_replicas_ready
kube_statefulset_status_replicas
kube_statefulset_status_replicas_updated
kube_job_status_start_time
kube_job_status_active
kube_job_failed
kube_horizontalpodautoscaler_status_desired_replicas
kube_horizontalpodautoscaler_status_current_replicas
kube_horizontalpodautoscaler_spec_min_replicas
kube_horizontalpodautoscaler_spec_max_replicas
kubernetes_build_info
kube_node_status_condition
kube_node_spec_taint

In addition to the above, the below metrics are allow-listed by default for the following targets, but these targets are NOT scraped by default, unless you turn them ON as described in the add-on configuration section below.

kube-proxy (job=kube-proxy)

kubeproxy_sync_proxy_rules_duration_seconds
kubeproxy_sync_proxy_rules_duration_seconds_bucket
kubeproxy_sync_proxy_rules_duration_seconds_sum
kubeproxy_sync_proxy_rules_duration_seconds_count
kubeproxy_network_programming_duration_seconds
kubeproxy_network_programming_duration_seconds_bucket
kubeproxy_network_programming_duration_seconds_sum
kubeproxy_network_programming_duration_seconds_count
rest_client_requests_total
rest_client_request_duration_seconds
rest_client_request_duration_seconds_bucket
rest_client_request_duration_seconds_sum
rest_client_request_duration_seconds_count
process_resident_memory_bytes
process_cpu_seconds_total
go_goroutines
kubernetes_build_info

api-server (job=kube-apiserver)

apiserver_request_duration_seconds
apiserver_request_duration_seconds_bucket
apiserver_request_duration_seconds_sum
apiserver_request_duration_seconds_count
apiserver_request_total
workqueue_adds_total
workqueue_depth
workqueue_queue_duration_seconds
workqueue_queue_duration_seconds_bucket
workqueue_queue_duration_seconds_sum
workqueue_queue_duration_seconds_count
process_resident_memory_bytes
process_cpu_seconds_total
go_goroutines
kubernetes_build_info

core-dns (job=kube-dns)

coredns_build_info
coredns_panics_total
coredns_dns_responses_total
coredns_forward_responses_total
coredns_dns_request_duration_seconds
coredns_dns_request_duration_seconds_bucket
coredns_dns_request_duration_seconds_sum
coredns_dns_request_duration_seconds_count
coredns_forward_request_duration_seconds
coredns_forward_request_duration_seconds_bucket
coredns_forward_request_duration_seconds_sum
coredns_forward_request_duration_seconds_count
coredns_dns_requests_total
coredns_forward_requests_total
coredns_cache_hits_total
coredns_cache_misses_total
coredns_cache_entries
coredns_plugin_enabled
coredns_dns_request_size_bytes
coredns_dns_request_size_bytes_bucket
coredns_dns_request_size_bytes_sum
coredns_dns_request_size_bytes_count
coredns_dns_response_size_bytes
coredns_dns_response_size_bytes_bucket
coredns_dns_response_size_bytes_sum
coredns_dns_response_size_bytes_count
process_resident_memory_bytes
process_cpu_seconds_total
go_goroutines
kubernetes_build_info

Scrape configuration & metrics add-on

The metrics add-on is compliant with the Prometheus configuration schema [documentation], meaning you can define custom scrape configuration using the Prometheus configuration schema. In other words, providing custom scrape configuration to the metrics add-on is the same as writing a scrape job for Prometheus (through prometheus.yaml).

The metrics add-on doesn’t understand/support Prometheus Operator CRDs (like PodMonitor, ServiceMonitor, etc.), as it takes Prometheus config (YAML) provided to it as config map(s).

Additional custom scrape configuration can be provided to the metrics add-on through 2 optional config maps (in kube-system namespace).

  1. ama-metrics-prometheus-config — This config map is for the singleton replica set. Any ‘cluster’-wide scrape targets like services, ingresses, etc. should be provided here. The metrics add-on replica will run Prometheus discovery on the provided configuration and scrape the discovered objects. [sample configmap]
  2. ama-metrics-prometheus-config-node — This config map is for the daemon set that runs on every node. Scrape config given through this config map is used by the daemon set to run Prometheus discovery, and scraping is done by the add-on daemon set running on every node. Any node-‘local’ targets can be scraped from here (e.g., metrics of a custom/app daemon set). This config map can also do discovery, but it is recommended to use static targets here rather than discovery. To make it easier to write scrape jobs, the node’s IP address is available to the config map (as the $NODE_IP environment variable) and can be used in the scrape config to scrape endpoints on each node (see the sketches after this list). [sample configmap]
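For illustration, below are sketches of what custom scrape jobs for these two config maps could look like. The job names, service names, ports, and label selectors are placeholders; see the sample configmaps linked above for the exact config map structure the add-on expects.

# Fragment for ama-metrics-prometheus-config (cluster-wide scraping, executed by the replica set)
scrape_configs:
  - job_name: "my-service-metrics"
    kubernetes_sd_configs:
      - role: service                              # discover Kubernetes services
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: my-app-svc                          # placeholder: keep only this service
        action: keep

# Fragment for ama-metrics-prometheus-config-node (node-local scraping, executed by the daemon set on each node)
scrape_configs:
  - job_name: "my-node-app"
    static_configs:
      - targets: ["$NODE_IP:8080"]                 # $NODE_IP resolves to each node's IP; the port is a placeholder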

Default scrape frequency (if not specified in the custom scrape jobs) will be 60s (same as Prometheus defaults). Also note that the default scrapes by the add-on are at 30s frequency.

ama in the above config map name(s) stands for Azure Monitor Agent, which is the core agent used by the add-on. The metrics add-on pod names running in the kube-system namespace also have the ‘ama’ prefix, to emphasize the same.

You can read more in Microsoft and/or Prometheus documentation about authoring scrape config and also using our tool to validate them before providing them as config map to the add-on.

See some scrape config tips/samples here.

Metrics add-on configuration

In addition to providing additional Prometheus scrape configuration through the 2 config maps above, you can also update the metrics add-on configuration settings (not scrape settings) through a 3rd optional config map (in the kube-system namespace).

ama-metrics-settings-configmap — This has the following configurable settings. The default values specified below apply to each setting when either the config map is not created by the customer or the setting is missing from the provided config map. [sample configmap]

The default-scrape-settings-enabled section has the following settings. Use each of these settings to change/override the default values that decide whether or not to scrape each of these built-in targets for the AKS cluster being monitored by the add-on (a sketch follows this list).

  • kubelet (default value when unspecified is true) — when true, this enables discovering & scraping kubelet target in each node in the cluster
  • coredns (default value when unspecified is false) — when true, this enables discovering & scraping coredns target in the cluster
  • cadvisor (default value when unspecified is true) — when true, this enables discovering & scraping cadvisor target in each node in the cluster
  • kubeproxy (default value when unspecified is false) — when true, this enables discovering & scraping the kubeproxy target in each node in the cluster
  • apiserver (default value when unspecified is false) — when true, this enables discovering & scraping apiserver target in the cluster
  • kubestate (default value when unspecified is true) — when true, this enables discovering & scraping kube-state-metrics target in the cluster
  • nodeexporter (default value when unspecified is true) — when true, this enables discovering & scraping node-exporter target in each node in the cluster
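As an illustration, a fragment of ama-metrics-settings-configmap that additionally enables the coredns target while keeping the other defaults could look like the below. The section name comes from the documentation above; treat the exact key layout as an approximation and use the linked sample configmap as the source of truth.

default-scrape-settings-enabled: |-
  kubelet = true
  coredns = true          # default is false; set to true to also scrape coredns
  cadvisor = true
  kubeproxy = false
  apiserver = false
  kubestate = true
  nodeexporter = true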

The default-targets-metrics-keep-list section has the following settings (one for each target) — each of these target settings can be used to add more metrics (as a regex), in addition to the ones that are allow-listed by default by the `minimalingestionprofile` for that target (if the target is enabled).

e.g., for the kubelet target you can specify:
kubelet = “X|Y”

The above is basically a regex which means ‘in addition to the default metrics allow-listed for this target, add metrics X & Y to the allow-list/keep-list’.

There is one keep/allow list setting per default ON target (kubelet, cadvisor, kubestate, nodeexporter) and also one per default OFF target (coredns, kubeproxy, apiserver).

When you allow-list, say, a histogram metric, you have to explicitly specify the _sum, _count & _bucket series for that metric in the keep list. e.g., if you are allow-listing the sample_request_size_bytes histogram metric, you would add the below to the allow/keep list -

sample_request_size_bytes_bucket|sample_request_size_bytes_sum|sample_request_size_bytes_count
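In config-map form (again approximating the layout of the linked sample configmap), that keep-list addition for the kubelet target would look like:

# merged with the default kubelet allow-list when minimalingestionprofile is true
default-targets-metrics-keep-list: |-
  kubelet = "sample_request_size_bytes_bucket|sample_request_size_bytes_sum|sample_request_size_bytes_count"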

Ways to customize ingestion to control volume/cost

As we saw in the above sections, metrics add-on has several ways to control ingestion volume. Below is a summary of possible ways you can minimize/customize ingestion through the metrics add-on via its config map -

  1. Case 1 : Ingest only minimal metrics per default target. This is the default behavior (minimalIngestionProfile="true"). In this case, only the above listed series/metrics will be ingested for each of the default targets.
  2. Case 2 : Ingest a few additional metrics (say X,Y) for one or more default targets, in addition to the default allow-listed metrics for that target (minimalIngestionProfile="true"). In this case you have to specify the appropriate keeplistRegexes.* setting specific to the target, as described in the section above, e.g. keeplistRegexes.kubelet="X|Y". X and Y will be merged with the default metric list for the kubelet target and ingested along with it.
  3. Case 3 : Ingest only metrics X,Y for a target, and nothing else. In this case, please set minimalIngestionProfile="false", and specify the appropriate keeplistRegexes.<targetname>="X|Y" specific to the target. You can see all the default target names in the above add-on configuration section.
  4. Case 4 : Ingest all metrics exposed by the default target(s). In this case, please set minimalIngestionProfile="false" and don't specify any keeplistRegexes.<targetname> for that target. This can significantly increase metric ingestion volume for that target.

Recording rules & Alert rules

Recording rules & alerting rules can be configured using ‘rule groups’, which target an AMW. You can see the list of recording & alerting rules on an AMW by going to the ‘Rule groups’ menu of the AMW resource in the Azure Portal.

AMW — Rule groups

You can create rule groups (containing recording and/or alert rules with PromQL queries) thru Azure CLI — See documentation

A few tips on recording & alerting rules

  1. A rule group has a ‘cluster’ parameter, which you can use to restrict/filter the queries used in the rules of that group to a specific AKS cluster (each sample ingested by the metrics add-on has a cluster label, whose value is the name of the AKS cluster). If you don’t specify a cluster filter for the rule group, that rule group will be evaluated globally (across data from multiple clusters, if you have more than one AKS cluster sending data to the same AMW), which might cause slower query performance due to the volume of time-series queried. All default recording & alert rule groups enabled by the metrics add-on use the cluster parameter, as they are deployed once per cluster (see the example after this list).
  2. You can find more details about rule groups in Azure documentation here.
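For illustration, the PromQL inside such a rule is the same as in OSS Prometheus; only the packaging differs (in Azure the rule is supplied through a rule group via CLI/ARM rather than a rules file). Expressed in standard Prometheus rule-file YAML purely to show the expression, a recording rule scoped to one cluster might look like this (the group and rule names are made up):

groups:
  - name: example-per-cluster-rules
    rules:
      - record: instance:node_cpu_utilisation:rate1m
        expr: |
          1 - avg by (cluster, instance) (
            rate(node_cpu_seconds_total{job="node", mode="idle", cluster="my-aks-cluster"}[1m])
          )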

Cardinality & AMW

By default, each AMW has a sample (event) limit of 1 million per minute and a time-series limit of 1 million per 12 hours. You can find the current usage numbers, used %, and current limits in the Azure portal by going to the ‘Metrics’ menu of the specific AMW (sample screenshot below). To increase these default limits, you can create a new support request from the ‘New Support Request’ menu (sample screenshot below) for the specific AMW.

View AMW quota limits & creating AMW support requests from Azure portal to increase AMW limits

A sample is also known as an event. A unique combination of a metric name and a set of label name/value pairs is known as a time-series.

node_cpu_seconds{cpu="0", mode="guest"} 0
node_cpu_seconds{cpu="0", mode="idle"} 2.03442237e+06

The above from a single scrape constitutes 2 events and 2 time-series.
If the same are scraped & ingested again in the next scrape, events ingested would have increased to 4, but time-series ingested would still be 2.

Service Quotas & Limits

Azure Monitor managed service for Prometheus has quotas & limits (ingestion, query, etc.) that are published in the Microsoft documentation.

Limitations

There are several limitations currently being worked on while the solution is in preview. Below are the currently known limitations -

Metrics add-on known limitations

  1. Windows nodes and metrics add-on — the metrics add-on currently runs only on Linux nodes in AKS, though you can scrape Windows targets (like pods, nodes, etc.) from the Linux nodes. This might not scale well for larger clusters with lots of Windows nodes/pods.
  2. Arc-enabled Kubernetes in Azure — the metrics add-on currently integrates only with AKS clusters, not with Arc-enabled Kubernetes clusters.
  3. The metrics add-on can be enabled only on AKS clusters that have managed identity enabled (both system-assigned and user-assigned managed identities are supported); you would have to update your AKS cluster(s) to use a managed identity if they are using a service principal.
  4. The metrics add-on is not able to monitor & collect data from private-link AKS clusters.
  5. There is no ‘push-able’ endpoint (push-gateway equivalent) for ephemeral jobs to push metrics into the metrics add-on. The recommendation is to run a push gateway yourself in the AKS cluster and configure the metrics add-on to scrape it (by providing a custom job/scrape config through the config map) to collect and ingest those metrics into AMW (see the sketch after this list).
  6. Time series with +Inf/-Inf or NaN values are dropped by the metrics add-on.
  7. Scrape frequencies less than 1s are not supported.
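A sketch of item 5 in the list above, assuming you deploy the OSS push gateway yourself as a service named pushgateway on port 9091 in a monitoring namespace (all placeholders); a job like this would go into the ama-metrics-prometheus-config config map:

scrape_configs:
  - job_name: "pushgateway"
    honor_labels: true                       # keep the job/instance labels pushed by the ephemeral jobs
    static_configs:
      - targets: ["pushgateway.monitoring.svc.cluster.local:9091"]   # placeholder service DNS name and port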

Query known limitations

  1. Queries that do not include a metric name are not allowed, e.g. count({__name__=~".+"}) by (__name__). Similarly, the Grafana variable query label_values(my_label) is not supported; use label_values(my_metric, my_label) instead.
  2. Case-sensitivity:
    - Any specific casing specified in the query for label names & values (non-regex) will be honored by the query service (meaning query results returned by query service will have the same casing passed in thru the query for those specified labels & their values)
    - For label names & values not specified in the query (including regex-based value matchers), query service will return results in all lower case

Providing feedback

Any level of feedback will help us make the service better (including feedback on documentation, code samples, etc.). There are many ways to reach us; below is a rough order of preference. #1, #2 & #3 will get faster responses, as our teams actively watch the GitHub repo(s) & address support tickets :)

  1. Follow/watch us on GitHub through our metrics add-on GitHub repository — please use Discussions, unless it is a problem for which you want to create an issue. Contributions to the metrics add-on through the above GitHub repository are also very much welcomed & appreciated :).
  2. For documentation suggestions, please file an issue by following the link at the bottom of the documentation page (see the screen below).

3. Create an Azure support ticket.

4. E-mail askazprom@microsoft.com (1:1 email responses would be best effort and will be delayed).

5. Follow us (Azure Monitor) in Twitter

References & useful links

  1. Azure Monitor managed service for Prometheus documentation
  2. Azure Monitor Workspace
  3. Enabling metrics add-on for an AKS cluster
    - Azure CLI
    - ARM templates
    - Azure Portal Ux
  4. Ingesting Prometheus data into AMW thru self-managed Prometheus Remote-write
  5. AMW Rule-groups (recording & alerting rules)
  6. Prometheus scrape configuration schema
  7. Recording rules & alerting rules using rule groups in Azure Prometheus
  8. Sample Prometheus scrape configs & tips
    * Prometheus github
    * Metrics add-on docs
  9. Scrape config validation tool
  10. Azure Monitor managed service for Prometheus service quotas & limits
  11. Azure Kubernetes service documentation
  12. Metrics add-on GitHub repository
  13. Sample configmaps for metrics add-on

Vishwanath
Microsoft Azure

Software Engineer@Microsoft #AzureMonitor #AzureMonitorForContainers #AKS #AzurePrometheus #AzureMonitorManagedPrometheus. Twitter @_vishiy_