Kubernetes Pod Monitors & Re-Labeling — Managing Cardinality

M Castelino
Published in Kubehells
Jan 31, 2021

The Prometheus operator offers a simple way to scrape metrics from any Pod. In many cases, however, the Pod itself is not what you are monitoring; it only exposes metrics that relate to the Node. What the user then cares about is the Node on which the Pod runs, not the Pod itself. This is especially true of DaemonSets, whose Pods do not have a stable identity.

By default, when using a PodMonitor, the instance label on every time series identifies the Pod, not the Node. No label carrying the Node name is attached to the time series either. This means the user has to map each Pod to the Node on which it ran, which is not easy.

Whenever the metrics collectors are updated, deleted, or redeployed, the replacement Pods get new identities, so each metric gains a fresh set of time series. The number of time series per metric grows significantly, and the increased cardinality puts excessive pressure on the time series database. This can be avoided easily.

Prometheus Relabeling

Prometheus allows the instance label (among other labels) to be rewritten with a simple relabeling rule. In a PodMonitor this goes under relabelings, as shown below.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: mycollector
  labels:
    k8s-app: mycollector
    type: mytype
spec:
  selector:
    matchLabels:
      k8s-app: mycollector
  namespaceSelector:
    matchNames:
    - mynamespace
  podMetricsEndpoints:
  - port: exporter
    interval: 10s
    relabelings:
    - action: replace
      sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: instance

Here we specify that the value of the instance label be replaced by the value of __meta_kubernetes_pod_node_name, which holds the name of the Node the Pod is scheduled onto. Prometheus provides many such built-in meta labels, which can help you rewrite labels to make them more meaningful or invariant. The full list is available in the Prometheus documentation.
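Overwriting instance is not the only option. The following is a minimal sketch, assuming you would rather leave instance as it is and copy the Node name into a separate label. The label name node is an illustrative choice, not something the setup above uses.

```yaml
# Fragment of a PodMonitor spec (sketch, not a complete manifest).
podMetricsEndpoints:
- port: exporter
  interval: 10s
  relabelings:
  # Copy the Node name into a new target label; instance is left untouched.
  - action: replace
    sourceLabels:
    - __meta_kubernetes_pod_node_name
    targetLabel: node   # hypothetical label name
```

This variant makes the Pod-to-Node mapping explicit, but it does not reduce series churn by itself, because the per-Pod labels are still present.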

Dropping Labels

Furthermore, you should drop any metrics you do not care about to reduce cardinality further. Kube-prometheus has some very good examples of dropping both metrics and labels. For example, the following rule drops several container metrics by name:

metricRelabelings:
- action: drop
  regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
  sourceLabels:
  - __name__
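The same metricRelabelings list can also drop labels rather than whole metrics, using the labeldrop action. The following is a minimal sketch, assuming the scraped series carry a pod label holding the Pod's name (the Prometheus operator normally attaches one to PodMonitor targets) and that each Node runs a single collector Pod, so the series remain unique without it.

```yaml
metricRelabelings:
# Remove the pod label from every scraped series.
# For labeldrop, the regex is matched against label names, not values.
- action: labeldrop
  regex: pod
```

If two series differ only in a dropped label they will collide, so check that the remaining labels still identify each series uniquely before dropping anything.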
