Container resource request, limit & usage metrics


Kubernetes (k8s) is orchestrating the world now, not just containers. Adoption of Kubernetes is growing fast, and contributions to this open-source system land on GitHub from all over the world. In such an automated, fast-paced world, metrics and monitoring are essential for solving problems. This is usually handled with Prometheus, together with the many official and unofficial exporters that pull metrics data and serve it to Prometheus.

On-demand scaling of services and cluster nodes is another massive win for k8s: it eliminates the late-night work of installing and provisioning a new machine in the middle of peak load. (Thanksgiving and Christmas, perhaps?)

But these advantages are often not well utilized. In some scenarios, a few running services over-consume CPU and memory, leaving none for other services being provisioned in the cluster, which in turn triggers cluster scaling. That incurs cost without any real need for it.

There are some guardrails for situations like this, such as resource quotas, but a quota blocks any additional deployment altogether once it is reached. It would be better to have CPU and memory monitored and a proactive alert triggered when a certain threshold is breached (say 80%). Various tools exist for this; a famous one is cAdvisor from Google. But most of its detailed output is rarely needed, while the few numbers you actually want are easy to miss.
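For reference, this is roughly what such a quota looks like (names and figures are illustrative); once the hard limits below are consumed, new pods in the namespace are rejected outright rather than alerted on:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota        # illustrative name
  namespace: default
spec:
  hard:
    requests.cpu: "4"        # total CPU all pods in the namespace may request
    requests.memory: 8Gi     # total memory all pods may request
    limits.cpu: "8"
    limits.memory: 16Gi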

The Container Resource Exporter (CRE) solves this simple problem with a simple, elegant solution. CRE specifically captures each container's resource request quantity, limit quantity, and current usage. Only the current usage is a real-time metric; the request and limit are the pre-defined values from the resources object in the deployment YAML.
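Those pre-defined values are the standard per-container resources block in the pod spec. For example, the alertmanager container from the sample metrics further below would be declared along these lines (the image name is illustrative):

containers:
  - name: alertmanager
    image: prom/alertmanager   # illustrative image
    resources:
      requests:
        cpu: 1m                # exported as cpu_request 0.001
        memory: 256Mi          # exported as memory_request 2.68435456e+08
      limits:
        cpu: "1"               # exported as cpu_limit 1
        memory: 512Mi          # exported as memory_limit 5.36870912e+08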

CRE uses the Metrics API of the k8s API server to query the current CPU and memory usage of every container. It can run in two scopes: local and cluster. In local scope, the namespace to watch is passed in an environment variable via the downward API in the deployment (as sketched below); in cluster scope, it scrapes all containers across the cluster. Along with the resources, each container's status (Running, Unknown, Pending, or Error) is exported. And not just the resource statistics: the total number of pods is also scraped and exported. When running in cluster scope, the total pod count is exported per namespace.
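A minimal sketch of that downward API wiring, assuming the exporter reads a WATCH_NAMESPACE variable (the actual variable name may differ; check CRE's docs):

env:
  - name: WATCH_NAMESPACE            # hypothetical variable name
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace   # injects the pod's own namespace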

These scraped metrics are printed in the Prometheus metric text format, with labels that best describe the container, for easy integration with Prometheus Alertmanager and with Grafana for visualization.
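Getting them into Prometheus is then an ordinary scrape job; a minimal sketch, assuming the exporter is exposed as a service named cre on port 9000 (both assumptions):

scrape_configs:
  - job_name: container-resource-exporter
    static_configs:
      - targets: ['cre.default.svc:9000']   # hypothetical service name and port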

Sample exported metrics

# HELP cpu_limit CPU Limit by deployment
# TYPE cpu_limit gauge
cpu_limit{container_name="alertmanager",namespace="default",pod_name="prometheus-alertmanager-74bd9d5867-gmlj9",status="Running "} 1
cpu_limit{container_name="alertmanager-configmap-reload",namespace="default",pod_name="prometheus-alertmanager-74bd9d5867-gmlj9",status="Running "} 1

# HELP cpu_request Requested CPU by deployment
# TYPE cpu_request gauge
cpu_request{container_name="alertmanager",namespace="default",pod_name="prometheus-alertmanager-74bd9d5867-gmlj9",status="Running "} 0.001
cpu_request{container_name="alertmanager",namespace="default",pod_name="prometheus-alertmanager-74bd9d5867-nlqqc",status="Running "} 0.001

# HELP current_cpu_usage Current CPU Usage as reported by Metrics API
# TYPE current_cpu_usage gauge
current_cpu_usage{container_name="alertmanager",namespace="default",pod_name="prometheus-alertmanager-74bd9d5867-gmlj9"} 0
current_cpu_usage{container_name="alertmanager-configmap-reload",namespace="default",pod_name="prometheus-alertmanager-74bd9d5867-gmlj9"} 0

# HELP current_memory_usage Current Memory Usage as reported by Metrics API
# TYPE current_memory_usage gauge
current_memory_usage{container_name="alertmanager",namespace="default",pod_name="prometheus-alertmanager-74bd9d5867-gmlj9"} 1.4168064e+07
current_memory_usage{container_name="alertmanager-configmap-reload",namespace="default",pod_name="prometheus-alertmanager-74bd9d5867-gmlj9"} 1.363968e+06

# HELP memory_limit Memory Limit by deployment
# TYPE memory_limit gauge
memory_limit{container_name="alertmanager",namespace="default",pod_name="prometheus-alertmanager-74bd9d5867-gmlj9",status="Running "} 5.36870912e+08
memory_limit{container_name="alertmanager-configmap-reload",namespace="default",pod_name="prometheus-alertmanager-74bd9d5867-gmlj9",status="Running "} 1.073741824e+09

# HELP memory_request Requested Memory by deployment
# TYPE memory_request gauge
memory_request{container_name="alertmanager",namespace="default",pod_name="prometheus-alertmanager-74bd9d5867-gmlj9",status="Running "} 2.68435456e+08
memory_request{container_name="alertmanager-configmap-reload",namespace="default",pod_name="prometheus-alertmanager-74bd9d5867-gmlj9",status="Running "} 5.36870912e+08

# HELP total_pod Total pod count in given space
# TYPE total_pod counter
total_pod{namespace="default"} 1

Visualization

The graphs below visualize the change in the number of pods over time, followed by an overlay of total CPU and memory request/limit/usage at the local scope (namespace level), and a CPU and memory request/limit/usage overlay for a single sample pod.

Grafana dashboard displaying the total pod count, the total CPU and memory graphs, and a sample pod's CPU and memory overlays
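The panels above can be driven by straightforward queries over the exported metrics. A sketch, one expression per Grafana panel series (aggregation and layout are up to you):

# total pods in the namespace
total_pod{namespace="default"}

# total request/limit/usage across the namespace, overlaid
sum(cpu_request{namespace="default"})
sum(cpu_limit{namespace="default"})
sum(current_cpu_usage{namespace="default"})

# the same overlay for one sample pod, e.g. memory usage
sum(current_memory_usage{pod_name="prometheus-alertmanager-74bd9d5867-gmlj9"})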

Sample alerting rule

alert: CPUOverUsage
expr: sum(current_cpu_usage) > 5
for: 5m
labels:
  severity: critical
annotations:
  description: 'Current value: {{ $value }}'
  summary: CPU usage is high consistently. This can cause extra load on the system.

When this rule is set up in Prometheus, it fires an alert to Alertmanager's default receiver whenever the summed CPU usage stays above 5 cores for a consistent 5-minute period.
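The exported request/limit series make threshold-relative alerts just as easy. A sketch of the 80% idea mentioned earlier, comparing each container's memory usage against its own limit (the sum by clauses drop the extra status label on the limit series so the two sides of the division match):

alert: MemoryNearLimit
expr: >
  sum by (namespace, pod_name, container_name) (current_memory_usage)
  / sum by (namespace, pod_name, container_name) (memory_limit) > 0.8
for: 5m
labels:
  severity: warning
annotations:
  description: '{{ $labels.container_name }} in {{ $labels.namespace }} is at {{ $value }} of its memory limit'
  summary: Container memory usage has crossed 80% of its limit.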

Gotchas

To run, CRE needs a ServiceAccount with at least the "list" verb on the "pods" resource in both the core API group and the Metrics API group (metrics.k8s.io). This can be provisioned with a Role and RoleBinding for local scope, or a ClusterRole and ClusterRoleBinding for cluster scope.
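A minimal sketch for the local scope (names are illustrative; for cluster scope, swap in ClusterRole and ClusterRoleBinding and drop the namespace fields):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cre                             # illustrative name
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cre-pod-reader
  namespace: default
rules:
  - apiGroups: ["", "metrics.k8s.io"]   # core API and Metrics API
    resources: ["pods"]
    verbs: ["list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cre-pod-reader
  namespace: default
subjects:
  - kind: ServiceAccount
    name: cre
    namespace: default
roleRef:
  kind: Role
  name: cre-pod-reader
  apiGroup: rbac.authorization.k8s.io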

All of this is taken care of in the Helm chart for easy installation at the desired scope here. PRs welcome.