Collecting Kubernetes Metrics Using OpenTelemetry

Timothy
HostSpace Cloud Solutions
Jul 14, 2023 · 5 min read

Monitoring metrics in a Kubernetes cluster is crucial for ensuring optimal performance, resource utilization, and overall cluster health.

By monitoring various components and their associated metrics, you can gain valuable insights, troubleshoot issues, optimize resource allocation, and ensure the smooth operation of your Kubernetes environment.

In this article, we will explore the key Kubernetes components to monitor, the important metrics associated with them, RBAC authorization considerations, and how to configure the OpenTelemetry Collector to gather and send metrics to a remote Prometheus or Cortex endpoint.

Kubernetes Components to Monitor

kube-controller-manager

The kube-controller-manager monitors and manages multiple controllers responsible for different tasks within the cluster, including ReplicaSet, Node, Service, and Endpoint controllers.

Metrics to Monitor:

  • Controller-specific metrics: Number of desired and running replicas, controller events, and reconciliation duration.

kube-proxy

The kube-proxy component manages network connectivity and load balancing for services within the cluster.

Metrics to Monitor:

  • Network utilization: Incoming and outgoing network traffic for services.
  • Service-specific metrics: Load balancing effectiveness, connection counts, and latency for individual services.

kube-apiserver

The kube-apiserver acts as the primary control plane component, exposing the Kubernetes API and handling cluster-wide operations.

Metrics to Monitor:

  • API Server latency: Time taken to process API requests, ensuring responsiveness.
  • Request rate and error rate: Number of API requests and any errors or failures.
  • Resource utilization: CPU and memory usage of the API server.
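As a quick sanity check, you can inspect the API server's own metrics endpoint directly (assuming your kubeconfig identity is allowed to read /metrics); apiserver_request_total and apiserver_request_duration_seconds are standard request-rate and latency series:

# Peek at API server request-rate and latency metrics (requires access to /metrics)
kubectl get --raw /metrics | grep -E 'apiserver_request_total|apiserver_request_duration_seconds' | head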

kube-scheduler

The kube-scheduler component assigns pods to nodes based on resource requirements, node capacity, and constraints.

Metrics to Monitor:

  • Scheduling latency: Time taken to schedule pods onto nodes.
  • Pod distribution: Distribution of pods across nodes to ensure even resource utilization.

kubelet

The kubelet runs on each node and manages and monitors containers.

Metrics to Monitor:

  • Node resource usage: CPU, memory, and disk utilization of individual nodes.
  • Container metrics: CPU, memory, and network usage of containers.
  • Pod health: Status and health of pods running on each node.
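If you want to see what the kubelet exposes before wiring up a collector, its stats summary can be fetched through the API server proxy. This is just a sketch; NODE_NAME is a placeholder for one of your node names:

# Fetch the kubelet stats summary for a node via the API server proxy
# NODE_NAME is a placeholder; pick a node from `kubectl get nodes`
kubectl get --raw /api/v1/nodes/NODE_NAME/proxy/stats/summary | head -c 1000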

etcd

Although not a Kubernetes component, etcd is a distributed key-value store used by Kubernetes to store cluster state and configuration data.

Metrics to Monitor:

  • etcd cluster health: Availability and health of the etcd cluster.
  • etcd latency: Time taken to process read and write operations.

Container runtimes

The container runtime on each node is responsible for running and managing the containers within pods, and Kubernetes surfaces metrics about it.

Metrics to Monitor:

  • Container CPU and memory usage: Resource utilization of containers.
  • Container restarts: Number of times containers restart, indicating potential issues.
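Container-level CPU and memory series come largely from cAdvisor, which is embedded in the kubelet. The same proxy path used in the scrape config later in this article can be queried by hand (NODE_NAME is again a placeholder):

# Sample cAdvisor container metrics for one node
kubectl get --raw /api/v1/nodes/NODE_NAME/proxy/metrics/cadvisor | grep -E 'container_cpu_usage_seconds_total|container_memory_working_set_bytes' | head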

kube-state-metrics

kube-state-metrics generates metrics based on the state of various Kubernetes objects such as pods, deployments, services, and more.

Metrics to Monitor:

  • Object-specific metrics: Health and status of various Kubernetes objects, including deployment status, pod conditions, and service availability.
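If kube-state-metrics is installed, you can port-forward to it and sample object-state series such as kube_deployment_status_replicas or kube_pod_status_phase. The service name and namespace below are common defaults and may differ in your cluster:

# In one terminal: forward the kube-state-metrics service (name/namespace may differ)
kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080
# In another terminal: sample a few object-state metrics
curl -s http://localhost:8080/metrics | grep -E 'kube_deployment_status_replicas|kube_pod_status_phase' | head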

metrics-server

metrics-server is a scalable, efficient source of container resource metrics for Kubernetes' built-in autoscaling pipelines.

Metrics to Monitor:

  • Metrics Server collects resource metrics from kubelets and exposes them in the Kubernetes API server through the Metrics API, for use by the Horizontal Pod Autoscaler and Vertical Pod Autoscaler.
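With metrics-server running, the Metrics API can be queried directly; this is also what kubectl top uses under the hood:

# Node and pod resource usage as reported by metrics-server
kubectl top nodes
kubectl top pods -A
# Or hit the Metrics API directly
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | head -c 500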

Next, we will look at setting up the OpenTelemetry Collector to aggregate all these metrics.

Authorization for RBAC and Metrics Access

If your cluster uses RBAC, reading metrics requires authorization through a user, group, or ServiceAccount bound to a ClusterRole that allows access to the /metrics endpoint. We will configure our OpenTelemetry Collector with the right permissions to scrape Kubernetes data.
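For reference, here is a minimal sketch of that permission granted imperatively, assuming a hypothetical ServiceAccount named otel-collector in a monitoring namespace. The Helm values later in this article create an equivalent (much broader) ClusterRole for the collector automatically:

# Minimal sketch: allow GET on the non-resource /metrics endpoint
kubectl create clusterrole metrics-reader --verb=get --non-resource-url=/metrics
# Bind it to a hypothetical ServiceAccount used by the collector
kubectl create clusterrolebinding otel-metrics-reader --clusterrole=metrics-reader --serviceaccount=monitoring:otel-collector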

Note: Ensure the components discussed above are installed on your Kubernetes cluster; their metrics will be missing if they are not present.

Configuring the OpenTelemetry Collector for Metrics Collection and Export

The OpenTelemetry Collector can be configured to gather Kubernetes metrics and export them to remote endpoints such as Prometheus or Cortex. Use the following values for the OpenTelemetry Collector Helm chart:
https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector

mode: daemonset

extraEnvs:
  - name: K8S_CLUSTER_NAME
    value: CLUSTER_NAME
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

presets:
  hostMetrics:
    enabled: true
  kubeletMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
  kubernetesEvents:
    enabled: true

clusterRole:
  create: true
  name: "opentelemetry-collector-admin"
  rules:
    - verbs: ["*"]
      resources: ["*"]
      apiGroups: ["*"]
    - verbs: ["*"]
      nonResourceURLs: ["*"]
  clusterRoleBinding:
    name: "opentelemetry-collector-admin"

serviceAccount:
  create: true
  name: "opentelemetry-collector-admin"

config:
  receivers:
    kubeletstats:
      collection_interval: 10s
      auth_type: "serviceAccount"
      endpoint: https://${env:K8S_NODE_NAME}:10250
      insecure_skip_verify: true
      metric_groups:
        - container
        - pod
        - volume
        - node
      extra_metadata_labels:
        - container.id
    k8s_cluster:
      collection_interval: 10s
      node_conditions_to_report: [Ready, MemoryPressure, DiskPressure, NetworkUnavailable]
      allocatable_types_to_report: [cpu, memory, storage, ephemeral-storage]
    k8s_events:
      auth_type: "serviceAccount"
    otlp:
      protocols:
        grpc:
          endpoint: 127.0.0.1:4317
          max_recv_msg_size_mib: 4
        http:
          endpoint: 127.0.0.1:4318
    jaeger: null
    zipkin: null
    prometheus:
      config:
        scrape_configs:
          - job_name: "prometheus"
            scrape_interval: 10s
            kubernetes_sd_configs:
              - role: pod
            relabel_configs:
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                regex: "true"
                action: keep
              - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
                action: replace
                target_label: __address__
                regex: ([^:]+)(?::\d+)?;(\d+)
                replacement: $$1:$$2
          - job_name: "otel-collector"
            scrape_interval: 10s
            static_configs:
              - targets: ["127.0.0.1:8888"]
          # Scrape cAdvisor metrics
          - job_name: integrations/kubernetes/cadvisor
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
              - role: node
            relabel_configs:
              - replacement: kubernetes.default.svc.cluster.local:443
                target_label: __address__
              - regex: (.+)
                replacement: /api/v1/nodes/$${1}/proxy/metrics/cadvisor
                source_labels:
                  - __meta_kubernetes_node_name
                target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: false
              server_name: kubernetes
          - job_name: integrations/kubernetes/kubelet
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
              - role: node
            relabel_configs:
              - replacement: kubernetes.default.svc.cluster.local:443
                target_label: __address__
              - regex: (.+)
                replacement: /api/v1/nodes/$${1}/proxy/metrics
                source_labels:
                  - __meta_kubernetes_node_name
                target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: false
              server_name: kubernetes
          # Scrape config for API servers
          - job_name: "kubernetes-apiservers"
            kubernetes_sd_configs:
              - role: endpoints
                namespaces:
                  names:
                    - default
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            relabel_configs:
              - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
                action: keep
                regex: kubernetes;https
              - action: replace
                source_labels:
                  - __meta_kubernetes_namespace
                target_label: Namespace
              - action: replace
                source_labels:
                  - __meta_kubernetes_service_name
                target_label: Service
  processors:
    resourcedetection/system:
      detectors: [env, system, gcp, eks]
      timeout: 2s
      override: false
    attributes/metrics:
      actions:
        - action: insert
          key: env.name
          value: ENV_NAME
        - action: insert
          key: cluster
          value: CLUSTER_NAME
    resource:
      attributes:
        - action: insert
          key: env.name
          value: ENV_NAME
        - action: insert
          key: cluster
          value: CLUSTER_NAME
    batch:
      send_batch_size: 10000
      timeout: 200ms
    memory_limiter:
      check_interval: 3s
      limit_mib: 1500
      spike_limit_mib: 500
    k8sattributes:
      auth_type: "serviceAccount"
      passthrough: true
      filter:
        node_from_env_var: K8S_NODE_NAME
      extract:
        metadata:
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.deployment.name
          - k8s.container.name
          - k8s.namespace.name
          - k8s.node.name
          - k8s.pod.start_time
  exporters:
    prometheusremotewrite:
      endpoint: PROMETHEUS_ENDPOINT
      timeout: 20s
      headers:
        "X-Scope-OrgID": common
  extensions:
    health_check: {}
    memory_ballast:
      size_mib: 683
  service:
    extensions: [memory_ballast, health_check]
    telemetry:
      metrics:
        address: 127.0.0.1:8888
      logs:
        encoding: json
    pipelines:
      metrics:
        exporters:
          - prometheusremotewrite
        processors:
          - attributes/metrics
          - memory_limiter
          - k8sattributes
          - resource
          - batch
        receivers:
          - otlp
          - prometheus
          - k8s_cluster
          - kubeletstats
      traces: null
      logs: null

The above OpenTelemetry Collector configuration gathers metrics from all the components discussed earlier and sends them to a remote endpoint.

Ensure to:

  • Configure receivers like kubeletstats, k8s_cluster, k8s_events, otlp, prometheus, and more based on the metrics you want to collect.
  • Configure exporters like prometheusremotewrite to send metrics to the remote Prometheus or Cortex endpoint.
  • Customize processors and other settings as per your requirements.
  • Make sure to replace ENV_NAME, CLUSTER_NAME, PROMETHEUS_ENDPOINT, and other placeholders with appropriate values (see the example below).
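One way to fill in those placeholders before installing is a quick substitution over the values file; this is just a sketch assuming GNU sed, and the cluster name, environment name, and Cortex push URL are example values:

# Replace the placeholders with your own values (example values shown)
sed -i 's/CLUSTER_NAME/prod-cluster/g; s/ENV_NAME/production/g; s#PROMETHEUS_ENDPOINT#https://cortex.example.com/api/v1/push#g' values.yaml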

You can proceed to install the OpenTelemetry Collector using the values.yaml file above. The command for this would be:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install opentelemetry-collector open-telemetry/opentelemetry-collector -f values.yaml
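Once the release is installed, it is worth confirming that the collector pods are running and not logging scrape or export errors. The label selector below matches the chart's standard labels, but verify it against your release:

kubectl get pods -l app.kubernetes.io/name=opentelemetry-collector
kubectl logs -l app.kubernetes.io/name=opentelemetry-collector --tail=50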

You can run the following command to confirm that metrics are exposed on the API server, or query the remote Prometheus endpoint where the collected metrics are sent.

kubectl get --raw /metrics
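To confirm the data actually reached the remote end, you can also query your remote write target's Prometheus-compatible HTTP API. PROMETHEUS_QUERY_URL is a placeholder for your query endpoint, and the X-Scope-OrgID header matches the tenant configured in the exporter above:

# Query the remote endpoint for any series shipped by the collector
curl -s -G "PROMETHEUS_QUERY_URL/api/v1/query" \
  -H "X-Scope-OrgID: common" \
  --data-urlencode 'query=up' | head -c 500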

Summary

Monitoring Kubernetes metrics is essential for maintaining a healthy and optimized cluster. The OpenTelemetry Collector provides a powerful solution for gathering metrics from various Kubernetes components and exporting them to remote endpoints.

By leveraging the OpenTelemetry Collector with the correct RBAC authorization, you can collect and export metrics efficiently, gaining insights into resource utilization, performance, and cluster health.

Feel free to reach out or share your usage of OpenTelemetry.
