Demystifying OOM Killer in Kubernetes: Tracking Down Memory Issues

Unravel the mysteries of the OOM killer, delve into its inner workings, and learn how to track down the memory issues that lead to OOM kills.

Tuhin Banerjee
Cloud Native Daily
6 min read · Jun 9, 2023


Introduction

Have you ever encountered the dreaded OOM (Out-of-Memory) killer in Kubernetes? This critical component plays a vital role in keeping memory usage under control within your cluster: when a container consumes more memory than its limit allows, the OOM killer steps in and terminates it to free up memory for other critical processes. In this blog, we will unravel the mysteries of the OOM killer, delve into its inner workings, and learn how to track down the memory issues that lead to OOM kills. So, let’s dive in!

Understanding the OOM Killer

The OOM killer is a Linux kernel mechanism that Kubernetes relies on to maintain stability and prevent memory exhaustion. It acts as the last line of defense when memory is critically low: the kernel identifies the process responsible for the memory overload and terminates it to free up memory for the rest of the system. By sacrificing one process, the OOM killer prevents a complete system crash, preserving the overall stability of the node and, by extension, the cluster.

How OOM Kills Are Triggered

When a container in a Kubernetes pod exceeds its specified memory limit, an OOM event is triggered. The limit is enforced by the Linux kernel through the cgroup that the container runtime (such as containerd or Docker) creates for the container: once memory usage breaches the limit, the kernel’s OOM killer terminates the offending process. The kubelet then reports the container as terminated with reason OOMKilled and restarts it according to the pod’s restart policy. Separately, if the node itself comes under memory pressure, the kubelet can evict whole pods to free up memory for other critical workloads.
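If you suspect a workload was OOM-killed, the pod’s status records it. As a quick check (assuming you have kubectl access to the namespace), describe the pod and look at the last state of the terminated container:

kubectl describe pod <pod-name>
# Look under "Last State" for: Terminated, Reason: OOMKilled, Exit Code: 137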

Understanding Memory Metrics and the OOM Decision

To reason about OOM kills, Kubernetes relies on memory metrics collected by cAdvisor (Container Advisor), which is built into the kubelet and exposed to the rest of the cluster. The key metric to watch is `container_memory_working_set_bytes`. It is an estimate of the memory that cannot be evicted under pressure, i.e. the pages the container is actively using, and it is the value the kubelet uses when making eviction decisions. That makes it the closest observable proxy for the memory the OOM killer acts on when deciding whether a container has to be terminated.

Differentiating Between Memory Metrics

While `container_memory_usage_bytes` may seem like an obvious choice for monitoring memory utilization, it includes cached items, such as the filesystem cache, which can be evicted under memory pressure. Hence, it does not accurately reflect the memory observed and acted upon by the OOM killer. On the other hand, `container_memory_working_set_bytes` provides a more reliable indication of memory usage, aligning with what the OOM killer monitors. It focuses on the memory that cannot be easily reclaimed.
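If you scrape the kubelet’s cAdvisor endpoint with Prometheus, you can put the two metrics side by side for a given workload. A minimal sketch of the queries, assuming standard cAdvisor labels (the exact label names vary between Kubernetes versions) and a hypothetical pod called my-app:

# What the OOM killer effectively acts on
container_memory_working_set_bytes{pod="my-app", container!=""}

# Total usage, including reclaimable filesystem cache
container_memory_usage_bytes{pod="my-app", container!=""}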

How can I debug memory consumption?

To track memory growth within your application, you can monitor the specific file that provides memory usage information. By deploying the following code snippet as part of your application, you can invoke it in DEBUG mode to print the memory usage:

const fs = require('fs');

// Function to read memory usage from the cgroup filesystem
function readMemoryUsage() {
  try {
    const memoryUsage = fs.readFileSync('/sys/fs/cgroup/memory/memory.usage_in_bytes', 'utf8');
    console.log(`Memory Usage: ${memoryUsage}`);
  } catch (error) {
    console.error('Error reading memory usage:', error);
  }
}

// Call the function to read memory usage
readMemoryUsage();

In this code, we utilize the fs.readFileSync method to synchronously read the contents of the /sys/fs/cgroup/memory/memory.usage_in_bytes file. The file is read with the 'utf8' encoding to interpret the data as a string.

The readMemoryUsage function reads the file and logs the memory usage to the console. If an error occurs during the reading process, it is caught and logged as well.

Please note that accessing system files such as /sys/fs/cgroup/memory/memory.usage_in_bytes may require elevated privileges, depending on how the container is configured. Make sure to run the Node.js script with the necessary permissions or as a privileged user.
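Also note that this path is specific to cgroup v1. On cgroup v2 hosts (the default on newer distributions and Kubernetes versions), the equivalent file is /sys/fs/cgroup/memory.current. A minimal sketch that tries both locations, assuming the script runs inside the container:

const fs = require('fs');

// cgroup v2 first, then the legacy cgroup v1 location
const CGROUP_MEMORY_PATHS = [
  '/sys/fs/cgroup/memory.current',                // cgroup v2
  '/sys/fs/cgroup/memory/memory.usage_in_bytes',  // cgroup v1
];

function readCgroupMemoryUsage() {
  for (const path of CGROUP_MEMORY_PATHS) {
    try {
      const bytes = parseInt(fs.readFileSync(path, 'utf8').trim(), 10);
      console.log(`Memory Usage (${path}): ${bytes} bytes`);
      return bytes;
    } catch (error) {
      // File not present on this cgroup version; try the next path
    }
  }
  console.error('Could not read cgroup memory usage from any known path');
  return null;
}

readCgroupMemoryUsage();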

Code to get the Node.js heap usage

// Function to track memory growth using the built-in process.memoryUsage()
function trackMemoryGrowth() {
  const memoryUsage = process.memoryUsage();
  console.log(`Memory Usage (RSS): ${memoryUsage.rss}`);
  console.log(`Memory Usage (Heap Total): ${memoryUsage.heapTotal}`);
  console.log(`Memory Usage (Heap Used): ${memoryUsage.heapUsed}`);
}

trackMemoryGrowth();
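Printing these values once only gives you a snapshot. To actually observe growth over time, you could call the function on an interval, for example (the 60-second period is an arbitrary choice):

// Log RSS and heap figures every 60 seconds
setInterval(trackMemoryGrowth, 60 * 1000);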

How to monitor Memory and CPU in K8s using kubectl top?

The `kubectl top` command is a powerful tool in Kubernetes that enables you to monitor the resource usage of pods and nodes in your cluster. It provides real-time information about memory and CPU utilization, allowing you to identify potential bottlenecks, troubleshoot performance issues, and make informed decisions regarding resource allocation. Let’s explore how to use the `kubectl top` command to monitor pod and node resource usage.

Monitoring Pod Resource Usage:
To monitor the memory and CPU usage of pods, you can use the following command:

kubectl top pod

This command provides an overview of the resource usage for all pods in the current namespace. It displays each pod’s name, CPU usage (in cores), and memory usage (in bytes). Note that, unlike the node view, `kubectl top pod` does not show utilization percentages.

If you want to focus on a specific pod, you can use the pod name as an argument:

kubectl top pod <pod-name>

This command shows the same columns for just the specified pod; the flags below give a more granular view.
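Two flags are handy when hunting for memory-hungry candidates (both assume a reasonably recent kubectl): `--containers` breaks usage down per container inside a pod, and `--sort-by` orders the list by consumption.

# Per-container breakdown for a single pod
kubectl top pod <pod-name> --containers

# All pods in the namespace, heaviest memory consumers first
kubectl top pod --sort-by=memory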

Monitoring Node Resource Usage
To monitor the memory and CPU usage of nodes in your cluster, you can use the following command:

kubectl top node

This command gives you an overview of resource usage for all nodes in the cluster. It provides information about the node name, CPU usage, memory usage, and the corresponding percentage of resource utilization.

Similar to monitoring pods, you can specify a particular node to retrieve detailed resource usage information:

kubectl top node <node-name>

In conclusion, the `kubectl top` command is a valuable tool that allows you to monitor the memory and CPU usage of pods and nodes in your Kubernetes cluster. By using this command, you can gain insights into resource utilization, detect performance bottlenecks, and optimize resource allocation, ensuring the efficient operation of your applications.

Wait, what if I want the same for my Docker environment?

The `docker stats` command collects its memory figures from the cgroup filesystem, i.e. from /sys/fs/cgroup/memory on cgroup v1 hosts.


# similar to top
docker stats --no-stream <container id>

On Linux, the Docker CLI reports memory usage by subtracting cache usage from the total memory usage. The API does not perform such a calculation but rather provides the total memory usage and the amount from the cache so that clients can use the data as needed. The cache usage is defined as the value of the `total_inactive_file` field in the memory.stat file on cgroup v1 hosts.

On Docker 19.03 and older, the cache usage was defined as the value of the `cache` field. On cgroup v2 hosts, the cache usage is defined as the value of the `inactive_file` field.

`memory_stats.usage` comes from /sys/fs/cgroup/memory/memory.usage_in_bytes, and `memory_stats.stats.inactive_file` comes from /sys/fs/cgroup/memory/memory.stat.
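If you prefer to do the same subtraction yourself, the Docker Engine API exposes the raw numbers. Below is a minimal sketch that queries the API over the local Unix socket and reproduces the CLI’s calculation; it assumes the daemon listens on the default /var/run/docker.sock, and the fallback across `total_inactive_file`, `inactive_file`, and `cache` mirrors the cgroup v1/v2 differences described above:

const http = require('http');

// Fetch a single stats sample for a container from the local Docker Engine API
function printDockerMemoryUsage(containerId) {
  const options = {
    socketPath: '/var/run/docker.sock',
    path: `/containers/${containerId}/stats?stream=false`,
  };

  http.get(options, res => {
    let body = '';
    res.on('data', chunk => (body += chunk));
    res.on('end', () => {
      const mem = JSON.parse(body).memory_stats;
      // Same calculation the Docker CLI performs: total usage minus cache
      const cache =
        mem.stats.total_inactive_file ??  // cgroup v1
        mem.stats.inactive_file ??        // cgroup v2
        mem.stats.cache ??                // Docker 19.03 and older
        0;
      console.log(`Memory Usage (CLI-style): ${mem.usage - cache} bytes`);
      console.log(`Memory Limit: ${mem.limit} bytes`);
    });
  }).on('error', error => console.error('Error querying Docker API:', error));
}

printDockerMemoryUsage('YOUR_CONTAINER_ID'); // replace with the actual container ID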

Node.js code to print the memory usage of a Docker container

To pull `docker stats` data from Node.js with the `docker-stats-api` library and print the container usage, you can follow these steps:

Install the docker-stats-api library as a dependency in your Node.js project by running the following command:

npm install docker-stats-api

Define a function to print the container usage:

function printContainerUsage(containerId) {
  dockerStats.getStats(containerId)
    .then(stats => {
      console.log(`Container Usage (CPU): ${stats.cpu_percent}`);
      console.log(`Container Usage (Memory): ${stats.mem_usage}`);
    })
    .catch(error => {
      console.error('Error retrieving container stats:', error);
    });
}

Here’s the complete example:

const DockerStats = require('docker-stats-api');

const dockerStats = new DockerStats();

function printContainerUsage(containerId) {
  dockerStats.getStats(containerId)
    .then(stats => {
      console.log(`Container Usage (CPU): ${stats.cpu_percent}`);
      console.log(`Container Usage (Memory): ${stats.mem_usage}`);
    })
    .catch(error => {
      console.error('Error retrieving container stats:', error);
    });
}

const containerId = 'YOUR_CONTAINER_ID';
printContainerUsage(containerId);

By executing this code, you will be able to retrieve and print the CPU and memory usage of a specific container using the `docker-stats-api` library in Node.js. Remember to replace `'YOUR_CONTAINER_ID'` with the actual ID of the container you wish to monitor.

Conclusion

By understanding the role and functioning of the OOM killer in Kubernetes, we gain valuable insights into memory management within our clusters. In this blog, we explored how the OOM killer triggers OOM kills and discussed the importance of memory metrics, particularly `container_memory_working_set_bytes`, in making OOM decisions. Armed with this knowledge, you’re well-equipped to monitor and troubleshoot memory issues that lead to OOM kills effectively.
