Understanding Kubernetes Exit Code 137: A Silent Killer of Java Apps

Daniel Louvier · Published in mossfinance · 10 min read · Oct 24, 2023



Introduction

At Moss, Java apps are at the core of our microservices. However, from time to time, they encounter issues. This is where the Moss Platform Team steps in. We are the team responsible for ensuring the smooth operation of applications on Kubernetes.

A few weeks ago, some of our microservice pods began restarting unexpectedly, with no apparent reason and at varying hours. It became clear that something was wrong.

Our Service Level Objectives (SLOs) remained intact, thanks to the multiple redundancy layers in place for our workloads. However, our monitoring systems showed that certain pods were undergoing restarts.

$ kubectl get pods -l app=service
NAME READY STATUS RESTARTS AGE
service-xyz789-pqr567 1/1 Running 3 (2d7h ago) 7d10h

When we examined the pod, we discovered the following:

$ kubectl describe pod service-xyz789-pqr567
(...)
State: Running
Started: Tue, 16 Jul 2023 11:14:13 +0200
Last State: Terminated
Reason: OOMKilled
Exit Code: 137

This article is about our journey to track down and fix memory problems in Java apps on Kubernetes.

⚠️ Exit Code 137 may also occur if the pod exceeds its allocated storage limits.

TL;DR

If your Java applications running in Kubernetes are being restarted due to OOMKilled/Exit Code 137, the cause is probably a memory leak.

The Java Native Memory Tracker can help you identify which memory area is leaking.

What does Exit Code 137 mean?

Exit Code 137 is a message that indicates a program was abruptly halted by the computer. It’s as if the computer is saying, ‘Stop right now!’

To understand this code, let’s break it down. Exit Code 137 is the sum of two numbers:

  • 128, the base value the shell uses to indicate that a process was terminated by a signal.
  • 9, the number of the SIGKILL signal — a signal that forcefully terminates a process.

In simpler terms, when you encounter Exit Code 137 (128 + 9), it means Kubernetes sent a SIGKILL signal to the process, forcibly terminating the workload.
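
You can reproduce the arithmetic in any shell (a minimal sketch, not specific to Kubernetes): start a process, kill it with SIGKILL, and check the exit status the parent sees.

$ sh -c 'kill -9 $$'   # the child process sends SIGKILL (signal 9) to itself
$ echo $?              # the parent shell reports 128 + 9
137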

Kubernetes Resource Management

Before starting with the root cause investigation, we need to understand how resource management works in Kubernetes.

In Kubernetes, resources such as CPU, disk, and memory are managed through Linux Cgroups. This is how the system divides and organizes its resources for different processes, so none of them can take too much and everything runs smoothly.

Cgroups enforce limits differently depending on the resource type. For instance, if a Pod reaches its CPU limit, the process(es) running in it are throttled (they run slower). As a result, some requests may slow down or time out, which can cause the LivenessProbe to fail (the typical scenario for web services); once the probe’s failureThreshold is reached, the container engine restarts the container.

Example:

(...)
Warning Unhealthy 4m55s kubelet Liveness probe failed: Get "http://172.16.100.14:8000/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Normal Killing 4m55s kubelet Container service-xyz789-pqr567 failed liveness probe, will be restarted.
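
A quick way to confirm CPU throttling from inside the container is to read the cgroup statistics directly (a sketch; the path below assumes cgroup v2, under cgroup v1 the equivalent file is /sys/fs/cgroup/cpu/cpu.stat):

$ cat /sys/fs/cgroup/cpu.stat
# nr_throttled and throttled_usec keep increasing while the CPU limit is enforced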

Storage and Memory, however, trigger distinct behaviour in cgroups. Rather than slowing down the process, cgroups terminate it directly, often leading to Exit Code 137 messages.

If a Pod surpasses its storage limits, we will encounter an error such as:

1 Evicted: Pod ephemeral local storage usage exceeds the total limit of containers 10M
Task [task-id] deleted with exit code 137

However, if the pod exceeds its memory limits, we may encounter error messages like:

Task [task-id] ran out of memory

Upon inspecting the system logs, you might encounter entries like:

“Memory cgroup out of memory: Killed process 3710859 (java) total-vm:59637676kB, anon-rss:519816kB, file-rss:32776kB, shmem-rss:2976972kB, UID:2000 pgtables:7568kB oom_score_adj:977”
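
If you have access to the node (for example via SSH or a privileged debug Pod), the kernel log tells you exactly which process the memory cgroup killed; a minimal sketch:

$ dmesg -T | grep -i "memory cgroup out of memory"
# or, on systemd-based nodes:
$ journalctl -k | grep -i "out of memory"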

At this stage, the Moss Platform Team confirmed that the pods were restarting due to memory usage exceeding their limits. As a result, we needed to investigate further the issues with our Java microservices.

Java Memory Model

I won’t go into extensive detail about the Java Memory Model, but when running Java apps in Kubernetes, it’s important to take into account these memory areas: Java Heap Memory, Java Non-Heap Memory, and Native Memory.

The Platform Engineers at Moss started working to understand how our Java microservices were using memory. This can be done in various ways, for example by using a JMX metrics exporter, a Java agent, or by connecting directly to the running Pod and executing the jstat command.
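
For example, a quick look at heap and GC usage can be taken with jstat against the JVM inside the Pod (a sketch, assuming the JVM runs as PID 1, as it does in our containers):

$ kubectl exec service-xyz789-pqr567 -- jstat -gc 1 5s 10
# prints heap generation sizes and GC counters every 5 seconds, 10 times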

This is what we gathered from our monitoring platform:

Pod memory usage over time

We were able to confirm that the Pod’s memory usage increased gradually, going from 1.25GB to 2GB. Our task at this point was to identify where and how the memory was being utilized.

The committed Heap memory remained stable (we set Xmx = Xms, so the heap’s full upper limit is committed to system memory up front), and there were no apparent issues with Metaspace or CodeCache (Non-Heap memory). Yet it remained unclear where this memory was being consumed.

We knew that Java Heap and Non-Heap memory are limited via configuration flags, for example -Xmx for the Heap or -XX:MaxMetaspaceSize for Metaspace (Non-Heap), but Native Memory is not limited by the JVM.
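
As an illustration (the values below are made up, loosely based on our entrypoint), these are the kinds of flags that cap each area; none of them constrain memory allocated by native code:

# -Xmx750M -Xms750M             caps (and pre-commits) the Heap
# -XX:MaxMetaspaceSize=256M     caps Metaspace (Non-Heap)
# -XX:MaxDirectMemorySize=400M  caps direct (off-heap) buffers
# allocations made via JNI, agents or C libraries have no JVM-enforced cap
$ java -Xmx750M -Xms750M -XX:MaxMetaspaceSize=256M -XX:MaxDirectMemorySize=400M -jar app.jar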

Investigating Memory Usage in Kubernetes

If you’re new to Kubernetes, you may find that traditional commands like free can produce data that is hard to interpret. For instance, if you run the free command in a Pod with a 2000MB memory limit, the output might look like this:

$ free -m
total used free shared buff/cache available
Mem: 64315 17171 20632 16586 26512 29896
Swap: 0 0 0

Here, the total available memory is not 2000MB but rather 64315MB, which is the total memory capacity of the node where the Pod is running. This discrepancy is a result of cgroups, and it can be somewhat confusing. However, the actual memory usage of a Pod can be verified by checking the following pseudo-files:

/sys/fs/cgroup/memory/memory.usage_in_bytes for Cgroups v1.

/sys/fs/cgroup/memory.current for Cgroups v2.
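
So, from inside the Pod, something like this returns the numbers that actually matter for the OOM killer (which pair of files exists depends on the node’s cgroup version):

# cgroup v2
$ cat /sys/fs/cgroup/memory.current              # current usage in bytes
$ cat /sys/fs/cgroup/memory.max                  # enforced limit
# cgroup v1
$ cat /sys/fs/cgroup/memory/memory.usage_in_bytes
$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes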

Up until that point, our working theory was that Java might not be respecting the memory limits set for the Pods, as if the JVM believed it had more memory available than it actually did. This idea comes from an old myth circulating on the internet that Java doesn’t work well in containers; however, we were able to discard this theory.

Java Container Awareness

One of the first things you might encounter when searching the internet for problems related to running JVM in Kubernetes is the outdated myth that Java doesn’t work well in containerized environments. However, this is no longer the case, as it has been resolved since Java 8u191. You can confirm this by using the command java -XshowSettings:system -version.

$ java -XshowSettings:system -version
Operating System Metrics:
Provider: cgroupv2
Effective CPU Count: 2 → It matches your resources.cpu.limits in the Pod definition.
(...)
Memory Limit: 1.86G → It matches your resources.memory.limits in the Pod definition.

At this point, we began to suspect that there might be additional processes running alongside java. To investigate, we checked the number of processes running in each Pod using the command:

$ ps | cat
PID USER TIME COMMAND
1 app 49:57 java -server -XX:+UseZGC -Dnetworkaddress.cache.ttl=0 -Dnetworkaddress.cache.negative.ttl=0 -XX:MaxDirectMemorySize=400M -Xmx750M -Xms750M -javaagent:java-agent.jar -Dspring.profiles.active=enabled_profile org.springframework.boot.loader.JarLauncher

We also asked our backend engineers if any of the libraries we were using could be secretly running something in the operating system that might be causing memory leaks. However, this appeared highly improbable, as we had encountered this issue in several distinct microservices with different functionalities.

If our Java code launches an external application, for example via java.lang.Runtime, the memory usage of that external process is not reflected in the Heap or Non-Heap memory figures.

In the end, we found this amazing article: ‘Troubleshooting Native Memory Leaks in Java Applications’ by Poonam Parhar. In the article, Poonam explains how certain Java code can make calls to native C, which directly allocates memory in the system. This memory allocation isn’t visible in the Java Heap or Non-Heap memory areas. The article also introduced us to a tool we hadn’t known about before, called the Java Native Memory Tracker.

MALLOC_ARENA_MAX

Many online articles discuss the possibility of memory leaks due to poor GLIBC configuration, particularly regarding MALLOC_ARENA_MAX. Before diving into this, make sure you know the base operating system (OS) image of your container. If you’re using Alpine, you don’t need to worry about this issue because Alpine uses MUSL instead of GLIBC.
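
A quick way to check which libc your image ships before going down this path (a sketch):

$ kubectl exec service-xyz789-pqr567 -- cat /etc/os-release
# ID=alpine means MUSL, so MALLOC_ARENA_MAX does not apply;
# Debian/Ubuntu-based images use GLIBC, where the variable can be lowered
# (for example MALLOC_ARENA_MAX=2) to reduce the number of malloc arenas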

Java Native Memory Tracker

After ruling out problems with a third-party process and a potential misconfiguration of GLIBC, we started to investigate native memory usage with JNMT. The tool is included in the standard JDK package, but it is usually turned off by default because it adds some CPU and memory overhead. Before enabling it, we increased the memory limits of our Pods by 20% to account for that overhead.

To enable it, we needed to pass the following options to JVM:

-XX:NativeMemoryTracking=summary or -XX:NativeMemoryTracking=detail

We made several attempts; while summary helped confirm the leak, detail provided a much better view of where it was located.

So we changed our container image entry-point to something like:

java -server -XX:+UseZGC (...) -XX:NativeMemoryTracking=detail -Xmx750M -Xms750M -javaagent:java-agent.jar -Dspring.profiles.active=enabled_profile org.springframework.boot.loader.JarLauncher

Next, we needed to record a baseline that later snapshots could be compared against. This was done by connecting to the Pod and running jcmd <Java PID> VM.native_memory baseline scale=MB:

❯ kubectl exec -ti service-xyz789-pqr567 -- sh
(...)
$ ps
PID USER TIME COMMAND
1 app 0:40 java -server (...)
86 app 0:00 ps
$ jcmd 1 VM.native_memory baseline scale=MB
1:
Baseline succeeded

From this moment we had a baseline, and we could start taking snapshots:

jcmd <Java PID> VM.native_memory summary scale=MB
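
Because a baseline was recorded, NMT can also report each snapshot as a diff against it, which makes growth between snapshots easier to spot:

jcmd <Java PID> VM.native_memory summary.diff scale=MB
# each memory area is reported with a +/- delta relative to the baseline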

We kept taking snapshots over time; the right interval depends on how quickly memory leaks in the microservice. In our case, once every hour was enough, and this was the outcome (note: we stored each output in a file outside the Pod):

$ kubectl exec service-xyz789-pqr567 -- jcmd 1 VM.native_memory summary scale=MB | tee native_memory_detail_$(date +%s)
Native Memory Tracking:
(Omitting categories weighing less than 1MB)
Total: reserved=45548MB +222MB, committed=1097MB +224MB
- Java Heap (reserved=36864MB, committed=768MB)
(...)
- Class (reserved=3MB +1MB, committed=3MB +1MB)
(...)
- GC (reserved=8429MB +1MB, committed=74MB +1MB)
(...)
- Internal (reserved=250MB +220MB, committed=250MB +220MB)
(...)
- Other (reserved=2MB, committed=2MB)
(...)

From this output, we learned that Total: (…) committed=1097MB matched the total memory usage of the Pod (as reported by, for example, the kubectl top pod command), which confirmed that JNMT had captured the leak.

The second thing we noticed after taking several snapshots was that Internal (reserved=250MB +220MB, committed=250MB +220MB) was constantly growing: in the initial snapshots the delta was something like +50MB, then +110MB, and so on.

When we looked at the detail view and grepped through the many different memory allocations for those matching type=Internal, we found:

[0x000078cf75e406dd] JvmtiEnv::GetMethodName(Method*, char**, char**, char**)+0xdd
[0x000078cf75deb7e3] jvmti_GetMethodName+0x103
[0x000078cd552800ca]
[0x000078cd55261353]
(malloc=250MB type=Internal +220MB (...)
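
For reference, allocations of a given type can be pulled out of the detail output with something along these lines (the amount of grep context needed may vary):

$ kubectl exec service-xyz789-pqr567 -- jcmd 1 VM.native_memory detail scale=MB | grep -B 6 "type=Internal"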

Bingo! We found the Method responsible for our memory leak!

We took these insights to our Backend Engineers and, unfortunately, they did not recognize that method. However, they pointed out that JVMTI (Java Virtual Machine Tool Interface) could be related to a Java agent.

The root cause

After our latest discoveries, and realizing that the issue was affecting several of our microservices, it became clear that the problem was caused by our Java monitoring agent. So we went to the agent’s upstream project looking for a potential fix, and we found that a newer release addressed the issue.

After a few tests enabling and disabling the profiler and checking workloads running the affected version, we confirmed the source of the leak.

Conclusions

During the investigation, we observed something interesting — the problem appeared to be resolved after a few days. This can be attributed to the fact that we regularly perform periodic builds of our base images to ensure they are up-to-date. For a period, the problematic version of the agent was included in our base images. However, when our routine build job recreated the base image, the issue disappeared.

Our previous Dockerfile looked something like this:

FROM curlimages/curl AS download-agent
WORKDIR /tmp
RUN curl -sL https://hosting/latest-java-agent -o java-agent.jar
(...)
FROM internal-repo/base-jre AS builder
COPY --from=download-agent /tmp/java-agent.jar .

And we updated to:

FROM curlimages/curl AS download-agent
WORKDIR /tmp
RUN curl -sL -o java-agent.jar \
https://hosting/java-agent-1.1.jar
RUN curl -sL -o checksum \
https://hosting/java-agent-1.1.jar.sha256
# note: sha256sum -c expects two spaces between the hash and the file name
RUN echo "$(cat checksum)  java-agent.jar" | sha256sum -c
(...)
FROM internal-repo/base-jre AS builder
COPY --chown=app:app --from=download-agent /tmp/java-agent.jar .

Additionally, we’ve configured monitors to watch the incremental memory usage, allowing us to react before an OutOfMemory event occurs. This includes options like restarting services or increasing memory allocation, giving us a buffer period for troubleshooting the memory leaks.
