Memory settings for Java process running in Kubernetes pod

Fan Liu
7 min read · Aug 25, 2023


Managing the memory usage of a Java process running in a Kubernetes pod is more challenging than one might expect. Even with proper JVM memory configuration, OOMKilled issues can still arise, leaving you wondering why.

TL;DR

There is no way to guarantee a hard memory boundary for a Java process, since the JVM enforces only the heap size limit; non-heap memory usage depends on various factors. Start by giving the heap 75% of the pod’s memory (leaving 25% for non-heap), and keep a close watch on how your memory behaves. If things get out of hand, you can raise your pod’s memory limit or tweak the heap-to-non-heap ratio to dodge the OOMKilled mishaps.

Context

We faced repeated OOMKilled and restart issues with our production Java application running in Kubernetes. Despite defining the memory settings at both the pod and JVM levels, the pod’s total memory usage fluctuated, leading to frequent restarts.

  • Pod level configuration: we initially set the pod’s memory limit to 2Gi, using the following settings:

    resources:
      requests:
        memory: "2Gi"
        cpu: "4"
      limits:
        memory: "2Gi"
        cpu: "4"
  • JVM level configuration: we specified the percentage of system memory the JVM should use, allowing the JVM to adapt to its environment:
-XX:MaxRAMPercentage=80.0

It’s important to note that MaxRAMPercentage does NOT constrain the total memory the Java process can use. It controls only the JVM heap size, as the heap is the only memory directly accessible to the application. With these settings, the pod had 2Gi of system memory, of which 1.6Gi was allocated to the heap and 0.4Gi was left for non-heap memory. (Keep in mind that 2Gi equals 2 * 1024 * 1024 * 1024 bytes ≈ 2.15GB, as the monitoring dashboard uses GB as the memory unit.)
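As a sanity check of that arithmetic, here is a minimal sketch of the sizing formula the flag implies. `HeapSizing` and `heapBytes` are illustrative names, not a JVM API; the formula mirrors the JVM’s documented behavior (available memory * percentage / 100):

```java
public class HeapSizing {
    // Mirrors the JVM's heap sizing: limit * percentage / 100
    static long heapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        long limit = 2L * 1024 * 1024 * 1024;   // 2Gi pod memory limit
        long heap = heapBytes(limit, 80.0);     // -XX:MaxRAMPercentage=80.0
        // Confirms the 1.6Gi heap / 0.4Gi non-heap split described above
        System.out.printf("heap=%.1fGi, non-heap headroom=%.1fGi%n",
                heap / 1073741824.0, (limit - heap) / 1073741824.0);
    }
}
```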

Initial Attempt to Address the Issue

To mitigate the OOMKilled issue, we increased the pod’s memory limit from 2Gi to 4Gi, which did help reduce the problem. However, certain questions remained:

  1. Why were container_memory_working_set and container_memory_rss close to 100%, while the JVM heap and non-heap usage were significantly lower?
  2. Why was the Working Set Size (WSS)/Resident Set Size (RSS) memory usage higher than the JVM’s total memory, given that the Java process was the only process running in the pod?
  3. Why was the process memory usage still close to 100%, almost reaching the pod memory limit?

Analysis

Question 1

Why was the Java total memory usage much lower than the system memory usage?

We noticed that the container_memory_working_set and container_memory_rss stopped increasing once the committed heap memory reached the maximum heap size.

Committed JVM heap stopped increasing once it reached the heap limit
❷ ❸ System memory WSS/RSS stopped increasing when the committed heap reached the limit

According to the Javadoc of the MemoryUsage class, which is where these metrics come from:

public long getCommitted()
Returns the amount of memory in bytes that is committed for the Java virtual machine to use. This amount of memory is guaranteed for the Java virtual machine to use.

The committed memory represents the memory pre-allocated by the JVM from the operating system. Consequently, from the container/pod perspective, the WSS/RSS usage appeared high, while within the JVM, both heap and non-heap memory usage remained low.
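You can observe this distinction directly from inside the process with the standard MemoryMXBean; a minimal sketch (committed can sit far above used, and the container-level WSS/RSS metrics track the committed side):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class CommittedVsUsed {
    public static void main(String[] args) {
        // Heap and non-heap usage as the JVM itself reports them
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
        System.out.printf("Heap     used=%dMB committed=%dMB max=%dMB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
        System.out.printf("Non-heap used=%dMB committed=%dMB%n",
                nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
    }
}
```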

This also explained why an OutOfMemoryError did not occur before the pod was OOMKilled: neither heap nor non-heap memory had reached the JVM’s limit. Instead, the JVM pre-allocated and reserved memory from the OS without readily freeing it up. The OpenJDK documentation explains:

G1 only returns memory from the Java heap at either a full GC or during a concurrent cycle. Since G1 tries hard to completely avoid full GCs, and only triggers a concurrent cycle based on Java heap occupancy and allocation activity, it will not return Java heap memory in many cases unless forced to do so externally. This behavior is particularly disadvantageous in container environments where resources are paid by use. Even during phases where the VM only uses a fraction of its assigned memory resources due to inactivity, G1 will retain all of the Java heap.

So although the actual memory usage of the Java process can be low, the committed memory preallocated by the JVM can be much higher and will not return to the system promptly.

Question 2

Why was the WSS/RSS memory usage more than JVM total memory?

This remains a mystery to me after checking the source of the system memory and the JVM metrics.

The gap between system memory RSS and the JVM total committed memory:

  • System memory WSS: 3.8GB
  • JVM heap committed: 3.22GB
  • JVM total committed: 3.42GB

The Native Memory Tracking (NMT) report of the JVM running in the pod (enabled with -XX:NativeMemoryTracking=summary and dumped with jcmd <pid> VM.native_memory summary) gave us a detailed breakdown of the memory usage of the Java process, especially the non-heap memory. The result was consistent with the JVM Heap and JVM Total metrics.

  Native Memory Tracking:

Total: reserved=5066125KB, committed=3585293KB
- Java Heap (reserved=3145728KB, committed=3145728KB)
(mmap: reserved=3145728KB, committed=3145728KB)
- Class (reserved=1150387KB, committed=113419KB)
- Thread (reserved=297402KB, committed=32854KB)
- Code (reserved=253098KB, committed=73782KB)
- GC (reserved=174867KB, committed=174867KB)
- Compiler (reserved=2156KB, committed=2156KB)
- Internal (reserved=11591KB, committed=11591KB)
- Other (reserved=2690KB, committed=2690KB)
- Symbol (reserved=21454KB, committed=21454KB)
- Native Memory Tracking (reserved=6275KB, committed=6275KB)
- Arena Chunk (reserved=195KB, committed=195KB)
- Logging (reserved=4KB, committed=4KB)
- Arguments (reserved=29KB, committed=29KB)
- Module (reserved=249KB, committed=249KB)

The system memory WSS/RSS was confirmed by the RSS column (the amount of resident memory used by the process) of ps aux run in the pod. And the Java process was the only process running in the pod.

USER   PID    %CPU %MEM  VSZ      RSS      TTY  STAT START TIME   COMMAND
xxx-+ 1 7.7 0.4 24751760 3818536 ? Ssl Jul28 340:41 /usr/java/jdk-11.0.17/bin/java -XX:MaxRAMPercentage=75.0 -XshowSettings:vm -classpath ...
xxx-+ 80559 0.0 0.0 50548 3936 ? Rs 07:02 0:00 ps -aux

So both metrics are trustworthy, yet there is still a gap of around 300MB between them.

Question 3

Why was the system memory usage still close to 100% after increasing the pod memory limit?

First of all, it is resources.limits.memory that determines the system memory size, not resources.requests.memory. The latter is only used by the Kubernetes scheduler to find a node with enough memory to run the pod.

Secondly, as mentioned previously, only the size of the heap can be specified and tightly controlled by the JVM, but not the non/off-heap memory. Thus, even with increased system memory, non/off-heap memory usage might increase proportionally.

To alleviate this, decreasing the heap percentage leaves more space for non/off-heap usage. So the next option we tried was decreasing MaxRAMPercentage from 80% to 75%, and it worked as expected: WSS/RSS dropped.
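To verify which percentage a running JVM actually picked up after such a rollout, one option is to query the VM option through the standard HotSpotDiagnosticMXBean; a small sketch (the class name FlagCheck is illustrative):

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class FlagCheck {
    public static void main(String[] args) {
        // Read the effective value of -XX:MaxRAMPercentage from the live JVM
        HotSpotDiagnosticMXBean bean = ManagementFactory
                .getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        System.out.println("MaxRAMPercentage = "
                + bean.getVMOption("MaxRAMPercentage").getValue());
        // The resulting heap ceiling, as the JVM itself sees it
        System.out.println("Max heap bytes   = " + Runtime.getRuntime().maxMemory());
    }
}
```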

Before reducing the heap percentage

➊❷ WSS/RSS is still close to the pod memory limit (4.29GB)

After reducing the heap percentage

➊❷ WSS/RSS stabilized at 3.6GB and had a safe margin to the pod memory limit (4.29GB)

Conclusion

The following approach can be used to tackle the uncertainty of Java process memory usage and eliminate the pod OOMKilled issue:

  1. Begin with a reasonable MaxRAMPercentage value; 75% is normally a good starting point.
  2. Monitor the heap usage and the system memory WSS/RSS over time.
  • If the maximum heap usage is high (i.e. stays above 90%), that’s a signal to increase your pod’s memory limit (resources.limits.memory). Your heap needs more space.
  • If the maximum heap usage is OK (i.e. stays well below 90%), but the WSS/RSS is high and close to the pod memory limit, consider decreasing MaxRAMPercentage to allocate more memory to the non/off-heap space.
  • Monitor the maximum WSS/RSS to make sure there is always a 5% to 10% safety margin from the pod memory limit. Don’t fly too close to the sun!
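The checklist above can be codified as a tiny decision helper; this is a hedged sketch, not a prescription: the thresholds follow the article’s 90% rule of thumb, and the class and method names are hypothetical:

```java
public class MemoryTuningAdvice {
    // heapUsageRatio: max heap used / max heap size; wssRatio: WSS / pod memory limit
    static String advise(double heapUsageRatio, double wssRatio) {
        if (heapUsageRatio > 0.90) return "increase resources.limits.memory";
        if (wssRatio > 0.90) return "decrease -XX:MaxRAMPercentage";
        return "keep current settings";
    }

    public static void main(String[] args) {
        System.out.println(advise(0.95, 0.80)); // heap under pressure
        System.out.println(advise(0.70, 0.95)); // non-heap squeezed
        System.out.println(advise(0.70, 0.80)); // healthy margins
    }
}
```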

Thanks to Paul Smith for reviewing the post and for all the great feedback.
