OOM Killer and Java applications in containers

Charan Malemarpuram
Logistimo Engineering Blog
Jan 10, 2019

At Logistimo, all of our applications are containerized and run as docker containers inside Kubernetes. We had noticed a lot of restarts on containers running Java apps, and they were quite random. Docker inspection revealed that the pod was killed by the OOM killer (exit code 137). This meant that the application was consuming more memory than was allocated to the container. That didn’t sound right, since we cap the Java application using -Xmx and leave roughly a 20% buffer in the Kubernetes resource limit (the docker container) for Metaspace and GC data.

For example, 2 GB for the Java process and 2.4 GB for the Kubernetes resource limit.
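
As a rough illustration of that setup (the image name and jar are placeholders, and in practice the limit is applied through the Kubernetes resource spec rather than docker directly), the equivalent docker invocation would be:

docker run --memory=2400m my-java-app \
  java -Xmx2g -jar app.jar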

Subsequent sections cover this problem and how to solve it in detail.

JVM memory usage

Obviously the first step was to review why the container was exceeding the limit, given that these limits were sufficiently buffered.

The “ps” command confirms that the -Xmx setting is indeed in place, with the maximum heap set to 4 GB.

Java process set to Max 4G
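
If you want to repeat the check, something along these lines works; the pgrep pattern assumes a single Java process on the host:

pid=$(pgrep -f java)
ps -o pid,args -p "$pid"    # the args column should include -Xmx4g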

The “top” command, however, reveals that the physical memory used is 4.5 GB.

Actual memory usage of the Java process
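
The same numbers can be read non-interactively; RES/RSS is the resident physical memory that the OOM killer acts on (again assuming a single Java process):

top -b -n 1 -p "$(pgrep -f java)"    # RES column shows resident memory
ps -o pid,rss,vsz -p "$(pgrep -f java)"    # RSS is reported in kilobytes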

Why would Java take 500 MB more than allocated?

Since JDK 1.8.0_40 there is a Native Memory Tracking (NMT) tool, which provides a detailed breakdown of the memory used by a Java application, with every byte accounted for. Note that NMT reports committed memory; the resident set might be less.
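
NMT has to be enabled on the JVM command line (it adds a small overhead) and is then queried with jcmd; a minimal sketch, with app.jar standing in for your application:

# start the JVM with Native Memory Tracking enabled
java -XX:NativeMemoryTracking=summary -Xmx4g -jar app.jar

# in another shell, ask the running JVM for the memory breakdown
jcmd $(pgrep -f app.jar) VM.native_memory summary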

Actual usage = Heap memory + Meta space + Off heap

Off-heap memory typically consists of class metadata, compiled code, thread stacks, and GC data. GC data is variable, while the rest should remain fairly static for most applications. This memory is native (yes, including Metaspace), and the JVM uses the memory available on the host to decide how much to grow this data or when to garbage collect it.

I would encourage you to read this excellent blog post by Mikhail to get a better perspective.

Coming back to the problem at hand, the JVM took 500 MB more because the underlying host had 16 GB of memory. At times this number could grow beyond the buffer we had set, which would cause the container to be terminated. Shouldn’t the JVM be reading the docker container’s memory limit?

Containers and Java

It turns out Java versions 9 and below do not understand containers/docker at all (by default). The JVM picks up the available CPUs and memory from the underlying host. Each Java app running inside a container therefore relies on the host’s configuration. Considering that we run Kubernetes, and many pods run on a single node, this could lead to problems like the ones we were facing.
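
This is easy to demonstrate by asking an older JVM for its resolved default maximum heap from inside a memory-limited container. On a pre-container-aware 8u build (the image tag is only illustrative; 8u191+ images behave differently), MaxHeapSize tracks the host’s RAM rather than the 512 MB limit:

docker run -m 512m openjdk:8u131-jdk \
  java -XX:+PrintFlagsFinal -version | grep MaxHeapSize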

Java 10 supports containers out of the box: it will look up the Linux cgroup information. This allows the JVM to size the heap and garbage collect based on the container’s limits. It is turned on by default using the following flag.

-XX:+UseContainerSupport
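
With a Java 10+ image the same experiment reflects the container limit, and -XX:MaxRAMPercentage can be used to size the heap as a fraction of that limit (the image tag is illustrative):

docker run -m 512m openjdk:10-jdk \
  java -XX:MaxRAMPercentage=75.0 -XX:+PrintFlagsFinal -version | grep MaxHeapSize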

Thankfully, some of these features have been backported to Java 8u131 and 9 onwards. They can be turned on using the following flags.

-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
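
On those backport builds the flags can simply be added to the launch command; as a sketch (app.jar is a placeholder, and -XX:MaxRAMFraction=2 gives the heap roughly half of the cgroup limit):

java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap \
  -XX:MaxRAMFraction=2 -jar app.jar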

Summary

Older versions of Java read the underlying host’s resources and don’t understand cgroups. This causes a mismatch between the container configuration and the Java process, on both CPU and memory. Java also has an off-heap memory component with a dynamic GC data portion that can grow. The best way to solve this is to use the container support features available in recent versions of Java. Do not rely on buffering (it is a waste of money).

If you have to remain on those major versions, upgrade to Java 8u131+ or Java 9 and turn on the experimental flags. Even better, get all the container love available from Java 10 onwards.

References

https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html

https://blog.docker.com/2018/04/improved-docker-container-integration-with-java-10/
