Native Memory — The Silent JVM Killer

Conor Griffin
Dec 18, 2019 · 9 min read
What it looks like when your app gets terminated twice in one week by the Linux oom-killer

I recently worked on a tricky issue where our application was terminated ungracefully twice in one week after being enabled in a high-volume production environment. All signs pointed to some sort of memory issue but there was little concrete data to assist in identifying the root cause. What followed was a couple of weeks of bewildering frustration, an existential crisis or two and a hard-earned lesson with a satisfactory conclusion. If you want to hear about my journey to getting to the bottom of the problem, how you can do it in your own applications and the steps to take to avoid it in the first place then read on.

Running Out of Memory

Our application was shut down twice in one week, and it was clear from our monitoring that the JVM heap usage spiked in the moments prior to termination, yet we never got close to full heap usage. At the time we were running our JVM with -Xms500M and -Xmx45G on a VM with 60GB of memory available. The other processes on the VM had very small memory footprints, so we met the local recommendation of allowing 20% memory overhead above heap size for the JVM's use of native memory, and the performance testing we had done had not indicated we needed anything more than this. Since we were not getting an OutOfMemoryError in the application logs, and our metrics were being polled only once a minute, we postulated that on both occasions the logs hadn't been flushed and the metrics hadn't been polled in time to capture the extent of the issue. In an effort to mitigate this we enabled the JVM flags -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath. In the meantime we continued investigating possible causes.
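For reference, the relevant part of the launch command ends up looking something like this; the dump path and jar name are illustrative examples, not our actual setup:

    java -Xms500m -Xmx45g \
         -XX:+HeapDumpOnOutOfMemoryError \
         -XX:HeapDumpPath=/var/tmp/heapdumps \
         -jar our-application.jar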

We had alerting and charts in place for heap usage, but we were not monitoring memory at the server/VM level. When we looked at the server memory we could see that used memory spiked to just below 100% right before the application terminated, while heap usage was still reasonably low at ~60%. To figure out what was happening I pulled all the log files from /var/log and started scanning through them, since our own application logs had not yielded anything useful. That was when I came across the two lines below.

kernel: [28244264.657076] Out of memory: Kill process 28854 (java) score 977 or sacrifice child
kernel: [28244264.657146] Killed process 28854, UID 500, (java) total-vm:138204008kB, anon-rss:128965112kB, file-rss:144kB rss:0kB

The Linux OOM-Killer

Linux has a feature called the OOM-Killer and its job is to save the system when it gets into a low-memory situation. It does this by examining the memory usage of the running processes and choosing one to kill so it can free up memory. It applies an algorithm which scores each process and then kills the highest-scoring one. This means that if you are running one main application on a server (especially if that application is running in the JVM with a large heap), your application is quite likely to be the process chosen to be sacrificed. In the log excerpt above you can see that process 28854 was given a score of 977, which means the process held roughly 97.7% of the available memory on the VM. The rss (resident set size, the memory currently held) value for this process was 128965112kB. So our application had ~98% of available memory on the box, something needed more memory (probably our application), and so the OOM-Killer decided to kill PID 28854 to free up memory for the VM. That explains why our app was terminated; next I'll delve into how I analysed our application's native memory.
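As an aside, if you want to see how the kernel currently scores a given process, you can read its score straight out of procfs (substitute the PID of your own process; 28854 here is just the PID from the log above):

    cat /proc/28854/oom_score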

What exactly is Native Memory?

Native memory is memory allocated for the Java process by the OS, outside the JVM heap, through C code. This C code is called by native methods in certain Java classes. I'll discuss this C code, where it is and how you can see it, in the next section.

A lot of the time you can get away with not thinking too much about how Java uses memory, especially native (off-heap) memory. Profiling tools generally just ignore native memory because they can't monitor it, so even if you're being a good citizen and considering the memory usage of your changes as you develop features or fix bugs, it's easy to let native memory utilisation go unchecked. When you get a heap dump from a Java process, you only get the contents of the heap, as the name implies. You will not get any real insight into what is consuming native memory unless the tool you're using can tell you that certain heap allocations may result in significant native memory usage too (shout out to jxray).

Where is this C code?

One area of the Java language that makes use of C code is the java.util.zip package. The classes in this package have native methods such as those shown below.
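To give a concrete example, java.util.zip.Deflater declares methods like these, with no method body in Java at all (slightly paraphrased from the JDK 8 source):

    // From java.util.zip.Deflater (paraphrased from the JDK 8 source)
    private native static long init(int level, int strategy, boolean nowrap);
    private native int deflateBytes(long addr, byte[] b, int off, int len, int flush);
    private native static void end(long addr);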

There is no Java implementation of these methods; the implementation is written in C and compiled into the JDK. You can see the source of these methods in the JDK source code. The native method definition in the Java source links to the C code using the following name mapping.

The native method deflateBytes in class java.util.zip.Deflater will have a C implementation called Java_java_util_zip_Deflater_deflateBytes. You can see the native implementation of this method in Deflater.c in the JDK source (line 118 at the time of writing). All memory allocations done by these native methods go via the library function malloc, and this is what we need to instrument in order to understand where native memory allocation is happening.

Finding the source of the memory allocation

After a lot of research into the available tools and some experimentation, the tool I found most useful, and the one which ultimately allowed me to find the root cause of our memory problem, was jemalloc, used alongside several additional resources such as Jeff Griffith's GitHub page and this tech blog by the British Government.

jemalloc was configured to be loaded by our application and to take snapshots of native memory allocations at 2GB intervals. This allowed me to run my application with instances of the specific job type that appeared to cause the issue and then generate some PDF files that showed where the native memory was going. jemalloc does this by replacing the system malloc implementation with its own and then intercepting and tracking every request for memory. This allows it to track the native memory that Java consumes when it calls native code.
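I don't have our exact configuration to hand, but wiring jemalloc into a JVM process for profiling generally looks something like the following (the library path and file names are illustrative, and jemalloc must be built with profiling support, i.e. --enable-prof):

    # Replace the system malloc with jemalloc and enable its allocation profiler.
    # lg_prof_interval:31 dumps a profile roughly every 2^31 bytes (~2GB) of allocations.
    export LD_PRELOAD=/usr/local/lib/libjemalloc.so
    export MALLOC_CONF=prof:true,lg_prof_interval:31,prof_prefix:jeprof.out

    # Run the application as normal, then turn the dump files into a PDF call graph.
    jeprof --show_bytes --pdf $(which java) jeprof.out.*.heap > native-profile.pdf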

Having run the test above with jemalloc, the results indicated a clear problem.

49.4% of memory was allocated by one native method

You can see from the screenshot of the jeprof report above that deflateInit2_ had allocated 49.5% of the memory currently held by the process, and that it in turn was called by Java_java_util_zip_Deflater_init, shown below.

Native method called by java.util.zip.Deflater

To find where this class was being created in my application I reran the test, this time running a shell script that took a thread dump every second for the duration of the test and appended each one to a single file. When the test completed I used grep on the output file to find any instances of Deflater in the stack traces, and that led me to where we were instantiating this class and thereby making the native calls that allocate native memory.
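The script itself was nothing sophisticated; something along these lines would do the job (the process lookup and the duration are placeholders):

    # Take a thread dump every second for ten minutes, appending to one file.
    PID=$(pgrep -f our-application)
    for i in $(seq 1 600); do
        jstack "$PID" >> thread-dumps.txt
        sleep 1
    done

    # Then search the collected dumps for the offending class.
    grep -n "Deflater" thread-dumps.txt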

The Bug

Once I found where the Deflater was being created I needed to identify whether there was a bug or if our native memory usage was legitimate.

Below is a snippet showing how we were creating the Deflater instances and this is all there was to the bug.
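The original snippet isn't reproduced here, but it was essentially of this shape (the stream and buffer-size names are illustrative, not our real code):

    // The Deflater created inline below is never explicitly ended by anyone.
    // DeflaterOutputStream.close() will not call end() on it for us.
    try (DeflaterOutputStream dos =
                 new DeflaterOutputStream(outputStream, new Deflater(), BUFFER_SIZE)) {
        dos.write(data);
    }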

The problem here is the creation of the Deflater instance inline in the constructor call for the DeflaterOutputStream, and it's a very (very very very) easy mistake to make. We were creating a resource, the Deflater instance, that we were responsible for closing/cleaning up, and instead we assumed that the DeflaterOutputStream was going to do it for us, which it doesn't. The same issue arises with similar classes such as DeflaterInputStream, InflaterOutputStream and InflaterInputStream too.

Internally, DeflaterOutputStream has a package-private boolean field usesDefaultDeflater which defaults to false. If you use one of the constructors where you do not pass in a Deflater instance, this field is set to true. The simplest example, where you only provide an OutputStream parameter, is shown below.
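Paraphrasing the JDK source, that constructor looks like this:

    public DeflaterOutputStream(OutputStream out) {
        this(out, new Deflater());
        usesDefaultDeflater = true;
    }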

Since in our case we wanted to use a custom buffer size, we had to use the following constructor instead.
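Again paraphrasing the JDK source; note that usesDefaultDeflater is never touched here, so it keeps its default value of false:

    public DeflaterOutputStream(OutputStream out, Deflater def, int size) {
        this(out, def, size, false);
    }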

Using this constructor means the default value of false for usesDefaultDeflater stays in effect, and this alters how the memory of the Deflater instance is disposed of when the DeflaterOutputStream is closed. See the close() method for DeflaterOutputStream below.
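Paraphrased from the JDK source:

    public void close() throws IOException {
        if (!closed) {
            finish();
            if (usesDefaultDeflater)
                def.end();   // only the stream's own default Deflater is ever end()'d
            out.close();
            closed = true;
        }
    }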

So, end() is not called on the Deflater when close() is called on DeflaterOutputStream. So what happens?

Well, Deflater has a finalize() method. Since we don't keep a reference to the Deflater instance, once the DeflaterOutputStream is done with it, it should be eligible for garbage collection. However, because it has a finalize() method it gets some special treatment from the JVM, which I'll go into more in the next section.

Finalization

The diagram below illustrates the lifecycle of an object whose class has a nontrivial finalizer. Roughly, the object is (1) created and reachable, (2) unreachable and placed on the finalization queue, (3) picked up and finalized by the finalizer thread, (4) finalized and eligible for collection, and (5) reclaimed.

When the DeflaterOutputStream is closed and garbage collected, the Deflater instance moves from phase 1 to phase 2. The finalization queue is processed by a single thread in the JVM, and that thread has a lower priority than other threads, so the finalizer thread may be slow to process the queue. Other factors may contribute to a backlog of objects consuming native memory too, such as:

  • objects that reference other objects needing finalization have not been GC'd yet
  • some items in the finalization queue take a long time to finalize
  • items are being added to the finalization queue at a higher rate than they are being processed

In our case we had all three of the conditions above in our application.

The Fix

The fix for this was so simple and so impactful that it was hard to believe it worked as well as it did. See the before and after snippets below.
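The original snippets aren't reproduced here, so these are sketches with illustrative names rather than our exact production code. First the before state, the same shape as the snippet shown earlier:

    // Before: the inline Deflater is never end()'d, so its native memory
    // hangs around until the finalizer thread eventually processes it.
    try (DeflaterOutputStream dos =
                 new DeflaterOutputStream(outputStream, new Deflater(), BUFFER_SIZE)) {
        dos.write(data);
    }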

Deflater instance won’t release memory until processed by finalizer thread
Deflater instance will release memory when the DeflaterOutputStream is closed

An alternative solution could be to do what GZIPOutputStream does, which is to subclass DeflaterOutputStream and set usesDefaultDeflater to true (note that the field is package-private, so this is only possible for classes inside the java.util.zip package).

This is how GZIPOutputStream achieves the same thing by subclassing DeflaterOutputStream
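Paraphrased from the JDK source of GZIPOutputStream, showing only the relevant constructor:

    public GZIPOutputStream(OutputStream out, int size, boolean syncFlush) throws IOException {
        super(out, new Deflater(Deflater.DEFAULT_COMPRESSION, true), size, syncFlush);
        usesDefaultDeflater = true;   // so DeflaterOutputStream.close() will call def.end()
        writeHeader();
        crc.reset();
    }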

The Results

The result of this change was widespread, as the fix was in a library used by several platforms in our company. Below is a chart from this week, when the fix was rolled out; you can see a drop of ~50GB per server instance and much more stable memory utilisation. Sometimes it's the small changes that make the most impact 🙂

The Lessons Learned

The main lesson learned here was that having monitoring in place for off-heap memory usage is just as important as monitoring heap usage. If this had been in place we might have spotted the problem earlier and been able to address it before it caused an issue in production. This bug was in the code for a couple of years before it bit us, and all that time it was quietly eating up the memory on our servers until we hit the physical (or logical, in the case of a VM) limit of available RAM and the OOM-Killer stepped in.

I also learned how standard profiling tools can give you a false sense of security: our initial investigations, where we took heap dumps from running servers, seemed to indicate we had no significant memory issues. Checking the rss value of running processes, rather than just the used heap values, is definitely part of my troubleshooting steps from now on.
