I recently worked on a tricky issue where our application was terminated ungracefully twice in one week after being enabled in a high-volume production environment. All signs pointed to some sort of memory issue but there was little concrete data to assist in identifying the root cause. What followed was a couple of weeks of bewildering frustration, an existential crisis or two and a hard-earned lesson with a satisfactory conclusion. If you want to hear about my journey to getting to the bottom of the problem, how you can do it in your own applications and the steps to take to avoid it in the first place then read on.
Running Out of Memory
Our application was shut down twice in one week, and it was clear from our monitoring that JVM heap usage spiked in the moments prior to termination; however, we never got close to full heap usage. At the time we were running our JVM with -Xmx45G on a VM with 60GB of memory available. The other processes on the VM had very small memory footprints, so we met the local recommendation of allowing 20% memory overhead above heap size for the JVM's use of native memory, and the performance testing we had done had not indicated we needed any more than this. Since we were not getting an OutOfMemoryError in the application logs, and our metrics were polled only once a minute, we postulated that on both occasions the logs hadn't been flushed and the metrics hadn't been polled in time to capture the extent of the issue. In an effort to mitigate this we enabled the -XX:+HeapDumpOnOutOfMemoryError JVM flag (along with -XX:HeapDumpPath to control where the dump is written). In the meantime we continued investigating possible causes.
We had alerting and charts in place for heap usage, but we were not monitoring memory at the server/VM level. When we did look at server memory we could see that used memory spiked to just below 100% right before the application terminated, while heap usage was still reasonably low at ~60%. To figure out what was happening I pulled all of the log files from /var/log and started scanning through them, since our own application logs had not yielded anything useful. That was when I came across the two lines below.
```
kernel: [28244264.657076] Out of memory: Kill process 28854 (java) score 977 or sacrifice child
kernel: [28244264.657146] Killed process 28854, UID 500, (java) total-vm:138204008kB, anon-rss:128965112kB, file-rss:144kB rss:0kB
```
The Linux OOM-Killer
Linux has a feature called the OOM-Killer whose job is to save the system when it gets into a low-memory situation. It does this by examining the memory usage of the running processes and choosing one to kill so it can free up memory: it applies an algorithm that scores each process and then kills the highest-scoring one. This means that if you are running one main application on a server (especially if that application runs in a JVM with a large heap), your application is quite likely to be the process chosen for sacrifice. In the log excerpt above you can see that process 28854 was given a score of 977, which means the process held roughly 97.7% of the available memory on the VM. The rss (resident set size, the memory currently held) value for this process was 128965112kB. So our application held ~98% of the available memory on the box, something needed more memory (probably our application itself), and so the OOM-Killer killed PID 28854 to free up memory for the VM. That explains why our app was terminated; next I'll delve into how I analysed our application's native memory.
What exactly is Native Memory?
Native memory is memory allocated by the OS on behalf of the Java process through C code, which is called by native methods declared in certain Java classes. I'll discuss this C code, where it lives and how you can see it in the next section.
A lot of the time you can get away with not thinking too much about how Java uses memory, especially native (off-heap) memory. Since profiling tools generally just ignore native memory because they can’t monitor it, even if you’re being a good citizen and considering the memory usage of your changes as you are developing features or fixing bugs, it’s easy to allow native memory utilisation to just go unchecked. When you get a heap dump from a Java process, you only get the contents of the heap, as the name implies. You will not get any real insight into what is consuming native memory unless the tool you’re using has the ability to tell you that certain heap allocations may result in significant native memory usage too (shout out to jxray).
Where is this C code?
One area of the Java class library that makes use of C code is the java.util.zip package. The classes in this package declare a number of native methods; java.util.zip.Deflater, for example, declares several.
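The exact names and signatures vary by JDK version (JDK 8 has deflateBytes, newer JDKs split it into variants such as deflateBytesBytes), but you can list Deflater's native methods yourself with a small reflection snippet:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.zip.Deflater;

public class ListNativeMethods {
    public static void main(String[] args) {
        // Print every method on java.util.zip.Deflater declared as `native`,
        // i.e. implemented in C inside the JDK rather than in Java.
        for (Method m : Deflater.class.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                System.out.println(m.getName());
            }
        }
    }
}
```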
There is no Java implementation of these methods; the implementation is written in C and compiled into the JDK. You can see the source of these methods in the JDK source code. The native method declaration in the Java source links to the C code using the following name mapping: the native method deflateBytes in class java.util.zip.Deflater has a C implementation called Java_java_util_zip_Deflater_deflateBytes. You can find the native implementation of this method in the JDK's Deflater.c (on line 118 in the version I was looking at). All memory allocations done by these native methods go via the library function malloc, and this is what we need to somehow instrument to understand where native memory allocation is happening.
Finding the source of the memory allocation
Through a lot of research into the available tools and some experimentation, the tool I found most useful, and the one which ultimately allowed me to find the root cause of our memory problem, was jemalloc, along with several additional resources such as Jeff Griffith's GitHub page and this tech blog by the British Government.
jemalloc was configured to be loaded by our application and to take snapshots of native memory allocations at 2GB intervals. This allowed me to run the application with instances of the specific job type that appeared to cause the issue and then generate PDF reports showing where the native memory was going. jemalloc does this by replacing the system malloc implementation with its own and then intercepting and tracking every request for memory, which lets it account for the native memory Java consumes when it calls native code.
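The setup looked roughly like this; the library path and jar name are illustrative, but lg_prof_interval:31 is what gives a profile dump every 2^31 bytes (~2GB) of allocation:

```shell
# Load jemalloc in place of the system malloc and enable its allocation profiler.
export LD_PRELOAD=/usr/local/lib/libjemalloc.so
export MALLOC_CONF="prof:true,lg_prof_interval:31,prof_prefix:jeprof.out"

# Run the application as normal; jemalloc writes jeprof.out.*.heap files as it goes.
java -Xmx45G -jar our-app.jar

# Afterwards, render the captured profiles to a PDF call graph.
jeprof --show_bytes --pdf "$(command -v java)" jeprof.out.*.heap > native-profile.pdf
```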
Having run the test above with jemalloc loaded, the results from the performance test indicated a clear problem. The jeprof report showed that deflateInit2_ had allocated 49.5% of the memory currently held by the process, and that it in turn was called by Java_java_util_zip_Deflater_init.
To find where this class was being created in my application I reran the test, this time running a shell script that took a thread dump every second for the duration of the test, appending them all to a single file. When the test completed I used grep on the output file to find any occurrences of Deflater in the stack traces, and that led me to where we were instantiating the class and thereby making native calls that allocate native memory.
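A minimal version of such a script might look like this (the PID and duration are whatever suits your test run):

```shell
#!/bin/sh
# dump_threads: take a thread dump of the given JVM every second for a
# fixed duration, appending all dumps to a single file.
# Usage: dump_threads <pid> <duration_seconds>
dump_threads() {
    pid="$1"
    duration="${2:-60}"
    i=0
    while [ "$i" -lt "$duration" ]; do
        jstack "$pid" >> "threaddumps-$pid.txt"
        sleep 1
        i=$((i + 1))
    done
}
```

Then something like `grep -n Deflater threaddumps-1234.txt` pinpoints the allocating stack frames.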
Once I found where the Deflater was being created I needed to identify whether there was a bug or whether our native memory usage was legitimate. It turned out the bug was entirely in how we were constructing the Deflater instances.
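Our real code is internal, so this is a hedged reconstruction of the pattern (the method name and buffer size are illustrative), but the shape of the bug is exactly this:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class LeakyCompressor {
    private static final int BUFFER_SIZE = 64 * 1024; // illustrative custom buffer size

    static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // BUG: the Deflater created inline here is never end()-ed, so its
        // native zlib buffers are only released when finalize() eventually runs.
        try (DeflaterOutputStream out =
                 new DeflaterOutputStream(bytes, new Deflater(), BUFFER_SIZE)) {
            out.write(input);
        }
        return bytes.toByteArray();
    }
}
```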
The problem here is the creation of the Deflater instance inside the constructor call of the DeflaterOutputStream, and it's a very (very very very) easy mistake to make. We were creating a resource, the Deflater instance, which we were responsible for closing/cleaning up, and instead we assumed that the DeflaterOutputStream was going to do it for us, which it doesn't. The same issue arises with similar class pairs, such as InflaterInputStream and Inflater.
DeflaterOutputStream has a package-private boolean field, usesDefaultDeflater, which is false by default. If you use one of the constructors where you do not pass in a Deflater instance, this field is set to true; the simplest such constructor takes only an OutputStream parameter. Since in our case we wanted to use a custom buffer size, we had to use the constructor that also takes a Deflater and a buffer size.
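For reference, here are the two constructors, paraphrased from the OpenJDK 8 source (check your own JDK version for the exact code; this excerpt is not compilable standalone):

```java
// From java.util.zip.DeflaterOutputStream (OpenJDK 8, paraphrased excerpt).

// No Deflater supplied: the stream creates its own and takes
// responsibility for ending it.
public DeflaterOutputStream(OutputStream out) {
    this(out, new Deflater());
    usesDefaultDeflater = true;
}

// Deflater and buffer size supplied by the caller: usesDefaultDeflater
// stays false, so close() will NOT call end() on the Deflater.
public DeflaterOutputStream(OutputStream out, Deflater def, int size) {
    this(out, def, size, false);
}
```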
Using that constructor means the default value of usesDefaultDeflater (false) stays in effect, and this determines how the memory of the Deflater instance is disposed of when the DeflaterOutputStream is closed: close() only calls end() on the Deflater when usesDefaultDeflater is true, so in our case end() was never called on the Deflater when close() was called on the DeflaterOutputStream. So what happens to it?
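The relevant logic, again paraphrased from the OpenJDK 8 implementation (not compilable standalone), looks roughly like this:

```java
// From java.util.zip.DeflaterOutputStream (OpenJDK 8, paraphrased excerpt).
public void close() throws IOException {
    if (!closed) {
        finish();
        if (usesDefaultDeflater)
            def.end();   // native memory released only for the stream's own Deflater
        out.close();
        closed = true;
    }
}
```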
Deflater has a finalize() method. Since we don't keep a reference to the Deflater instance, once the DeflaterOutputStream is done with it it should be eligible for garbage collection; however, because it has a finalize() method it gets some special treatment from the JVM, which I'll go into in the next section.
An object whose class has a nontrivial finalizer moves through roughly five phases: (1) reachable and in use, (2) unreachable and placed on the finalization queue, (3) being finalized by the finalizer thread, (4) finalized, and (5) reclaimed by the garbage collector. When the DeflaterOutputStream is closed and garbage collected, the Deflater instance moves from phase 1 to phase 2. The finalization queue is processed by a single thread in the JVM, and that thread has a lower priority than other threads, so it may be slow to work through the queue. Other factors can contribute to a backlog of objects holding native memory too, such as
- objects that reference other objects needing finalization have not been GC'd yet
- some items in the finalization queue take a long time to finalize
- items are added to the finalization queue at a higher rate than they are processed
In our case we had all three of the conditions above in our application.
The fix for this was so simple and so impactful it was hard to believe it actually worked as well as it did. See the before and after snippets below.
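Again, this is a hedged reconstruction rather than our actual code (names and buffer size are illustrative), but the before/after pattern was simply to keep a reference to the Deflater and call end() on it once the stream is closed:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class FixedCompressor {
    private static final int BUFFER_SIZE = 64 * 1024; // illustrative

    // Before: the inline Deflater is never end()-ed, so its native
    // memory waits for the finalizer thread.
    static byte[] compressLeaky(byte[] input) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out =
                 new DeflaterOutputStream(bytes, new Deflater(), BUFFER_SIZE)) {
            out.write(input);
        }
        return bytes.toByteArray();
    }

    // After: keep a reference and release the native memory deterministically.
    static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Deflater deflater = new Deflater();
        try (DeflaterOutputStream out =
                 new DeflaterOutputStream(bytes, deflater, BUFFER_SIZE)) {
            out.write(input);
        } finally {
            deflater.end(); // frees zlib's native buffers immediately
        }
        return bytes.toByteArray();
    }
}
```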
An alternative solution could be to do what GZIPOutputStream does, which is to subclass DeflaterOutputStream and set usesDefaultDeflater to true so that close() ends the Deflater. Note, however, that usesDefaultDeflater is package-private, so this option is only available to classes inside java.util.zip; outside the JDK you need to end() the Deflater yourself.
The result of this change was widespread, as the fix was in a library used by several platforms in our company. A memory chart from the week the fix was rolled out showed a drop of ~50GB per server instance and much more stable memory utilisation. Sometimes it's the small changes that make the most impact 🙂
The Lessons Learned
The main lesson learned here was that having monitoring in place for off-heap memory usage is just as important as monitoring heap usage. If this had been in place we might have spotted the problem earlier and been able to address it before it caused an issue in production. This bug was in the code for a couple of years before it bit us, and all that time it was quietly eating up the memory on our servers until we hit the physical (or, in the case of a VM, logical) limit of available RAM and the OOM-Killer stepped in.
I also learned how standard profiling tools can give you a false sense of security: our initial investigations, where we took heap dumps from running servers, seemed to indicate we had no significant memory issues. Checking the rss value of running processes, instead of just the used heap values, is definitely part of my troubleshooting steps from now on.