How to reduce your JVM app memory footprint in Docker and Kubernetes
Recently, I managed to dramatically reduce the memory usage of a widely-used JVM app container on Kubernetes and save a lot of money. I figured out which JVM flags matter most, how to set them correctly, and how to easily measure the impact of my changes on the various parts of the app’s memory usage. Here are my Cliff’s Notes.
The story starts with a use case. I work at Wix, as part of the data-streams team, which is in charge of all our Kafka infrastructure. Recently I was tasked with creating a Kafka client proxy for our Node.js services.
Use Case: Kafka client sidecar with wasteful memory usage
The idea was to delegate all Kafka-related actions (e.g. produce and consume) from a Node.js app to a separate JVM app. The motivation is that a lot of our own infrastructure for Kafka clients is written in Scala (it’s called Greyhound; the open-source version can be found here).
With a sidecar, the Scala code doesn’t need to get duplicated in other languages. Only a thin wrapper is needed.
Once we deployed the sidecar in production, we noticed that it consumes quite a lot of memory.
As you can see from the table above, the memory footprint of the sidecar (running OpenJDK 8) alone is 4–5 times bigger than that of the node-app container back when it included the Kafka library itself.
I had to understand why and how to reduce it considerably.
Experimenting with production data
I set out to create a test-app that mimics the sidecar of this particular node app in order to be able to freely experiment on it without affecting production. The app contained all the consumers from the production app for the same production topics.
To monitor memory consumption, I used metrics such as nonHeapMemoryUsed, exposed from MXBeans inside my application to Prometheus/Grafana. You can also use JConsole or VisualVM (both come bundled with JDK 8; VisualVM is a separate download for later JDK versions).
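For reference, here is a minimal sketch of reading these numbers from the JVM’s MemoryMXBean, the same data a Prometheus exporter would publish. The object and printed labels are illustrative, not the sidecar’s actual code:

```scala
import java.lang.management.ManagementFactory

// Illustrative sketch: read heap and non-heap usage from the MemoryMXBean,
// the source behind metrics like nonHeapMemoryUsed.
object MemoryReport {
  private val mb = 1024L * 1024L

  def main(args: Array[String]): Unit = {
    val memory  = ManagementFactory.getMemoryMXBean
    val heap    = memory.getHeapMemoryUsage
    val nonHeap = memory.getNonHeapMemoryUsage

    println(s"heap used:     ${heap.getUsed / mb} MB (max: ${heap.getMax / mb} MB)")
    println(s"non-heap used: ${nonHeap.getUsed / mb} MB")
  }
}
```

In a real service you would register these values as gauges in your metrics library instead of printing them.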
First, I tried to understand the impact of each consumer and producer, and of the gRPC client (which calls the Node.js app). I came to the conclusion that having one more consumer (or one fewer) does not affect the memory footprint in a meaningful way.
JVM Heap Flags
Then, I turned my attention to heap allocation.
There are two important JVM flags related to heap allocation:
- -Xms (heap memory size on startup)
- -Xmx (maximum heap memory size)
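For example, pinning the heap to a fixed size means passing both flags with the same value (the 512m figure and jar name below are illustrative, not a recommendation):

```shell
# Fixed-size heap: initial (-Xms) and maximum (-Xmx) are equal,
# so the heap never grows beyond what the container was sized for.
java -Xms512m -Xmx512m -jar sidecar.jar
```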
I’ve played around with many different combinations of the two and recorded the resulting container memory usage:
The first conclusion I came to from analysing the data on variations in heap flags was that if
Xmx is higher than
Xms, and you have an app with high memory pressure, then the allocated memory for heap is almost certainly going to continue to grow up to the
Xmx limit, causing the container’s overall memory usage to grow as well (see comparison in the charts below).
If, on the other hand,
Xmx is the same as
Xms, you have much more control over the overall memory usage, as the heap will not gradually increase over time (see comparison below).
The second conclusion I came to from the heap flags data was that you can lower
Xmx dramatically, as long as you don’t see a significant duration of JVM pauses due to Garbage Collection (GC), meaning more than 500ms for an extended period of time. I again used Grafana for monitoring GC, but you can also use visualgc or gceasy.io.
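The cumulative GC numbers behind such dashboards can also be read directly from the JVM. A minimal sketch (object name is illustrative; collector names vary by GC implementation, e.g. CMS vs. G1):

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

// Illustrative sketch: print cumulative collection counts and total
// pause time per garbage collector, the raw data behind GC dashboards.
object GcReport {
  def main(args: Array[String]): Unit =
    ManagementFactory.getGarbageCollectorMXBeans.asScala.foreach { gc =>
      println(s"${gc.getName}: ${gc.getCollectionCount} collections, ${gc.getCollectionTime} ms total")
    }
}
```

Sampling these counters periodically and taking deltas gives you the GC time percentage discussed above.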
Please be careful with the number you set for
Xmx: if your application has high variation in message-consuming throughput, it will be more susceptible to GC storms once it experiences a big burst of incoming messages.
Kafka-related tuning
Our Greyhound (Kafka) consumer has an internal message buffer that can hold up to 200 messages. When I reduced the maximum allowed size to 20, I noticed that heap memory usage oscillates in a much narrower band than with size=200 (and is also considerably lower overall):
Of course, reducing the buffer size means the app will not handle bursts well, so this does not work for high-throughput applications. To mitigate this, I doubled the level of parallelism of the Greyhound consumer handlers per pod,
i.e. increased the number of threads that process Kafka messages from 3 to 6. In outlier cases, either the app will require more pods or the max-buffer configuration will have to be altered.
Reducing the Kafka consumer’s fetch.max.bytes from 50M to 5M (to reduce the total size of polled messages) did not have a noticeable effect on the memory footprint. Nor did extracting the Greyhound producer from the sidecar app (it can reside in a DaemonSet so that it runs once on each K8s node).
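For concreteness, here is how such settings look as plain Kafka client properties. Greyhound’s buffer cap is an internal setting; max.poll.records is my suggested rough analogue for a vanilla KafkaConsumer, and the values below are the ones from this experiment, not general recommendations:

```scala
import java.util.Properties

// Illustrative sketch of the consumer settings discussed above,
// expressed as plain Kafka client properties.
object ConsumerTuning {
  def props: Properties = {
    val p = new Properties()
    p.put("fetch.max.bytes", (5 * 1024 * 1024).toString) // 5M instead of the default 50M
    p.put("max.poll.records", "20")                      // rough analogue of a smaller in-app buffer
    p
  }
}
```

These properties would then be passed to the KafkaConsumer constructor alongside the usual bootstrap and deserializer settings.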
Summary — What helped with reducing memory usage
The optimizations i’ve made reduced the container memory usage from 1000M to around 550–600M. Here are the changes that contributed to the lower footprint:
- Maintain a consistent heap size allocation
- Reduce the amount of discarded objects (garbage)
E.g. buffer fewer Kafka messages
- A little bit of GC goes a long way
Lower Xmx as long as GC (New Gen + Old Gen) doesn’t take up a considerable percentage of CPU time (0.25% CPU time)
What didn’t help (substantially)
- Reducing KafkaConsumer’s fetch.max.bytes
- Removing the Kafka producer from the sidecar
- Switching from gRPC client to Wix’s custom json-RPC client
Future optimization ideas
- Explore whether GraalVM native image can help
- Compare different GC implementations (I’ve used CMS, but there’s also G1)
- Reduce the number of threads used when consuming from Kafka by switching to our open-sourced ZIO-based version of Greyhound
- Reduce the memory allocated for each thread’s stack (by default each thread is assigned 1MB)
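That last item corresponds to the -Xss flag, which sets the per-thread stack size. A sketch with an illustrative value (shrinking it too far risks StackOverflowError, so test carefully):

```shell
# Halve the default per-thread stack from 1MB to 512KB,
# alongside the fixed-size heap settings discussed earlier.
java -Xss512k -Xms512m -Xmx512m -jar sidecar.jar
```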
More improvements (and a second blog post) are sure to come.
Thank you for reading!
If you’d like to get updates on my experiences with Kafka, Scala, ZIO and the JVM, follow me on Twitter and Medium.
You can also visit my website, where you will find my previous blog posts, talks I gave at conferences and open-source projects I’m involved with.
If anything is unclear or you want to point out something, please comment down below.