How to reduce your JVM app memory footprint in Docker and Kubernetes

Natan Silnitsky
Apr 12 · 6 min read
Photo by Franck V. on Unsplash

Recently, I managed to dramatically reduce the memory usage of a widely-used JVM app container on Kubernetes and save a lot of money. I figured out which JVM flags matter most, how to set them correctly, and how to easily measure the impact of my actions on various parts of the app’s memory usage. Here are my CliffsNotes.

The story starts with a use case. I work at Wix as part of the data-streams team, which is in charge of all our Kafka infrastructure. Recently I was tasked with creating a Kafka client proxy for our Node.js services.

Use Case: Kafka client sidecar with wasteful memory usage

The idea was to delegate all Kafka-related actions (e.g. produce and consume) from a Node.js app to a separate JVM app. The motivation behind this is that a lot of our own infrastructure for Kafka clients is written in Scala (it’s called Greyhound; the open-source version can be found here).
With a sidecar, the Scala code doesn’t need to be duplicated in other languages. Only a thin wrapper is needed.


Once we deployed the sidecar in production, we noticed that it consumes quite a lot of memory.


The memory footprint of the sidecar (running OpenJDK 8) alone was 4–5 times bigger than that of the node-app container back when it still included the Kafka library.

I had to understand why and how to reduce it considerably.

Experimenting with production data

I set out to create a test-app that mimics the sidecar of this particular node app in order to be able to freely experiment on it without affecting production. The app contained all the consumers from the production app for the same production topics.

As a way to monitor memory consumption, I used JVM memory metrics exposed from MXBeans inside my application to Prometheus/Grafana, but you can also use jconsole or jvisualvm (both come bundled with JDK 8 or above).
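
For reference, here is a minimal sketch of reading such memory stats from the JVM’s MemoryMXBean. The exact metric names and the Prometheus exporting from our setup are omitted; the printing is just for illustration:

    import java.lang.management.ManagementFactory

    // A minimal, standalone sketch: read heap and non-heap usage from the MemoryMXBean.
    // In the real app these values were exported to Prometheus; here they are just printed.
    object MemoryStats extends App {
      val memoryBean = ManagementFactory.getMemoryMXBean
      val heap       = memoryBean.getHeapMemoryUsage
      val nonHeap    = memoryBean.getNonHeapMemoryUsage

      def mb(bytes: Long): Long = bytes / (1024 * 1024)

      println(s"heap used:      ${mb(heap.getUsed)} MB")
      println(s"heap committed: ${mb(heap.getCommitted)} MB")
      println(s"heap max:       ${mb(heap.getMax)} MB")
      println(s"non-heap used:  ${mb(nonHeap.getUsed)} MB")
    }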

First, I tried to understand the impact of each consumer and producer, and of the gRPC client (which calls the node app). I came to the conclusion that having one more consumer (or one fewer) does not affect the memory footprint in a meaningful way.

JVM Heap Flags

Then I turned my attention to heap allocation. There are two important JVM flags related to heap allocation: -Xms (heap memory size on startup) and -Xmx (maximum heap memory size).
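
For example, starting the sidecar with java -Xms512m -Xmx512m -jar sidecar.jar (the 512m values and jar name here are just placeholders) pins the heap to a fixed size. A quick sketch of verifying, from inside the container, which heap limits the JVM actually picked up:

    // Sanity check that the heap flags were applied inside the container.
    // maxMemory() roughly reflects -Xmx; totalMemory() is the currently committed heap,
    // which starts around -Xms and can grow up to -Xmx.
    object HeapFlagsCheck extends App {
      val rt = Runtime.getRuntime
      println(s"max heap (-Xmx):          ${rt.maxMemory   / (1024 * 1024)} MB")
      println(s"committed heap (>= -Xms): ${rt.totalMemory / (1024 * 1024)} MB")
    }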

I’ve played around with many different combinations of the two and recorded the resulting container memory usage:

Container overall used memory with different heap flags

The first conclusion I came to from analysing the data on variations in heap flags was that if -Xmx is higher than -Xms, and you have an app with high memory pressure, then the allocated heap memory is almost certainly going to keep growing up to the -Xmx limit, causing the container’s overall memory usage to grow as well (see the comparison in the charts below).

Xmx >> Xms

But if -Xmx is the same as -Xms, you have much more control over the overall memory usage, as the heap will not gradually increase over time (see the comparison below).

Xmx = Xms

The second conclusion I came to from the heap flags data was that you can lower -Xmx dramatically, as long as you don’t see a significant duration of JVM pauses due to Garbage Collection (GC), meaning more than 500ms for an extended period of time. I again used Grafana for monitoring GC, but you can also use visualgc or gceasy.io.
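
The same numbers can also be pulled straight from the JVM’s GarbageCollectorMXBeans. A rough sketch (with the Prometheus wiring omitted) of the GC counters I was charting:

    import java.lang.management.ManagementFactory
    import scala.collection.JavaConverters._

    // Accumulated GC counts and total GC time per collector (young gen + old gen).
    // Sampling these periodically and taking deltas gives the pause-time trend over time.
    object GcStats extends App {
      ManagementFactory.getGarbageCollectorMXBeans.asScala.foreach { gc =>
        println(s"${gc.getName}: collections=${gc.getCollectionCount}, totalTimeMs=${gc.getCollectionTime}")
      }
    }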

Benign JVM pause times due to GC

Please be careful with the number you set for -Xmx: if your application has high variation in message-consuming throughput, it will be more susceptible to GC storms once it experiences a big burst of incoming messages.

Kafka-related tune-up

Our Greyhound (Kafka) consumer has an internal message buffer that can get as high as 200 messages. When I reduced the maximum allowed size to 20, I noticed that heap memory usage oscillates within a much narrower band than with size=200 (and is also considerably lower overall):

Heap memory usage pattern when bufferMax=200

Heap memory usage pattern when bufferMax=20

Of course, reducing the buffer size means the app will not handle bursts as well, so this does not work for high-throughput applications. To mitigate this, I doubled the level of parallelism for Greyhound consumer handlers per pod, i.e. I increased the number of threads that process Kafka messages from 3 to 6. In outlier cases, either the app will require more pods or the max buffer configuration will have to be altered.
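
To illustrate the idea (a simplified sketch only, not Greyhound’s actual implementation): a bounded buffer caps how many polled messages can sit on the heap at once, while a larger handler pool compensates for the smaller buffer by draining it faster.

    import java.util.concurrent.{ArrayBlockingQueue, Executors}

    // Simplified sketch only - not Greyhound's actual implementation.
    object BoundedBufferSketch extends App {
      val maxBufferedMessages = 20   // reduced from 200 in the tune-up described above
      val handlerParallelism  = 6    // raised from 3 to 6 to compensate
      val buffer   = new ArrayBlockingQueue[String](maxBufferedMessages)
      val handlers = Executors.newFixedThreadPool(handlerParallelism)

      // Poll-loop side: put() blocks once the buffer is full, applying back-pressure
      // instead of piling more messages onto the heap.
      def onMessagePolled(msg: String): Unit = buffer.put(msg)

      // Handler side: each thread takes messages off the bounded buffer and processes them.
      (1 to handlerParallelism).foreach { _ =>
        handlers.submit(new Runnable {
          def run(): Unit = while (true) handleMessage(buffer.take())
        })
      }

      def handleMessage(msg: String): Unit = println(s"handled: $msg")
    }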

Reducing the Kafka consumer’s fetch.max.bytes from 50MB to 5MB (to reduce the total size of polled messages) did not have a noticeable effect on the memory footprint. Nor did extracting the Greyhound producer out of the sidecar app (it can reside in a DaemonSet, so it runs on each K8s node).
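
For reference, this is how that fetch cap looks on a plain Kafka consumer (a standalone config sketch; the broker address and group id are placeholders):

    import java.util.Properties
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
    import org.apache.kafka.common.serialization.StringDeserializer

    // Plain Kafka consumer config sketch; broker address and group id are placeholders.
    object FetchSizeConfig extends App {
      val props = new Properties()
      props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
      props.put(ConsumerConfig.GROUP_ID_CONFIG, "sidecar-test-group")
      props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
      props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)

      // Cap the data returned per fetch request: 5MB instead of the 50MB we had before.
      props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, (5 * 1024 * 1024).toString)

      val consumer = new KafkaConsumer[String, String](props)
    }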

Summary — What helped with reducing memory usage

The optimizations I made reduced the container memory usage from 1000MB to around 550–600MB. Here are the changes that contributed to the lower footprint:

  • Maintain a consistent heap size allocation
    Make -Xms equal to -Xmx
  • Reduce the amount of discarded objects (garbage)
    E.g. buffer fewer Kafka messages
  • A little bit of GC goes a long way
    Keep lowering -Xmx as long as GC (new gen + old gen) doesn’t take up a considerable percentage of CPU time (0.25%)

What didn’t help (substantially)

  • Reducing KafkaConsumer’s fetch.max.bytes
  • Removing Kafka producer
  • Switching from the gRPC client to Wix’s custom JSON-RPC client

Future Work

  • Explore if GraalVM native image can help
  • Compare different GC implementations (I used CMS, but there’s also G1)
  • Reduce the number of threads we use when consuming from Kafka by switching to our open-sourced ZIO-based version of Greyhound
  • Reduce the memory allocated for each thread’s stack (by default each thread is assigned 1MB)

More improvements (and a second blog post) are sure to come.

More information

Docker memory resource limits and a heap of Java — blog post

Memory Footprint of a Java Process (video from the GeekOUT conference)

Thank you for reading!
If you’d like to get updates on my experiences with Kafka, Scala, ZIO and the JVM, follow me on Twitter and Medium.

You can also visit my website, where you will find my previous blog posts, talks I gave in conferences and open-source projects I’m involved with.

If anything is unclear or you want to point out something, please comment down below.
