JVM on Kubernetes: Avoiding Pitfalls in Resource Configuration
Running JVM applications in a container environment requires knowledge of both JVM memory management and Kubernetes resource mechanics. When the two are configured to work together, the application stays stable. However, misconfigurations can result in infrastructure overspending at best and a crashing application at worst.
Over the years, I’ve witnessed numerous instances of misconfigurations that have resulted in significant issues. The default behavior of the JVM can be problematic when running applications on Kubernetes.
So what’s the problem?
1. JVM Heap Memory
If you don’t specify the maximum heap size (the -Xmx parameter), the JVM sets it automatically to approximately 25% of the available RAM. This value is calculated from the memory visible inside the container, so you can already imagine that this is not the best use of the allocated memory. Moreover, if you don’t set the resource limits

resources:
  limits:
    memory: xxx

then the JVM will see the whole memory of the node and allocate 25% of it to one single pod, which will result in OOM kills once you add more services.
So, always set the memory limit!
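For illustration, a minimal pod spec with an explicit memory limit might look like this (the names and the 1Gi value are placeholders, not a recommendation for your workload):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-service            # hypothetical name
spec:
  containers:
    - name: app
      image: demo/app:latest    # hypothetical image
      resources:
        limits:
          memory: "1Gi"         # example value; size it from profiling
```

With this limit in place, the JVM bases its 25% default heap on 1Gi rather than on the node's full memory.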
Now, even if the limit is set, using only 25% of it for the heap is not optimal in most cases. Typically, a more efficient allocation, closer to 75%, is advisable. However, it is crucial to profile each service individually; once you find the approximate amount of heap memory it needs, you can set it via the -Xmx parameter, or you can use Paketo Buildpacks, which have a built-in memory calculator. It calculates the -Xmx JVM flag using the formula:
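If you prefer a percentage over a fixed -Xmx, the JVM also supports the -XX:MaxRAMPercentage flag (container-aware since JDK 8u191/10), which sizes the heap relative to the container's memory limit. A sketch of passing it through a Deployment's environment (the 75.0 value is illustrative):

```yaml
env:
  - name: JAVA_TOOL_OPTIONS                # picked up automatically by the JVM at startup
    value: "-XX:MaxRAMPercentage=75.0"     # heap = 75% of the container memory limit
```

This avoids hard-coding a byte value and keeps the heap proportional if you later resize the limit.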
Heap = TotalContainerMemory - NonHeap - Headroom
where:
Non-Heap = DirectMemory + Metaspace + ReservedCodeCache + (ThreadStack * ThreadCount)
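To make the formula concrete, here is a small worked example in Java. All numbers are assumptions for illustration: a 1024 MiB container, Paketo-style defaults of 10 MiB direct memory, 240 MiB reserved code cache, and 1 MiB stacks for 250 threads, plus an assumed 120 MiB of metaspace and zero headroom; your profiled values will differ.

```java
public class MemoryCalculator {
    public static void main(String[] args) {
        // All values in MiB; illustrative assumptions, not measurements.
        long totalContainerMemory = 1024;
        long directMemory = 10;       // assumed -XX:MaxDirectMemorySize
        long metaspace = 120;         // assumed; depends on loaded classes
        long reservedCodeCache = 240; // assumed -XX:ReservedCodeCacheSize
        long threadStack = 1;         // assumed -Xss
        long threadCount = 250;       // assumed worker thread count
        long headroom = 0;            // assumed headroom

        // Non-Heap = DirectMemory + Metaspace + ReservedCodeCache + (ThreadStack * ThreadCount)
        long nonHeap = directMemory + metaspace + reservedCodeCache
                + threadStack * threadCount;

        // Heap = TotalContainerMemory - NonHeap - Headroom
        long heap = totalContainerMemory - nonHeap - headroom;

        System.out.println("-Xmx" + heap + "M"); // prints "-Xmx404M"
    }
}
```

Note how much of a 1 GiB container is eaten by non-heap memory in this sketch: 620 MiB, leaving only 404 MiB for the heap. This is exactly why sizing -Xmx from the raw container memory is misleading.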
Now, you’ve profiled your service, you have your desired heap size, and you set the memory request accordingly:

resources:
  requests:
    memory: "xxx"

What limit value should you go with?
Countless sources claim that the best practice is to set the limit to the same value as the request. Here’s a nice pizza example; Kubernetes maintainers like Tim Hockin recommend the same.
It might sound counterintuitive (“why did they invent requests and limits if the best practice is to set them equal?”), but if you want to guarantee predictable behavior in your production cluster, this is the way to go.
In development environments, I take full advantage of setting memory limits higher than the requests. As a result, some services might temporarily use the additional available memory, and be restarted if another service requires that memory later.
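Sketching both policies side by side (the values are placeholders):

```yaml
# Production: request == limit, for predictable scheduling and eviction behavior
resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "1Gi"

# Development: limit above request, letting services borrow spare node memory
resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "2Gi"
```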
2. CPU Limit
There’s an ongoing battle between stop-using-cpu-limits and why-you-should-keep-using-cpu-limits. Both articles provide very good arguments, but in practice I haven’t seen any of the side effects described in the second one, contrary to the arguments for removing the limit:
- CPU, like water, is a “renewable” resource. In simple terms, 100% CPU usage in a given minute doesn’t “use up” the CPU for the next minute; CPU renews itself from moment to moment.
- Contrary to common belief, removing CPU limits doesn’t lead to interference between pods.
- Defining accurate CPU requests is crucial, ensuring each pod gets the reserved CPU, with the option to use excess CPU if no limit is set.
- Accurate requests ensure fair resource allocation, preventing one service from monopolizing resources and enabling all to survive.
In the context of JVM applications, particularly those built on the Spring Framework, substantial CPU resources are consumed during the startup phase, whereas runtime utilization is only a fraction of that amount. Consequently, setting the CPU request based on runtime profiling results, and not setting a limit at all, provides an optimal allocation of CPU resources. This strategy ensures sufficient resources for the service during its startup phase and allows spare CPU to be dynamically reallocated to accommodate startup or activity spikes in other services.
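Under that strategy, the resource block sets only a CPU request derived from runtime profiling (the 250m here is a placeholder) and deliberately omits the CPU limit:

```yaml
resources:
  requests:
    cpu: "250m"       # hypothetical value from runtime profiling
    memory: "1Gi"
  limits:
    memory: "1Gi"     # memory limit kept; no cpu entry, so startup can burst
```

The pod is guaranteed its requested CPU share, and during startup it can consume any idle CPU on the node without being throttled.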
