How do Trendyol's JVM applications consume less memory in production?
At Trendyol we run a lot of microservices. To improve system performance, we need to optimize memory, CPU, and startup time; our aim is to tune the JVM so that an application delivers higher throughput at the lowest hardware cost. Imagine you have more than 200 microservices and each one consumes 2 GB of memory per Kubernetes pod: that is not a container-friendly solution, so we decided to try an alternative JVM runtime. Every runtime has pros and cons depending on what you expect from it: low memory usage, fast startup, fast matrix multiplication, or all of them. In this article, we will share some of our experience with OpenJ9.
Eclipse OpenJ9 is a high performance, scalable, Java virtual machine (JVM) implementation that is fully compliant with the Java Virtual Machine Specification
OpenJ9 is a different JVM implementation from the default Oracle HotSpot.
We have changed some OpenJ9 defaults to fit our microservice requirements, for instance:
● Thread stack size settings
● Disabling the Attach API
● Enabling huge pages on Linux
● Avoiding String sharing by String.substring()
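On the substring point: when String.substring() shares the parent string's backing character array, a tiny substring can keep a huge parent string alive on the heap; modern JVMs copy the characters instead. A minimal Java sketch of the difference this makes (the class name and values are illustrative):

```java
public class SubstringDemo {
    public static void main(String[] args) {
        // A ~1 MB parent string with a short interesting tail.
        String big = "x".repeat(1_000_000) + "abc";

        // On JVMs that copy (the Java 7u6+ behavior), this 3-char result
        // holds its own tiny array, so `big` can be garbage collected
        // once it is no longer referenced.
        String small = big.substring(big.length() - 3);
        System.out.println(small);                   // abc

        // The classic workaround on array-sharing JVMs was an explicit copy:
        String detached = new String(small);
        System.out.println(detached.equals("abc"));  // true
    }
}
```

With array sharing, the 3-character `small` would have pinned the whole 1 MB array of `big` in memory, which is why it matters for memory footprint.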
Why did we choose OpenJ9 instead of HotSpot?
● Fast startup
● Minimal memory footprint
● Minimal CPU usage
● No resource usage when idle
● Higher throughput (more requests per minute)
How to tune for fast startup?
1. Class data sharing
Sharing class data between JVMs improves startup performance and reduces the memory footprint.
Startup performance is improved by placing classes that an application needs when initializing into a shared classes cache. The next time the application runs, it takes much less time to start because the classes are already available. When you enable class data sharing, AOT compilation is also enabled by default, which dynamically compiles certain methods into AOT code at runtime. By using these features in combination, startup performance can be improved even further because the cached AOT code can be used to quickly enable native code performance for subsequent runs of your application.
When class data sharing is enabled, OpenJ9 automatically creates a memory mapped file that stores and shares the classes in memory.
The class sharing feature of Eclipse OpenJ9 can be enabled by running the JVM with the
-Xshareclasses option on the command line. Bootstrap classes, extension classes, and application classes, as well as Ahead-of-Time (AOT) compiled code, are stored in a class cache that can be shared across multiple JVMs. There are several ways to cache your application classes; here are a few.
Using a Docker volume cache
1. Create the Docker volume
docker volume create product-service-cache
2. Run Java process
ENTRYPOINT ["java", "-Xshareclasses:cacheDir=/cache", "-Xscmx300M"]
3. Mount the Docker volume
docker run --mount source=product-service-cache,target=/cache product-service-cache
Pre-warming the cache in the Docker image
You can bake the shared class cache into a Docker image layer; when the JVM starts in a container built from that image, it reuses the cached layer.
RUN /bin/bash -c 'java -Xscmx80M -Xshareclasses:name=productdetailapi -Xquickstart -jar /app/product-detail-api.jar &' ; sleep 30 ; pkill -9 -f product-detail-api
You can use java -Xshareclasses:name=<name>,printStats=classpath to find the fat jar on the classpath:
1: 0x00007F771E726B74 CLASSPATH
1: 0x00007F771E156804 CLASSPATH
1: 0x00007F771E154998 CLASSPATH
1: 0x00007F771E14843C CLASSPATH
2: 0x00007F771CAE7954 CLASSPATH
Current statistics for cache "segmentapi":
Cache created with:
-Xnolinenumbers = false
BCI Enabled = true
Restrict Classpaths = false
Feature = cr
Cache contains only classes with line numbers
base address = 0x00007F770D459000
end address = 0x00007F7720000000
allocation pointer = 0x00007F770F1DE3A8
cache layer = 0
cache size = 314572192
softmx bytes = 67108864
free bytes = 1876
Reserved space for AOT bytes = -1
Maximum space for AOT bytes = -1
Reserved space for JIT data bytes = -1
Maximum space for JIT data bytes = -1
Metadata bytes = 1346824
Metadata % used = 2%
Class debug area size = 25133056
Class debug area used bytes = 4657892
Class debug area % used = 18%
ROMClass bytes = 30954408
AOT bytes = 28346336
JIT data bytes = 489368
Zip cache bytes = 947496
Startup hint bytes = 120
Data bytes = 363936
stale bytes = 17159000
# ROMClasses = 13489
# AOT Methods = 6361
# Classpaths = 18
# URLs = 0
# Tokens = 0
# Zip caches = 23
# Startup hints = 1
# Stale classes = 7166
% Stale classes = 53%
Cache is 99% soft full
Cache is accessible to current user = true
Some tips for the class cache
Use -Xshareclasses:listAllCaches => lists the existing shared caches, including the default one.
Use -Xshareclasses:printStats => shows the cache statistics.
Use java -Xshareclasses:destroy => deletes the cache (destroyAll deletes all caches).
Use java -Xshareclasses:cacheDir=<dir> => uses a specific cache directory.
Use java -Xshareclasses:name=<name> => connects to a cache of the given name, creating it if it does not exist. If you don't specify a cache name, a default cache is used.
Class data sharing solution for Kubernetes
You can use Persistent Volumes to store your class cache.
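A minimal sketch of what that can look like (all resource names, the image, and sizes are hypothetical; the essential parts are the shared volume mounted at the cache directory and the -Xshareclasses option, and it assumes a storage class that supports ReadWriteMany):

```yaml
# Hypothetical manifest; names, image, and sizes are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: product-service-scc
spec:
  accessModes: ["ReadWriteMany"]   # let every replica share one class cache
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: product-service
  template:
    metadata:
      labels:
        app: product-service
    spec:
      containers:
        - name: app
          image: product-service:latest        # assumes ENTRYPOINT ["java"]
          args: ["-Xshareclasses:cacheDir=/cache", "-Xscmx300M", "-jar", "/app/app.jar"]
          volumeMounts:
            - name: scc
              mountPath: /cache
      volumes:
        - name: scc
          persistentVolumeClaim:
            claimName: product-service-scc
```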
SCC Production Benchmark
OpenJ9 Without SCC (Spring Boot ProductDetailApiApplication)
# initial run
Started ProductDetailApiApplication in 5.058 seconds (JVM running for 5.81)
# subsequent runs
Started ProductDetailApiApplication in 5.506 seconds (JVM running for 6.341)
Started ProductDetailApiApplication in 5.497 seconds (JVM running for 6.057)
Started ProductDetailApiApplication in 5.287 seconds (JVM running for 6.057)
Started ProductDetailApiApplication in 4.9 seconds (JVM running for 5.816)
OpenJ9 With SCC (Spring Boot ProductDetailApiApplication)
# initial run
Started ProductDetailApiApplication in 1.801 seconds (JVM running for 2.053)
# subsequent runs
Started ProductDetailApiApplication in 1.704 seconds (JVM running for 2.128)
Started ProductDetailApiApplication in 1.544 seconds (JVM running for 1.9)
Started ProductDetailApiApplication in 1.473 seconds (JVM running for 1.928)
Started ProductDetailApiApplication in 1.504 seconds (JVM running for 1.842)
OpenJ9 With SCC (Reactive Undertow Native ProductDetailApiApplication)
# initial run
Server started in ms: 843
# subsequent runs
Server started in ms: 626
Server started in ms: 570
Server started in ms: 470
Server started in ms: 577
OpenJ9 With SCC (Quarkus ProductDetailApiApplication)
# initial run
Server started in ms: 450
# subsequent runs
Server started in ms: 312
Server started in ms: 290
SCC improves startup performance and reduces memory footprint.
2. Lazy class relationship verification
Java bytecode verification entails several processes, one of which is class relationship verification. JVM startup includes any JVM or application setup preceding the actual execution of the program, and verification is one of these steps.
Class files are compiled from JVM-language source files. Each class file contains Java bytecodes that define a class or interface. The JVM loads these files and executes the bytecodes, but before doing so it verifies each file, checking which classes are linked to each other. This is a long-running operation for the JVM process.
Lazy verification, which defers some of the class loading done during verification, can be an effective technique to improve startup time. If you have a monolithic application with more than 10k application classes, it will dramatically improve your startup time. Class relationship verification is still performed (this is not the same as
-Xverify:none).
Enable it with the -XX:+ClassRelationshipVerifier option.
3. Quickstart
The -Xquickstart option causes the JIT compiler to run with fewer optimizations, which can improve the performance of short-running applications. Use it for serverless, desktop, and GUI applications; don't use it for long-running applications.
When the AOT compiler is active (both shared classes and AOT compilation enabled),
-Xquickstart causes all methods to be AOT compiled. AOT compilation improves the startup time of subsequent runs, but might reduce performance for long-running applications.
Cold compiles perform cheap optimizations that run quickly. Warm compiles perform more complicated optimizations and occasionally loop over some sets of optimizations. Hot compiles add more expensive optimizations to the mix and tend to iterate more to catch more opportunities.
4. Idle tuning
Idle tuning (for example -XX:+IdleTuningGcOnIdle) optimizes for virtualized environments by reducing OpenJ9 VM CPU consumption when the application is idle:
- Detects when the VM enters an idle state.
- Frees garbage in the heap while idle.
- Improves start-up and ramp-up, at the trade-off of a small throughput loss at the start.
5. JIT as a Service
OpenJ9 has decoupled the JIT compiler from the JVM so it can run in its own independent process. This process, the JITServer, can easily be containerized and run in the cloud as a service.
● No memory spikes in the application process from JIT compilation.
● Less CPU usage.
● No more CPU spikes from the JIT, and fast startup.
● Potentially even the first request runs optimized code.
6. LUDCL caching
This makes the deserialization process faster than normal. With these optimizations, the repeated class-lookup loop disappears, which improves application throughput. If you have a complex object model, you will gain even more.
In OpenJ9 there are a few optimizations at work:
- Class caching: create a java.io.ClassCache to reduce calls to java.lang.Class.forName for repeated lookups.
- Caching the "LUDCL" (latest user-defined class loader): the loader can be safely cached while inside the ObjectInputStream class. If custom readObject methods are invoked during this process, the LUDCL needs to be refreshed.
- JIT replacing ObjectInputStream.readObject: to eliminate another LUDCL retrieval, the JIT replaces it with ObjectInputStream.redirectedReadObject(ObjectInputStream iStream, Class caller), which passes the LUDCL information through an argument, preventing extra lookups.
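The call path these optimizations target can be exercised with a plain serialization round trip; a minimal sketch (class names and values are illustrative):

```java
import java.io.*;

public class DeserDemo {
    // A simple serializable model; repeated deserialization of objects like
    // this is exactly the path that OpenJ9's LUDCL caching speeds up.
    static class Product implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int stock;
        Product(String name, int stock) { this.name = name; this.stock = stock; }
    }

    // Serialize an object to bytes and immediately deserialize it.
    static Product roundTrip(Product in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(in);
        }
        // readObject() must resolve Product via the latest user-defined class
        // loader (LUDCL); OpenJ9 caches that loader lookup so hot
        // deserialization paths avoid repeated stack walks.
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (Product) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Product copy = roundTrip(new Product("keyboard", 42));
        System.out.println(copy.name + " " + copy.stock); // keyboard 42
    }
}
```

Every call to roundTrip triggers a class-loader lookup inside readObject, so workloads that deserialize complex models in a loop benefit the most.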
7. Thread Stack Size
OpenJDK (HotSpot) stack sizes are fixed and do not shrink or grow; default sizes vary by platform.
OpenJ9 stack sizes for Java threads are bounded by a lower and an upper limit, and OS threads can have a different stack size.
Tune this setting to your application's behavior: the OpenJ9 initial Java thread stack size is 2 KB, while HotSpot defaults to 1 MB on x64 systems.
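The effect of stack size can be probed from plain Java: the Thread constructor accepts a suggested stack size, which the JVM may round or ignore. A small sketch (the absolute recursion depths are platform-dependent, so only the comparison is meaningful):

```java
public class StackSizeDemo {
    static int depth;

    static void recurse() { depth++; recurse(); }

    // Measure how deep we can recurse on a thread with the given stack size.
    // The stackSize passed to Thread's constructor is only a hint that the
    // JVM may round or ignore, which is one reason OpenJ9 (small initial
    // stacks) and HotSpot (~1 MB fixed stacks on x64) behave differently
    // under thread-heavy load.
    static int probe(long stackSize) throws InterruptedException {
        depth = 0;
        Thread t = new Thread(null, () -> {
            try { recurse(); } catch (StackOverflowError e) { /* expected */ }
        }, "probe", stackSize);
        t.start();
        t.join();
        return depth;
    }

    public static void main(String[] args) throws Exception {
        int small = probe(128 * 1024);        // 128 KB stack
        int large = probe(8 * 1024 * 1024);   // 8 MB stack
        System.out.println(large > small);    // bigger stack, deeper recursion
    }
}
```

Smaller per-thread stacks mean many more threads fit in the same footprint, which is what makes OpenJ9's small, bounded Java thread stacks container-friendly.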
Some Basic Benchmarks
Undertow + Rest + Couchbase Client + HotSpot
Initial Memory Usage = 112 MB
Undertow + Rest + Couchbase Client + OpenJ9
Initial Memory Usage = 38.5 MB
● 40% faster startup time.
● 73 MB less memory footprint.
In the production system, we handle more than 4M RPM with low latency and a small memory footprint. Try tuning your application runtime with different JDKs. These are our Spring Boot microservice projects in the production environment.
This is a Quarkus microservice in the production environment; average memory consumption is 90 MB with OpenJ9.
What are the cons?
● JSR 292 operations (e.g., lambdas) are slower than on HotSpot at the micro-benchmark level. (https://github.com/eclipse/openj9/issues/4837)
● Slower XML parsing operations.
● Needs a larger community.
There are many JDK distributions in the Java ecosystem: Oracle JDK, AdoptOpenJDK, Azul, OpenJ9, Corretto, OpenJDK, Dragonwell8, GraalVM. Find the best virtual machine for your system and share it with us!
Special thanks to Mark Stoodley.
Eclipse OpenJ9; not just any Java Virtual Machine | The Eclipse Foundation