How do Trendyol JVM applications consume less memory in the production environment?

Gokhan Karadas
Jan 20, 2020 · 9 min read

At Trendyol we run a lot of microservices. To improve system performance, we need to optimize memory, CPU, and startup time. Our aim in tuning the JVM is to get higher application throughput at the lowest hardware cost. Imagine more than 200 microservices, each consuming 2 GB of memory per Kubernetes pod: that is not a container-friendly setup, so we decided to try an alternative JVM. Every JVM has pros and cons, depending on what you expect from it: low memory usage, fast startup, fast matrix multiplication, or all of them. In this article, we share some of our experience with OpenJ9.

Eclipse OpenJ9 is a high-performance, scalable Java virtual machine (JVM) implementation that is fully compliant with the Java Virtual Machine Specification.

OpenJ9 is a different JVM implementation from the default Oracle HotSpot.
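Because the two VMs are started with the same java launcher, it helps to confirm which one a service is actually running on. The sketch below is only an illustration: the class name VmInfo is made up, and it assumes that OpenJ9 builds report a java.vm.name system property containing "OpenJ9" (for example "Eclipse OpenJ9 VM").

```java
// Minimal sketch: report which JVM implementation is running.
class VmInfo {
    // Assumption: OpenJ9 builds expose a java.vm.name that contains "OpenJ9",
    // while HotSpot builds report names such as "OpenJDK 64-Bit Server VM".
    static boolean isOpenJ9(String vmName) {
        return vmName != null && vmName.contains("OpenJ9");
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name");
        System.out.println(vmName + " -> OpenJ9? " + isOpenJ9(vmName));
    }
}
```

Logging this once at startup makes it easy to spot a pod that was accidentally built on a HotSpot base image.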

We have changed some of OpenJ9's default behavior to fit our microservice requirements, for instance:

● Thread stack size settings

● Disable Attach API

https://www.ibm.com/support/knowledgecenter/SSYKE2_8.0.0/openj9/dcomibmtoolsattachenable/index.html

● Enable huge pages on Linux

● Avoid String sharing by String.substring()

Why did we choose OpenJ9 instead of HotSpot?

● Fast startup

● Minimal memory footprint

● Minimal CPU usage

● No resource usage when idle

● More requests per minute (RPM)

How to tune for fast startup?

1. Class data sharing

Sharing class data between JVMs improves startup performance and reduces the memory footprint.

Startup performance is improved by placing classes that an application needs when initializing into a shared classes cache. The next time the application runs, it takes much less time to start because the classes are already available. When you enable class data sharing, AOT compilation is also enabled by default, which dynamically compiles certain methods into AOT code at runtime. By using these features in combination, startup performance can be improved even further because the cached AOT code can be used to quickly enable native code performance for subsequent runs of your application.

When class data sharing is enabled, OpenJ9 automatically creates a memory mapped file that stores and shares the classes in memory.

The class sharing feature of Eclipse OpenJ9 can be enabled by running the JVM with the -Xshareclasses option specified on the command line. Bootstrap classes, extension classes, and application classes, as well as Ahead of Time (AOT) compiled code, are stored in a class cache that can be shared across multiple JVMs. There are several ways to cache your application classes; here are some of them.

Using a Docker volume cache

1. Create a Docker volume

docker volume create product-service-cache

2. Run Java process

ENTRYPOINT ["java", "-Xshareclasses:cacheDir=/cache", "-Xscmx300M"]

3. Mount docker volume

docker run --mount source=product-service-cache,target=/cache product-service-cache

Pre-warming the cache in the Docker image

You can populate the cache while the image is being built, so that it is stored in a Docker image layer. When the JVM starts in a container, it uses the cache from that layer.

RUN /bin/bash -c 'java -Xscmx80M -Xshareclasses:name=productdetailapi -Xquickstart -jar /app/product-detail-api.jar &' ; sleep 30 ; pkill -9 -f product-detail-api

You can use java -Xshareclasses:name=<name>,printStats=classpath to list the classpaths stored in the cache and check that your fat jar is among them.

java -Xshareclasses:cacheDir=/data/browsing-team-cache,name=segmentapi,printStats=classpath
1: 0x00007F771E726B74 CLASSPATH
/opt/java/openjdk/jre/lib/amd64/compressedrefs/jclSC180/vm.jar
/opt/java/openjdk/jre/lib/se-service.jar
/opt/java/openjdk/jre/lib/rt.jar
/opt/java/openjdk/jre/lib/resources.jar
/opt/java/openjdk/jre/lib/jsse.jar
/opt/java/openjdk/jre/lib/charsets.jar
/opt/java/openjdk/jre/lib/jce.jar
/tmp/agent-bridge-datastore4910282294015568663.jar
/tmp/agent-bridge2976904801636233639.jar
/tmp/newrelic-opentracing-bridge3489413472377320446.jar
/tmp/newrelic-api2086183132559506729.jar
/tmp/newrelic-weaver-api2796658590757924607.jar
/tmp/newrelic-bootstrap5528856975107955007.jar
/tmp/instrumentation5835203138211461845.jar
1: 0x00007F771E156804 CLASSPATH
/app/segment-api.jar!/BOOT-INF/classes
1: 0x00007F771E154998 CLASSPATH
/app/segment-api.jar!/BOOT-INF/classes
/app/segment-api.jar!/BOOT-INF/lib/spring-boot-starter-web-2.2.1.RELEASE.jar
/app/segment-api.jar!/BOOT-INF/lib/spring-boot-starter-2.2.1.RELEASE.jar
/app/segment-api.jar!/BOOT-INF/lib/spring-boot-2.2.1.RELEASE.jar
1: 0x00007F771E14843C CLASSPATH
/app/segment-api.jar!/BOOT-INF/classes
/app/segment-api.jar!/BOOT-INF/lib/spring-boot-starter-web-2.2.1.RELEASE.jar
/app/segment-api.jar!/BOOT-INF/lib/spring-boot-starter-2.2.1.RELEASE.jar
/app/segment-api.jar!/BOOT-INF/lib/spring-boot-2.2.1.RELEASE.jar
/app/segment-api.jar!/BOOT-INF/lib/spring-boot-autoconfigure-2.2.1.RELEASE.jar
/app/segment-api.jar!/BOOT-INF/lib/spring-boot-starter-logging-2.2.1.RELEASE.jar
/app/segment-api.jar!/BOOT-INF/lib/log4j-to-slf4j-2.12.1.jar
/app/segment-api.jar!/BOOT-INF/lib/log4j-api-2.12.1.jar
/app/segment-api.jar!/BOOT-INF/lib/jul-to-slf4j-1.7.29.jar
/app/segment-api.jar!/BOOT-INF/lib/jakarta.annotation-api-1.3.5.jar
/app/segment-api.jar!/BOOT-INF/lib/snakeyaml-1.25.jar
/app/segment-api.jar!/BOOT-INF/lib/spring-boot-starter-json-2.2.1.RELEASE.jar
/app/segment-api.jar!/BOOT-INF/lib/jackson-databind-2.10.0.jar
/app/segment-api.jar!/BOOT-INF/lib/jackson-core-2.10.0.jar
/app/segment-api.jar!/BOOT-INF/lib/jackson-datatype-jdk8-2.10.0.jar
/app/segment-api.jar!/BOOT-INF/lib/jackson-datatype-jsr310-2.10.0.jar
/app/segment-api.jar!/BOOT-INF/lib/jackson-module-parameter-names-2.10.0.jar
/app/segment-api.jar!/BOOT-INF/lib/spring-boot-starter-validation-2.2.1.RELEASE.jar
/app/segment-api.jar!/BOOT-INF/lib/jakarta.validation-api-2.0.1.jar
/app/segment-api.jar!/BOOT-INF/lib/hibernate-validator-6.0.18.Final.jar
....
More jars
/opt/java/openjdk/jre/lib/amd64/compressedrefs/jclSC180/vm.jar
/opt/java/openjdk/jre/lib/se-service.jar
/opt/java/openjdk/jre/lib/rt.jar
/opt/java/openjdk/jre/lib/resources.jar
/opt/java/openjdk/jre/lib/jsse.jar
/opt/java/openjdk/jre/lib/charsets.jar
/opt/java/openjdk/jre/lib/jce.jar
/tmp/agent-bridge-datastore8512134162353853874.jar
/tmp/agent-bridge7816995727144126101.jar
/tmp/newrelic-opentracing-bridge5733246240277084059.jar
/tmp/newrelic-api7385479491596146925.jar
/tmp/newrelic-weaver-api3316203145804091576.jar
/tmp/newrelic-bootstrap1409323056764906364.jar
2: 0x00007F771CAE7954 CLASSPATH


Current statistics for cache "segmentapi":

Cache created with:
-Xnolinenumbers = false
BCI Enabled = true
Restrict Classpaths = false
Feature = cr

Cache contains only classes with line numbers

base address = 0x00007F770D459000
end address = 0x00007F7720000000
allocation pointer = 0x00007F770F1DE3A8

cache layer = 0
cache size = 314572192
softmx bytes = 67108864
free bytes = 1876
Reserved space for AOT bytes = -1
Maximum space for AOT bytes = -1
Reserved space for JIT data bytes = -1
Maximum space for JIT data bytes = -1
Metadata bytes = 1346824
Metadata % used = 2%
Class debug area size = 25133056
Class debug area used bytes = 4657892
Class debug area % used = 18%

ROMClass bytes = 30954408
AOT bytes = 28346336
JIT data bytes = 489368
Zip cache bytes = 947496
Startup hint bytes = 120
Data bytes = 363936
stale bytes = 17159000

# ROMClasses = 13489
# AOT Methods = 6361
# Classpaths = 18
# URLs = 0
# Tokens = 0
# Zip caches = 23
# Startup hints = 1
# Stale classes = 7166
% Stale classes = 53%


Cache is 99% soft full

Cache is accessible to current user = true

Some tips for the class cache

● Use java -Xshareclasses:listAllCaches to list all caches, including the default shared cache.

● Use java -Xshareclasses:printStats to show the cache statistics.

● Use java -Xshareclasses:destroyAll to delete all caches; destroy deletes only the selected cache.

● Use java -Xshareclasses:cacheDir=<directory> to set a specific cache directory.

● Use java -Xshareclasses:name=<name> to connect to a cache of a given name, creating the cache if it does not exist. If you do not specify a name, the default cache name is used.
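The sub-options are combined into one comma-separated -Xshareclasses argument. A small helper like the following can keep launch scripts and tests consistent. It is a sketch, not part of OpenJ9: the class ShareClassesOption and its build method are hypothetical names, and it only concatenates strings using the sub-options already shown in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: assemble a single -Xshareclasses argument from its sub-options.
class ShareClassesOption {
    // cacheDir and name may be null; extras are passed through unchanged
    // (for example "printStats=classpath").
    static String build(String cacheDir, String name, String... extras) {
        List<String> parts = new ArrayList<>();
        if (cacheDir != null) {
            parts.add("cacheDir=" + cacheDir);
        }
        if (name != null) {
            parts.add("name=" + name);
        }
        for (String extra : extras) {
            parts.add(extra);
        }
        // With no sub-options the bare flag enables sharing with default settings.
        return parts.isEmpty() ? "-Xshareclasses" : "-Xshareclasses:" + String.join(",", parts);
    }

    public static void main(String[] args) {
        System.out.println(build("/data/browsing-team-cache", "segmentapi", "printStats=classpath"));
    }
}
```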

Class data sharing solution for Kubernetes

You can use Persistent Volumes to store your class cache.

{
  "volumes": [
    {
      "name": "browsing-team-cache",
      "persistentVolumeClaim": {
        "claimName": "browsing-team-cache"
      }
    }
  ]
}

{
  "volumeMounts": [
    {
      "name": "browsing-team-cache",
      "mountPath": "/data/browsing-team-cache"
    }
  ]
}

SCC Production Benchmark

OpenJ9 Without SCC (Spring Boot ProductDetailApiApplication)

# initial run
Started ProductDetailApiApplication in 5.058 seconds (JVM running for 5.81)

# subsequent runs
Started ProductDetailApiApplication in 5.506 seconds (JVM running for 6.341)
Started ProductDetailApiApplication in 5.497 seconds (JVM running for 6.057)
Started ProductDetailApiApplication in 5.287 seconds (JVM running for 6.057)
Started ProductDetailApiApplication in 4.9 seconds (JVM running for 5.816)

OpenJ9 With SCC (Spring Boot ProductDetailApiApplication)

# initial run
Started ProductDetailApiApplication in 1.801 seconds (JVM running for 2.053)
# subsequent runs
Started ProductDetailApiApplication in 1.704 seconds (JVM running for 2.128)
Started ProductDetailApiApplication in 1.544 seconds (JVM running for 1.9)
Started ProductDetailApiApplication in 1.473 seconds (JVM running for 1.928)
Started ProductDetailApiApplication in 1.504 seconds (JVM running for 1.842)

OpenJ9 With SCC (Reactive Undertow Native ProductDetailApiApplication)

# initial run
Server started in ms: 843
# subsequent runs
Server started in ms: 626
Server started in ms: 570
Server started in ms: 470
Server started in ms: 577

OpenJ9 With SCC (Quarkus ProductDetailApiApplication)

# initial run
Server started in ms: 450
# subsequent runs
Server started in ms: 312
Server started in ms: 290

SCC improves startup performance and reduces memory footprint.
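To put a number on the Spring Boot results, the sketch below averages the "Started ... in N seconds" values from the two OpenJ9 runs listed above and divides one mean by the other. The class and method names are illustrative only. On these figures the means come out at about 5.25 s without SCC and about 1.61 s with SCC, a bit more than a threefold improvement.

```java
import java.util.Arrays;

// Minimal sketch: compare mean Spring Boot startup times with and without SCC.
class StartupSpeedup {
    // "Started ProductDetailApiApplication in N seconds" values from the benchmark.
    static final double[] WITHOUT_SCC = {5.058, 5.506, 5.497, 5.287, 4.9};
    static final double[] WITH_SCC = {1.801, 1.704, 1.544, 1.473, 1.504};

    static double mean(double... values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Ratio of the mean startup time without SCC to the mean with SCC.
    static double speedup() {
        return mean(WITHOUT_SCC) / mean(WITH_SCC);
    }

    public static void main(String[] args) {
        System.out.printf("without SCC: %.3f s, with SCC: %.3f s, speedup: %.2fx%n",
                mean(WITHOUT_SCC), mean(WITH_SCC), speedup());
    }
}
```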

2. Lazy class relationship verification

Java bytecode verification entails several processes, one of which is class relationship verification. Java Virtual Machine (JVM) startup includes any JVM or application setup preceding the actual execution of the program, and one of these steps is verification.

Class files are compiled JVM-language source files. Each class file contains Java bytecode that defines a class or an interface. The JVM loads these files and executes the bytecode. Before doing so, it verifies each file, which includes checking how classes relate to each other. This is a long-running part of JVM startup.

Lazy verification avoids some of the class loading during verification, which can be an effective technique to improve startup time. If you have a monolithic application with more than 10,000 application classes, it can improve your startup time dramatically. Class relationships are still verified, so this is not the same as -Xverify:none.

Enable it with the following JVM option:

-XX:+ClassRelationshipVerifier
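In a container it is easy to lose an option somewhere between the Dockerfile, the entrypoint, and the deployment manifest. The sketch below checks at runtime which options the JVM actually received, using the standard RuntimeMXBean.getInputArguments() call. The class name JvmOptionCheck and the list of options checked are illustrative assumptions.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Minimal sketch: check whether the running JVM was started with given options.
class JvmOptionCheck {
    // True if any JVM argument starts with the given prefix,
    // so "-Xshareclasses" also matches "-Xshareclasses:name=...".
    static boolean hasOption(List<String> jvmArgs, String prefix) {
        return jvmArgs.stream().anyMatch(arg -> arg.startsWith(prefix));
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String[] expected = {"-XX:+ClassRelationshipVerifier", "-Xshareclasses", "-Xquickstart"};
        for (String option : expected) {
            System.out.println(option + " set: " + hasOption(jvmArgs, option));
        }
    }
}
```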

3. -Xquickstart

This option causes the JIT compiler to run with a subset of optimizations, which can improve the performance of short-running applications. Use it for serverless, desktop, and GUI applications. Don’t use it for long-running applications.

When the AOT compiler is active (both shared classes and AOT compilation enabled), -Xquickstart causes all methods to be AOT compiled. The AOT compilation improves the startup time of subsequent runs, but might reduce performance for long-running applications.

The JIT compiles at several optimization levels. Cold compiles apply cheap optimizations that run quickly. Warm compiles apply more complicated optimizations and occasionally loop over some sets of optimizations. Hot compiles add more expensive optimizations to the mix and tend to iterate more to catch more opportunities.

4. -Xtune:virtualized

Optimizes for virtualized environments by reducing OpenJ9 VM CPU consumption when idle.

  1. Detects when the VM is idle.
  2. Frees garbage in the heap.
  3. Improves startup and ramp-up, at the cost of a small loss in throughput.

5. JIT as a Service

OpenJ9 has decoupled the JIT compiler from the JVM and made it run in its own independent process. This process, called JITServer, can easily be containerized and run in the cloud as a service.

● No memory spikes from JIT optimizations.

● Less CPU usage.

● No more CPU spikes from the JIT, and faster startup.

● Possibly a faster first request.

6. LUDCL caching

Caching the latest user-defined class loader (LUDCL) makes deserialization faster than normal.

With these optimizations, repeated class lookups are avoided, which improves application throughput. The more complex your object model, the more you gain.

Optimization

In OpenJ9 there are three optimizations at work:

  1. Class caching: create a java.io.ClassCache to reduce calls to java.lang.Class.forName for repeated lookups
  2. Cache “LUDCL”: The loader can be safely cached while in the ObjectInputStream class. If custom readObject methods are invoked during this process the LUDCL will need to be refreshed.
  3. JIT replacing ObjectInputStream.readObject: To eliminate another LUDCL retrieval, the JIT will replace ObjectInputStream.readObject() call with ObjectInputStream.redirectedReadObject(ObjectInputStream iStream, Class caller). ObjectInputStream.redirectedReadObject will provide the LUDCL information through an argument preventing extra calls to LUDCL.
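These optimizations are transparent to application code. The sketch below only shows the kind of code path that benefits from them: a plain Java serialization round trip through ObjectInputStream.readObject(). It does not demonstrate or measure the optimizations themselves, and the Product class is a hypothetical model used purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

// Minimal sketch: a serialization round trip that exercises ObjectInputStream.readObject().
class DeserializationDemo {
    // Hypothetical model class, not taken from any real service.
    static class Product implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<String> tags;

        Product(String name, List<String> tags) {
            this.name = name;
            this.tags = tags;
        }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Every readObject() call has to resolve classes by name through a class loader;
    // that lookup is the work the caching described above is meant to reduce.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Convenience wrapper that converts checked exceptions into unchecked ones.
    static Product roundTrip(Product product) {
        try {
            return (Product) deserialize(serialize(product));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Product copy = roundTrip(new Product("shoe", java.util.Arrays.asList("sport", "sale")));
        System.out.println(copy.name + " " + copy.tags);
    }
}
```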

7. Thread Stack Size

OpenJDK (HotSpot) stack sizes are fixed and do not shrink or grow. Default sizes vary by platform.

OpenJ9 stack sizes for Java threads are limited by a lower and an upper bound. OS threads can have a different stack size.

Tune these settings to match your application's behavior. By default, the OpenJ9 initial Java thread stack size is 2 KB, whereas the HotSpot default is 1 MB on x64 systems.
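To get a feel for how much stack your code needs before changing JVM-wide settings, you can probe a single thread. The sketch below uses the standard Thread constructor that accepts a stackSize argument and counts how deep a trivial recursion gets before the stack is exhausted. The class name StackSizeDemo is illustrative, the requested size is only a hint that the JVM may round or ignore, and the depth depends on the VM and on frame size, so treat the numbers as relative.

```java
// Minimal sketch: measure recursion depth in a thread with a requested stack size.
class StackSizeDemo {
    // Recurses until the thread's stack is exhausted and returns the depth reached.
    static int maxDepth(long stackSizeBytes) {
        int[] depth = {0};
        Runnable probe = () -> {
            try {
                descend(depth);
            } catch (StackOverflowError expected) {
                // Stack exhausted; depth[0] holds the deepest frame reached.
            }
        };
        // The stackSize argument is a hint; the JVM is free to adjust it.
        Thread thread = new Thread(null, probe, "stack-probe", stackSizeBytes);
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return depth[0];
    }

    private static void descend(int[] depth) {
        depth[0]++;
        descend(depth);
    }

    public static void main(String[] args) {
        System.out.println("256 KB requested: depth " + maxDepth(256 * 1024));
        System.out.println("2 MB requested:   depth " + maxDepth(2 * 1024 * 1024));
    }
}
```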

Some Basic Benchmarks

Undertow + Rest + Couchbase Client + HotSpot

Initial Memory Usage = 112 MB

Undertow + Rest + Couchbase Client + OpenJ9

Initial Memory Usage = 38.5 MB

Result

● 40% faster startup time

● About 73 MB smaller memory footprint
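The footprint figure follows directly from the two initial memory readings above. The sketch below, with illustrative class and method names, computes the absolute and the relative saving from those two numbers: 73.5 MB, or roughly 66% of the HotSpot footprint.

```java
// Minimal sketch: absolute and relative memory savings from the two initial readings.
class FootprintSavings {
    static double savedMb(double hotspotMb, double openj9Mb) {
        return hotspotMb - openj9Mb;
    }

    static double savedPercent(double hotspotMb, double openj9Mb) {
        return 100.0 * (hotspotMb - openj9Mb) / hotspotMb;
    }

    public static void main(String[] args) {
        double hotspotMb = 112.0; // Undertow + Rest + Couchbase Client + HotSpot
        double openj9Mb = 38.5;   // Undertow + Rest + Couchbase Client + OpenJ9
        System.out.printf("saved %.1f MB (%.1f%%)%n",
                savedMb(hotspotMb, openj9Mb), savedPercent(hotspotMb, openj9Mb));
    }
}
```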

In production, we handle more than 4M RPM with low latency and a low memory footprint. Try tuning your application runtime with a different JDK. These are our Spring Boot microservice projects in the production environment.

A Quarkus microservice in the production environment averages about 90 MB of memory with OpenJ9.

OpenJ9 Cons:

JSR 292 operations, such as lambdas, are slower than on HotSpot in micro-benchmarks. (https://github.com/eclipse/openj9/issues/4837)

XML parsing is slow

Needs a larger community

There are many JDK distributions and JVMs in the Java ecosystem: Oracle JDK, AdoptOpenJDK, Azul, OpenJ9, Corretto, OpenJDK, Dragonwell8, GraalVM. Find the best JVM for your system and share it with us!

Special thanks to Mark Stoodley.
