Moving from Java 8 & CMS to Java 15 & ZGC

Aykut Akıncı
ZyngaTurkey Engineering
Dec 11, 2020

If you follow the news in the Java world, you may recall that Java's release cadence changed a few years ago. A new major Java version is now released every six months instead of every few years. Though many people are still using Java 8, the latest version as of this writing is Java 15. Much has changed since the release of Java 8 in 2014, and many new features have been added. Here’s a categorized view of the features that have been added since Java 8.

When it comes to garbage collection, the changes since Java 8 are remarkable. First, G1 became the default garbage collector in Java 9. Then, in Java 10, G1's Full GC was made parallel to reduce worst-case latency. Finally, new garbage collectors that offer minimal pause durations were introduced as experimental features. These were long awaited by developers building latency-critical applications. One of these low-pause garbage collectors is the Z Garbage Collector (ZGC). ZGC claims to cut garbage collection pauses from tens of milliseconds to just a few milliseconds. It also claims that these pause times do not increase as the heap grows. All that sounds great, but we wouldn’t dare to use it in production while it was still experimental. The good news is that ZGC is production-ready as of Java 15.

So we decided to switch to Java 15 and ZGC to see whether its claims hold for our own application. There are indeed great benchmark tests of garbage collectors, such as this, but every system is a bit different and we had to conduct our own. Still, there might be applications similar to ours waiting for a real-world example, so I thought it would be nice to share our experience with ZGC in production.

Before going further, I need to summarize what we are dealing with. We are building the backend of GinRummy Plus in Java. GinRummy Plus is an online, turn-based, multiplayer card game. Each player tries to make pairs before the other to win the game. The game has 30k concurrent users (CCU) at peak hours and has players from all around the world, from the US to Singapore. Clients connect to our game servers over TCP/IP to play the game. Roughly speaking, each time a player draws or throws a card, the client sends a message to a server; the server receives this message, updates its state, and sends a response accordingly. The time between the player’s move and the server’s response should be kept to a minimum for a better user experience.

So we thought switching from Java 8 & CMS to Java 15 & ZGC could help reduce the latency spikes that occur when garbage collection kicks in. We made the change, and here are the results. To make the comparison, I gathered a 24h window of GC logs from servers running each configuration.
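For readers who want to try a similar comparison, here is a minimal sketch of how the two configurations can be launched and how to confirm which collector a JVM is actually running. The flags in the comments are standard HotSpot options, but the log file name `gc.log` and the class name `GcInfo` are illustrative assumptions, not our exact production setup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcInfo {
    // Example launch commands (file and class names are placeholders):
    //   Java 8 & CMS:
    //     java -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log GcInfo
    //   Java 15 & ZGC (unified logging replaces the old Print* flags):
    //     java -XX:+UseZGC -Xlog:gc*:file=gc.log GcInfo

    /** Returns the names of the collectors registered in the running JVM. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Print each collector with its collection count and accumulated collection time.
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    bean.getName(), bean.getCollectionCount(), bean.getCollectionTime());
        }
    }
}
```

Running this once under each set of flags is a quick sanity check that the intended collector is active before collecting a full day of logs.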

I’ll use the graphs generated by gceasy.io to compare GC logs. This way I can present the outcome more understandably.

The first thing we checked was pause durations, because this is where we expected the biggest change. Below is the table of GC pause duration ranges when using Java 8 & CMS. As you can see, 73% of the GC pauses fall between 30 and 40 ms, and the average pause is 39.9 ms.

Java 8 & CMS GC Pause durations, 24h window
Java 8 & CMS Average and Maximum Pause times, 24h window

When we switched to Java 15 & ZGC, the pause durations were less than 1 ms 80% of the time, which is astonishing. Another significant outcome is that no GC pause exceeded 2 ms in this 24h window. The average pause also dropped to roughly 1/100 of its previous value, at 0.377 ms. It's impressive.

Java 15 & ZGC Pause durations, 24h window
Java 15 & ZGC Average and Maximum Pause times, 24h window
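To make the "roughly 1/100" figure concrete, the small sketch below works through the arithmetic using the two averages reported above (39.9 ms and 0.377 ms). The class and method names are made up for illustration.

```java
public class PauseReduction {
    /** Fraction of the old pause time that remains (0.01 would mean 1/100). */
    public static double ratio(double beforeMs, double afterMs) {
        return afterMs / beforeMs;
    }

    /** Relative reduction, expressed as a percentage of the old value. */
    public static double reductionPercent(double beforeMs, double afterMs) {
        return (beforeMs - afterMs) / beforeMs * 100.0;
    }

    public static void main(String[] args) {
        double cmsAverageMs = 39.9;   // Java 8 & CMS average pause
        double zgcAverageMs = 0.377;  // Java 15 & ZGC average pause
        System.out.println("remaining fraction: " + ratio(cmsAverageMs, zgcAverageMs));
        System.out.println("reduction in percent: " + reductionPercent(cmsAverageMs, zgcAverageMs));
    }
}
```

The remaining fraction comes out slightly below 0.01, which corresponds to a reduction of about 99% in average pause time.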

We also compared the allocated memory. ZGC uses 64-bit colored pointers and does not support compressed oops (CompressedOops), a JVM feature that reduces the memory footprint. So we expected a trade-off: lower latency in exchange for higher memory usage. But this is not what happened. The memory footprint is also smaller with Java 15 & ZGC.

Java 8 & CMS memory allocation, 24h window
Java 15 & ZGC memory allocation, 24h window

Garbage collection statistics are not the only thing we monitor. We also observe the maximum number of threads used to process players’ messages. We use a fixed number of worker threads to serve the players, so we need to keep the number of busy threads small. The number of threads in the running state is logged each second by a periodic job. The graph below compares the maximum number of these threads over a 24h period. The maximum number of running threads dropped from 15–25 to 6–9, which is a decisive improvement.
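A periodic job like the one described above could look roughly like the following sketch. It is not our actual monitoring code: the `ActiveThreadSampler` class is hypothetical, and it assumes that `ThreadPoolExecutor.getActiveCount()` (an approximate count of threads executing tasks) is an acceptable stand-in for "threads in the running state".

```java
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class ActiveThreadSampler {
    private final IntSupplier activeCount;            // source of the current busy-thread count
    private final AtomicInteger maxActive = new AtomicInteger();

    public ActiveThreadSampler(IntSupplier activeCount) {
        this.activeCount = activeCount;
    }

    /** Takes one sample and folds it into the running maximum. */
    public int sample() {
        int active = activeCount.getAsInt();
        maxActive.accumulateAndGet(active, Math::max);
        return active;
    }

    /** Returns the maximum seen so far and resets it, e.g. once per 5-minute timeslice. */
    public int drainMax() {
        return maxActive.getAndSet(0);
    }

    public static void main(String[] args) throws InterruptedException {
        // Fixed-size worker pool, standing in for the message-processing threads.
        ThreadPoolExecutor workers = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        ActiveThreadSampler sampler = new ActiveThreadSampler(workers::getActiveCount);

        // Log the number of busy workers once per second.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("running workers: " + sampler.sample()),
                0, 1, TimeUnit.SECONDS);

        // Simulate some load so the sampler has something to observe.
        for (int i = 0; i < 3; i++) {
            workers.execute(() -> {
                try {
                    Thread.sleep(2500);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        Thread.sleep(3500);
        System.out.println("max in window: " + sampler.drainMax());
        scheduler.shutdownNow();
        workers.shutdown();
    }
}
```

Taking the count through an `IntSupplier` keeps the sampler independent of the pool implementation, and draining the maximum per timeslice matches the way the graph aggregates per-second samples into 5-minute buckets.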

Maximum active thread counts (24h/5m timeslice), Red: Java 8 & CMS, Blue: Java 15 & ZGC

Conclusion

As you can see, ZGC’s claim of pause times of just a few milliseconds holds. We achieved a 99% drop in average pause duration, and no pause exceeded 2 ms. It also helped us reduce the maximum number of threads in use. For our latency-critical system, it was a clear success.

I must also note that it has been almost a month since we made the change, and we have not experienced a crash or a memory leak, so I can say ZGC is pretty stable. Now that ZGC is production-ready, I highly recommend it if you are aiming for millisecond-level latencies.

You might also want to check out Shenandoah as an alternative low-pause garbage collector, but I’m devoting this post to ZGC for now.
