JVM Garbage Collection Basics

Prasad Jayakumar
Published in Javarevisited
7 min read · Nov 30, 2020

Learning about JVM Garbage Collection and the mechanics behind each of the collector algorithms is as important as learning Java language features. High-level details of the algorithms give a perspective on the lifecycle of objects, the probable failures (such as out-of-memory errors and memory leaks), and how those failures can be averted.

Garbage Collection

Garbage Collection (GC) is a form of automatic memory management. The basic operations of any Garbage Collector are:

  • Object memory allocation
  • Find objects that are in use starting from GC roots — Mark
  • Free objects that are not in use — Sweep
  • Compact the memory to prevent memory fragmentation — Compact
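
To make the four operations above concrete, here is a minimal toy sketch in Java. Everything in it (the ToyHeap class, its Node type, the roots list) is a hypothetical model I made up for illustration: it treats the heap as a list of nodes, whereas a real JVM collector works on raw memory and is far more sophisticated.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of a heap: each "object" is a node with outgoing references.
// This only illustrates the idea; it is not how HotSpot is implemented.
class ToyHeap {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    final List<Node> objects = new ArrayList<>(); // everything allocated so far
    final List<Node> roots = new ArrayList<>();   // stand-in for GC roots

    // Allocation: create the object and record it in the heap.
    Node allocate(String name) {
        Node n = new Node(name);
        objects.add(n);
        return n;
    }

    // Mark: walk the object graph from the roots and collect everything reachable.
    Set<Node> mark() {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit only, so cycles terminate
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    // Sweep + compact: drop unmarked objects; survivors stay packed in list order.
    // Returns the number of objects that were freed.
    int collect() {
        Set<Node> live = mark();
        int before = objects.size();
        objects.removeIf(n -> !live.contains(n));
        return before - objects.size();
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Node a = heap.allocate("a");
        Node b = heap.allocate("b");
        heap.allocate("c");               // never reachable from a root
        a.refs.add(b);
        heap.roots.add(a);
        System.out.println("freed " + heap.collect() + " object(s)");
    }
}
```

In this sketch, "c" is the only object that cannot be reached from a root, so it is the only one that gets swept.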

Heap Memory Model

Let’s revisit the classics first and then move on to the shiny, latest models.

Conceptual View of JVM Heaps

Permanent generation has been completely removed in JDK 8. This work has been done under the bug. Options PermSize and MaxPermSize have also been removed in JDK 8 — Source: Blog by Poonam Parhar

🙇Source: “Java Performance: In-Depth” by Scott Oaks — Second Edition

Young Generation

Eden

  • Objects are allocated in eden (which encompasses the vast majority of the young generation).
  • When the young generation is cleared during a collection, all objects in Eden are either moved or discarded: objects that are not in use can be discarded, and objects in use are moved either to one of the survivor spaces or to the old generation.
  • Since all surviving objects are moved, the young generation is automatically compacted when it is collected: at the end of the collection, Eden and one of the survivor spaces are empty, and the objects that remain in the young generation are compacted within the other survivor space.

Survivor Spaces

  • Survivor spaces are designed to allow objects (particularly just-allocated objects) to remain in the young generation for a few GC cycles. This increases the probability the object will be freed before it is promoted to the old generation.
  • If the survivor spaces are too small, objects will get promoted directly into the old generation, which in turn causes more old GC cycles.
  • The best way to handle that situation is to increase the size of the heap (or at least the young generation) and allow the JVM to handle the survivor spaces.
  • In rare cases, adjusting the tenuring threshold or survivor space sizes can prevent promotion of objects into the old generation.
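
The effect of the tenuring threshold described above can be sketched with a small simulation. This is a toy model under stated assumptions, not JVM code: the TenuringSketch class, its Obj type and the "collectionsAlive" parameter are all hypothetical names, and real survivor-space behaviour also depends on how much space is available. (In HotSpot the actual threshold is bounded by the -XX:MaxTenuringThreshold option.)

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy simulation of object aging in the young generation.
class TenuringSketch {
    static final class Obj {
        int age = 0;          // number of young collections survived so far
        int collectionsAlive; // hypothetical: how many more collections the object stays reachable
        Obj(int collectionsAlive) { this.collectionsAlive = collectionsAlive; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;

    TenuringSketch(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    void allocate(int collectionsAlive) {
        young.add(new Obj(collectionsAlive));
    }

    // One young collection: unreachable objects are dropped, survivors age by one,
    // and survivors that reach the threshold are promoted to the old generation.
    void youngCollection() {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (o.collectionsAlive == 0) {   // no longer in use: discard
                it.remove();
                continue;
            }
            o.collectionsAlive--;
            o.age++;
            if (o.age >= tenuringThreshold) { // old enough: promote
                it.remove();
                old.add(o);
            }
        }
    }

    // Runs the same workload (one short-lived and one long-lived object)
    // and reports how many objects ended up in the old generation.
    static int promotedAfter(int threshold, int collections) {
        TenuringSketch heap = new TenuringSketch(threshold);
        heap.allocate(1);   // short-lived object
        heap.allocate(10);  // long-lived object
        for (int i = 0; i < collections; i++) {
            heap.youngCollection();
        }
        return heap.old.size();
    }

    public static void main(String[] args) {
        System.out.println("threshold 3: promoted = " + promotedAfter(3, 3));
        System.out.println("threshold 1: promoted = " + promotedAfter(1, 3));
    }
}
```

With a threshold of 3 only the long-lived object is promoted, while a threshold of 1 also promotes the short-lived one, which then has to wait for an old-generation collection to be freed.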

Old (or Tenured) Generation

  • As objects are moved to the old generation, eventually it too will fill up, and the JVM will need to find any objects within the old generation that are no longer in use and discard them.

Metaspace

  • When the JVM loads classes, it must keep track of certain metadata about those classes. This metadata occupies a separate memory area called the metaspace, which is allocated from native memory rather than from the Java heap.
  • Information in the metaspace is used only by the compiler and JVM runtime, and the data it holds is referred to as class metadata.

💡Metaspace does not hold the actual instance of the class (the Class objects), or reflection objects (e.g., Method objects); those are held in the regular heap.
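
One way to see these areas on a running JVM is to list its memory pools through the standard java.lang.management API. The sketch below is only an illustration (the MemoryPools class name is mine); the pool names it prints depend on the JVM vendor and on the collector in use, so treat any specific name as an assumption. On HotSpot I would expect the eden, survivor and old pools to be reported as HEAP and Metaspace as NON_HEAP.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class MemoryPools {
    // Formats one line per memory pool: name, HEAP or NON_HEAP, and bytes in use.
    static String describe(MemoryPoolMXBean pool) {
        MemoryUsage usage = pool.getUsage();          // may be null if the pool is not valid
        long used = (usage == null) ? -1 : usage.getUsed();
        return String.format("%-35s %-9s used=%d bytes",
                pool.getName(), pool.getType().name(), used);
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(describe(pool));
        }
    }
}
```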

Garbage Collection

GC Inhibitors

Baristas in a busy coffee shop cannot complain about long customer lines and never-ending service. Rather, they would be “happy” about the business.

Along similar lines, Garbage Collection exists because the application does something “important”, and GC wants to make sure that the application keeps running. The inhibitors listed below are the factors that can slow GC down in doing its job.

  • Application Logic (Mutator Threads): When GC threads track object references or move objects around in memory, they must make sure that application threads are not using those objects. This is particularly true when GC moves objects around: the memory location of the object changes during that operation, and hence no application threads can be accessing the object.
  • Allocation Rate: The rate at which new objects are created, and therefore the rate at which memory has to be allocated for them.
  • Live Dataset Size: Objects that stay alive longer increase the amount of heap in use, which leaves the collector more live data to trace and move.
  • Allocating Large Objects: In a constrained heap, large objects can make frequent compaction necessary.
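
To get a feel for how allocation rate drives GC activity, the sketch below churns through short-lived arrays and compares the collection counts reported by the standard GarbageCollectorMXBean API before and after. The AllocationChurn class and the chosen sizes are my own assumptions; the number of collections you see depends on the heap size and the collector, so I am deliberately not quoting an expected value.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class AllocationChurn {
    // Sums collection counts across all collectors; undefined (-1) values are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Allocates many short-lived arrays and returns a checksum so the work is actually used.
    static long churn(int iterations, int arraySize) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] garbage = new byte[arraySize]; // becomes unreachable after this iteration
            garbage[0] = (byte) i;
            checksum += garbage[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        long checksum = churn(200_000, 16 * 1024); // roughly 3 GB allocated in total, all short-lived
        long after = totalCollections();
        System.out.println("checksum=" + checksum
                + ", collections during churn=" + (after - before));
    }
}
```

Running it with GC logging enabled (for example -verbose:gc, or -Xlog:gc on JDK 9 and later) shows the individual collections behind the counter.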

Garbage Collector Algorithms

Goals

Garbage Collectors typically have the following goals (easier said than done):

  • Very short “stop the world pauses” with a target of a few milliseconds
  • Pause times that do not increase with heap, live-set, or root-set size
  • Support for heap sizes ranging from a few MBs up to many TBs
  • High concurrency — All heavy lifting work is done while Java threads continue to execute
  • High Throughput
  • Easy to tune

Garbage Collectors

Data in the heap is partitioned into multiple allocation regions (or generations), which are kept separate based on object age (i.e. the number of GC iterations survived). While some collectors are uni-generational, the others use two generations: (1) the Young Generation (further split into Eden and two Survivor regions) and (2) the Old (or Tenured) Generation.
Source: Blog by Ionut Balosin

Two generational collectors:

Serial GC — The algorithm uses a single thread to perform all garbage collection work, which makes it relatively efficient because there is no communication overhead between threads. It’s best-suited to single processor machines because it can’t take advantage of multiprocessor hardware.

Throughput (Parallel) GC — This algorithm uses mark-copy in the Young Generation and mark-sweep-compact in the Old Generation. Both Young and Old collections trigger stop-the-world events, stopping all application threads to perform garbage collection. Both collectors run marking and copying/compacting phases using multiple threads, hence the name ‘Parallel’.

Parallel Garbage Collector is suitable for multi-core machines in cases where your primary goal is to increase throughput. Higher throughput is achieved due to the more efficient usage of system resources:

  • during collection, all cores are cleaning the garbage in parallel, resulting in shorter pauses
  • between garbage collection cycles, neither collector consumes any resources

Garbage First (G1) GC — This collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with a high probability, while achieving high throughput.

  • The G1 collector takes a different approach in terms of heap memory model. The heap is partitioned into a set of equal-sized heap regions, each a contiguous range of virtual memory.
  • Certain region sets are assigned the same roles (eden, survivor, old) as in the older collectors, but there is not a fixed size for them.
  • Region size is chosen by the JVM at startup. The JVM generally targets around 2000 regions, varying in size from 1 to 32 MB.
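
The region-size rule of thumb above can be worked through as simple arithmetic. The sketch below is a simplified approximation, not HotSpot's exact heuristic: I am assuming a target of 2048 regions (the text only says "around 2000"), rounding down to a power of two, and clamping to the 1 MB to 32 MB range. The class and method names are hypothetical. If you need a specific size, HotSpot lets you set it explicitly with -XX:G1HeapRegionSize.

```java
class G1RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long TARGET_REGIONS = 2048;  // assumption standing in for "around 2000"
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;

    // Approximates the region size G1 would pick for a given heap size in bytes.
    static long regionSizeFor(long heapBytes) {
        long candidate = Math.max(heapBytes / TARGET_REGIONS, 1);
        long powerOfTwo = Long.highestOneBit(candidate); // round down to a power of two
        return Math.min(Math.max(powerOfTwo, MIN_REGION), MAX_REGION);
    }

    public static void main(String[] args) {
        long[] heapsInMb = {512, 4096, 6144, 102_400};
        for (long heapMb : heapsInMb) {
            long region = regionSizeFor(heapMb * MB);
            System.out.println(heapMb + " MB heap -> " + (region / MB) + " MB regions");
        }
    }
}
```

Under these assumptions a 4 GB heap gets 2 MB regions, while very small and very large heaps hit the 1 MB and 32 MB limits respectively.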

For more details, refer to Getting Started with the G1 GC

G1 Heap Allocation

Uni-generational collectors:

Shenandoah GC — Shenandoah is the low pause time garbage collector that reduces GC pause times by performing more garbage collection work concurrently with the running Java program. Shenandoah does the bulk of GC work concurrently, including the concurrent compaction, which means its pause times are no longer directly proportional to the size of the heap. Garbage collecting a 200 GB heap or a 2 GB heap should have a similar low pause behavior.

For more details, please refer to Implementation Details in Wiki

Z GC — ZGC is a concurrent, single-generation, region-based, NUMA-aware, compacting collector. Stop-the-world phases are limited to root scanning, so GC pause times do not increase with the size of the heap or the live set.

For more details, please refer to Session on ZGC by Per Liden on YouTube

Finally,

Epsilon GC (experimental) — A GC that handles memory allocation but does not implement any actual memory reclamation mechanism. Once the available Java heap is exhausted, the JVM will shut down. It was designed for internal JDK testing but can conceivably be useful in two situations:

  • Very short-lived programs
  • Programs that are carefully written to reuse memory and never perform new allocations
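
As a rough illustration of the second situation, the sketch below allocates its working memory once and then overwrites it in place instead of creating new objects. The ReusedBuffer class is a hypothetical example of mine, and only the record and sum methods are allocation-free; the printing in main still allocates, so this is a sketch of the coding style rather than a program that is guaranteed to run indefinitely under Epsilon.

```java
class ReusedBuffer {
    private final long[] window; // allocated once, up front
    private int next = 0;        // slot that the next sample will overwrite
    private int filled = 0;      // number of slots holding real samples

    ReusedBuffer(int capacity) {
        window = new long[capacity];
    }

    // Overwrites the oldest slot instead of allocating a new object per sample.
    void record(long sample) {
        window[next] = sample;
        next = (next + 1) % window.length;
        if (filled < window.length) {
            filled++;
        }
    }

    // Sum of the samples currently held; no allocation happens here either.
    long sum() {
        long total = 0;
        for (int i = 0; i < filled; i++) {
            total += window[i];
        }
        return total;
    }

    public static void main(String[] args) {
        ReusedBuffer buffer = new ReusedBuffer(3);
        for (long sample = 1; sample <= 4; sample++) {
            buffer.record(sample);
        }
        System.out.println("sum of last 3 samples = " + buffer.sum());
    }
}
```

On JDK 11 and later, Epsilon itself is enabled with -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC.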

Garbage Collection Algorithms

For all practical purposes, I will share details for only the following three Java versions:

  • Java SE 8 — LTS
  • Java SE 11 — LTS
  • Java SE 15

There are two possible sets of Java users (my guess):

  1. Wilderness explorer who would want to explore and use all possible new features and keep up to the latest updates of Java (be it development or production).
  2. Enterprise users who would want the stability and reliability of the platform. They normally prefer LTS versions.

I wish there were a Babel for Java 😄 that would bridge both worlds. Of course, we could use Kotlin (a blog for another day).

GC algorithm status against JDK versions

In my next blog, I will detail the GC algorithms along with hands-on around GC tools and setting JVM options for performance and health.

In case you liked my writing style, please share/retweet the blog to your fellow developers 😆
