Understanding JVM Garbage Collection: An Introduction to Key Concepts and Techniques

Published in CodeX · 5 min read · Jul 1, 2024

What is a GC?

In modern applications, memory management poses an additional overhead for developers, requiring careful attention to prevent memory leaks. Automatic garbage collection (GC) provides relief by allowing developers to create object instances without worrying about manual memory deallocation. Instead, they rely on the GC to manage memory efficiently.

However, if memory is left uncollected, it accumulates over time and can exceed the application’s memory limit, potentially causing crashes. For instance, when Brendan Eich developed the first version of JavaScript in just 10 days, he reportedly left out GC because early web pages used little memory and he was short on time. Today, with pages growing ever more complex, not collecting garbage is out of the question, as it would eventually make the application unusable.

As applications grow in complexity and functionality, they become more memory-hungry. While memory capacities have increased, long-running applications and servers need to maintain stability without crashing. Because memory held by a running process is not reclaimed until something frees it or the process exits, effective memory management becomes crucial.

Here, the Garbage Collector (GC) comes to the rescue. Its primary function is straightforward: it collects garbage. Yet, the challenge lies in distinguishing between garbage and active data. In everyday life, we segregate garbage from non-garbage items before collection. Similarly, the GC identifies and collects unused objects, freeing up memory for further usage.

GC is not just a collector; it also has to identify the garbage. It employs various techniques to balance garbage marking and collection with the application’s ongoing operations. This balancing act involves trade-offs between memory usage, throughput, and latency.

Different types of applications require different GC strategies:

IoT devices prioritize memory conservation.
Data processing applications prioritize throughput over latency.
UI-intensive and trading applications prioritize low latency to ensure responsiveness.

What are the different types of collectors? 🤔

Serial Collector
Parallel Collector
Concurrent Mark and Sweep (CMS) Collector
Garbage First (G1) Collector
Shenandoah Collector
ZGC
C4
Epsilon Collector
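
To check which of these collectors a given JVM is actually running, one option is to ask the platform’s management beans. The sketch below relies only on the standard java.lang.management API; the class name ActiveCollectors is an illustrative choice, and the names it prints depend on the JDK version and on the flags the JVM was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the garbage collectors that the running JVM has registered.
class ActiveCollectors {

    // Returns the names the JVM reports for its collectors.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        collectorNames().forEach(System.out::println);
    }
}
```

Starting the program with -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseG1GC or -XX:+UseZGC should produce a different set of names each time. Which of these flags is accepted depends on the JDK build.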

Some fancy key GC terminologies 🤓

Heap
Generational Hypothesis
Generations
GC Roots
GC Cycles
Fragmentation
Stop the World (STW)
Barriers
TLAB (Thread-Local Allocation Buffer)

Let’s understand them together, shall we?

Heap

The heap is the section of memory an application uses for dynamic allocation; the objects it creates are allocated there. The JVM reserves a chunk of memory from the operating system and uses it as the heap pool. Because the JVM manages this memory itself, it has full control over and visibility into what the application allocates.
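
As a rough illustration, the heap’s current size can be read from inside the application through java.lang.Runtime. The class name HeapSnapshot and the labels max, committed and used are choices made for this sketch, not JVM terminology taken from the article.

```java
// Prints the heap figures the JVM exposes through java.lang.Runtime.
class HeapSnapshot {

    // Returns {max, committed, used} in bytes.
    static long[] heapBytes() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();               // upper bound the heap may grow to
        long committed = rt.totalMemory();       // heap memory the JVM currently holds
        long used = committed - rt.freeMemory(); // part of that memory occupied by objects
        return new long[] {max, committed, used};
    }

    public static void main(String[] args) {
        long[] h = heapBytes();
        long mib = 1024L * 1024L;
        System.out.printf("max=%d MiB, committed=%d MiB, used=%d MiB%n",
                h[0] / mib, h[1] / mib, h[2] / mib);
    }
}
```

The upper bound can be changed with the -Xmx flag and the initial size with -Xms, for example java -Xms256m -Xmx1g HeapSnapshot.java.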

Generational Hypothesis

The Generational Hypothesis is a principle in garbage collection that suggests most objects created in the heap have a low survival rate. This means that most objects “die” (become unreachable) while they are still young, but those that do survive tend to remain alive for a much longer period.
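
A small, made-up example of the allocation pattern the hypothesis describes: a loop that creates many temporary objects and keeps only a handful. The class ObjectLifetimes and the idea of "requests" are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates typical object lifetimes: most objects die young, a few live on.
class ObjectLifetimes {

    // Processes n pretend requests and returns the few results that are kept.
    static List<String> handleRequests(int n) {
        List<String> survivors = new ArrayList<>();    // long-lived: returned to the caller
        for (int i = 0; i < n; i++) {
            StringBuilder scratch = new StringBuilder(); // unreachable after this iteration
            scratch.append("request-").append(i);
            String line = scratch.toString();            // usually short-lived as well
            if (i % 1000 == 0) {
                survivors.add(line);                     // only every 1000th result is kept
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        // 10,000 iterations allocate at least 20,000 objects; only 10 strings stay reachable.
        System.out.println("kept " + handleRequests(10_000).size() + " objects");
    }
}
```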

Generations

The JVM divides memory into different generations, depending on the type of garbage collector used. The main divisions are the Young Generation and the Old Generation.

Young Generation

The Young Generation is where newly created objects are stored. According to the Generational Hypothesis, most of these objects will not survive for long.

Eden Space

Eden Space is the portion of the Young Generation where new objects are first allocated. It is small relative to the whole heap and gets collected frequently.

Survivor Space

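Survivor Space is the other part of the Young Generation; in the classic HotSpot layout it is split into two equally sized areas. Objects that are still alive when Eden Space is collected are copied here. On each later Young Generation collection, the objects that are still alive are copied from one survivor area to the other and their age is incremented.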

Old Generation

The Old Generation is where objects that have survived multiple Young Generation GC cycles reside. Objects are promoted to the Old Generation once they have survived enough Young Generation collections, since the Generational Hypothesis predicts that such objects will stay alive for a long time. GC cycles here are less frequent but more expensive, because a larger portion of the heap is involved.
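
On a running JVM, these generations show up as memory pools. The sketch below lists the heap pools through the standard java.lang.management API; the class name HeapPools is illustrative. The pool names vary by collector: with G1 they typically look like "G1 Eden Space", "G1 Survivor Space" and "G1 Old Gen", while other collectors report different names or a single pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Lists the heap memory pools (generations or regions) of the running JVM.
class HeapPools {

    // Returns the names of all pools that belong to the heap.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as Metaspace
            }
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is not valid
            long usedKiB = (usage == null) ? -1 : usage.getUsed() / 1024;
            System.out.printf("%-25s used=%d KiB%n", pool.getName(), usedKiB);
        }
    }
}
```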

GC Roots

GC Roots are references held outside the heap that point to objects inside it. They are the starting points from which the GC traces references: any object reachable from a root is considered live, and everything else is garbage. Some examples of GC Roots include:

Current active stack variables
Active threads
Static fields
JNI references
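
The idea of tracing from roots can be sketched with a toy object graph. This is a simplified model of reachability marking, not the JVM’s actual implementation; Node, mark and garbageNames are hypothetical names introduced for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model: objects reachable from the roots are live, the rest is garbage.
class ReachabilityDemo {

    // Hypothetical stand-in for a heap object: a name plus outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Marks every node that can be reached by following references from the roots.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {        // first visit: follow its references too
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    // Builds a small graph and returns the names of the unreachable nodes.
    static List<String> garbageNames() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);   // root -> a -> b
        c.refs.add(d);   // c and d reference each other,
        d.refs.add(c);   // but nothing reachable from a root points to them
        List<Node> heap = Arrays.asList(a, b, c, d);
        Set<Node> live = mark(Collections.singletonList(a)); // a plays the role of a stack variable
        List<String> garbage = new ArrayList<>();
        for (Node n : heap) {
            if (!live.contains(n)) {
                garbage.add(n.name);
            }
        }
        return garbage;
    }

    public static void main(String[] args) {
        System.out.println("garbage = " + garbageNames()); // prints: garbage = [c, d]
    }
}
```

Note that c and d count as garbage even though they still reference each other, because neither can be reached from a root.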

GC Cycles

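GC Cycles are the runs in which the garbage collector reclaims unused memory. They occur at different frequencies, cover different parts of the heap, and are triggered by different conditions. The terms are used somewhat loosely, but the main types are commonly described as:

Minor GC Cycle: collects the Young Generation.
Major GC Cycle: collects the Old Generation.
Full GC Cycle: collects the entire heap, both Young and Old Generations.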
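
One way to watch cycles happen is to read the collection counters the JVM keeps for each collector. The sketch below uses the standard GarbageCollectorMXBean API; the class name GcCycleCounts and the churn helper are hypothetical, and whether the loop triggers any collection at all depends on the heap size and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reports how many collections each collector has run so far.
class GcCycleCounts {

    // Creates short-lived garbage so that young collections become likely.
    static long churn(int iterations) {
        byte[][] ring = new byte[16][];
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            ring[i % 16] = new byte[1024]; // overwriting a slot turns the old array into garbage
            ring[i % 16][0] = (byte) i;
            checksum += ring[i % 16][0];
        }
        return checksum;
    }

    // Sums the collection counts of all collectors; undefined counts (-1) are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long checksum = churn(2_000_000);
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-25s collections=%d, total time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("checksum=" + checksum + ", total collections=" + totalCollections());
    }
}
```

On most collectors the beans are split by generation, so a higher count on the young collector than on the old one is the expected pattern.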

Fragmentation

Fragmentation is a problem in memory management where there are small chunks of free memory between allocated memory blocks. Individually, these chunks might not amount to much, but together they can be significant. This can lead to an out-of-memory problem despite having available memory, because the memory is not contiguous.
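
The effect can be shown with a toy heap made of fixed-size slots. This is a deliberately simplified model, not how the JVM tracks memory; FragmentationDemo and its methods are hypothetical names.

```java
// Toy heap of equal-sized slots: true means the slot is in use, false means it is free.
class FragmentationDemo {

    // Total number of free slots, wherever they are.
    static int totalFree(boolean[] used) {
        int free = 0;
        for (boolean u : used) {
            if (!u) {
                free++;
            }
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(boolean[] used) {
        int best = 0;
        int run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // An object needs its slots to be contiguous.
    static boolean canAllocate(boolean[] used, int slots) {
        return largestFreeRun(used) >= slots;
    }

    // Worst case: every other slot is in use.
    static boolean[] checkerboard(int slots) {
        boolean[] used = new boolean[slots];
        for (int i = 0; i < slots; i += 2) {
            used[i] = true;
        }
        return used;
    }

    public static void main(String[] args) {
        boolean[] heap = checkerboard(16);
        System.out.println("free slots:        " + totalFree(heap));      // 8
        System.out.println("largest free run:  " + largestFreeRun(heap)); // 1
        System.out.println("fits 2-slot object: " + canAllocate(heap, 2)); // false
    }
}
```

Half of this heap is free, yet a two-slot object cannot be placed anywhere.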

Stop the World (STW)

STW is a pause during garbage collection in which the entire application is halted: all application threads are stopped, and only the garbage collector’s threads run until the pause ends.

Barriers

Barriers are small pieces of code that the JVM injects around operations, typically reads and writes of object references, that could otherwise cause contention between threads, data corruption, or stale reads. In GC, barriers let application threads run concurrently with GC threads without either side seeing an inconsistent heap. Some known barriers are:

Read Barrier
Write Barrier
Exotic Barrier
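
To make the idea concrete, here is a toy write barrier written in plain Java. In a real JVM the barrier is emitted by the interpreter and JIT compiler next to each reference store, not written by hand. The flavor modeled here is often called snapshot-at-the-beginning: while the collector is marking, the value about to be overwritten is logged so the collector can still visit it. All names (WriteBarrierDemo, Holder, markingActive, overwritten) are hypothetical, and the sketch is single-threaded.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a write barrier that logs overwritten references during marking.
class WriteBarrierDemo {

    // Hypothetical flag a collector would set while it is marking concurrently.
    static volatile boolean markingActive = false;

    // Hypothetical log of overwritten references for the collector to visit later.
    static final Deque<Object> overwritten = new ArrayDeque<>();

    static final class Holder {
        private Object field;

        // Every reference store goes through this method in the toy model.
        void store(Object newValue) {
            Object old = field;
            if (markingActive && old != null) { // barrier: runs before the store
                overwritten.push(old);
            }
            field = newValue;                   // the store itself
        }

        Object load() {
            return field;
        }
    }

    // Returns how many overwritten references the barrier logged.
    static int demo() {
        overwritten.clear();
        Holder h = new Holder();
        markingActive = false;
        h.store("first");   // no marking in progress: nothing is logged
        markingActive = true;
        h.store("second");  // overwrites "first" during marking: logged
        h.store("third");   // overwrites "second" during marking: logged
        markingActive = false;
        return overwritten.size();
    }

    public static void main(String[] args) {
        System.out.println("logged references: " + demo()); // prints: logged references: 2
    }
}
```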

TLAB

TLAB, or Thread-Local Allocation Buffer, is a block of memory in the heap (mainly in Eden Space) that the JVM assigns to each thread. Only the owning thread allocates into its TLAB, so it can do so without locking; the objects it creates there are still ordinary heap objects that other threads can reference. The main purpose of TLAB is to avoid memory contention and reduce synchronization overhead between threads by giving each thread its own allocation area. When a TLAB is full, the thread requests a new TLAB from the JVM. If there is enough space in the Eden region, the JVM assigns a new TLAB; otherwise, it performs GC to free up space before assigning a new TLAB.
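
The allocation path described above can be modeled with two pointers: a shared one for Eden and a private one per thread. The sketch below is a toy simulation using offsets instead of real memory, under the assumption that a TLAB has a fixed size; Eden, Tlab, carve and allocate are hypothetical names, and where the toy returns -1 a real JVM would run a GC.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of TLAB allocation with offsets standing in for addresses.
class TlabDemo {

    // Shared Eden: handing out a TLAB is the only step that needs synchronization.
    static final class Eden {
        final long capacity;
        final AtomicLong top = new AtomicLong(0);

        Eden(long capacity) { this.capacity = capacity; }

        // Reserves `size` bytes and returns their start offset, or -1 if Eden is full.
        long carve(long size) {
            while (true) {
                long start = top.get();
                if (start + size > capacity) {
                    return -1;
                }
                if (top.compareAndSet(start, start + size)) {
                    return start;
                }
            }
        }
    }

    // One per thread: allocation inside the buffer is a plain pointer bump.
    static final class Tlab {
        final Eden eden;
        final long tlabSize;
        long cursor = 0;  // next free offset inside the current buffer
        long end = 0;     // end of the current buffer
        int refills = 0;  // how many buffers this thread has requested

        Tlab(Eden eden, long tlabSize) {
            this.eden = eden;
            this.tlabSize = tlabSize;
        }

        // Returns the offset of the new object, or -1 if no space could be found.
        long allocate(long bytes) {
            if (bytes > tlabSize) {
                return -1; // objects larger than a TLAB are out of scope for this sketch
            }
            if (cursor + bytes > end) {         // current buffer is full
                long start = eden.carve(tlabSize);
                if (start < 0) {
                    return -1;                  // Eden exhausted: a real JVM would collect here
                }
                cursor = start;
                end = start + tlabSize;
                refills++;
            }
            long address = cursor;
            cursor += bytes;                    // bump the private pointer, no locking needed
            return address;
        }
    }

    // Fills a 1024-byte Eden with 64-byte objects through 256-byte TLABs.
    static int[] demo() {
        Tlab tlab = new Tlab(new Eden(1024), 256);
        int allocated = 0;
        while (tlab.allocate(64) >= 0) {
            allocated++;
        }
        return new int[] {allocated, tlab.refills};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("objects=" + r[0] + ", TLAB refills=" + r[1]); // objects=16, TLAB refills=4
    }
}
```

Sixteen allocations touch the shared pointer only four times, which is the contention saving TLABs are meant to provide.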

Stay tuned for future articles, where we will delve deeper into each of the collectors to understand how they work and how they are evolving to meet modern requirements. We will also explore the GCs used in other languages like Python and JavaScript. Cheers ✌️

Part 2: https://medium.com/@Sasuke_214/understanding-jvm-garbage-collection-serial-and-parallel-collectors-2ee859f7b53f
