Writing your own Garbage Collector for JDK12

Unmesh Joshi
May 6 · 6 min read

The GC Interface

As part of JEP 304(https://openjdk.java.net/jeps/304), JDK’s garbage collector code was refactored to create a standard interface for all the GC algorithms and to isolate every GC algorithm in its own set of source files.

Another related JEP 318(https://openjdk.java.net/jeps/318) introduced a very simple no-op garbage colector, called Epsilon. Because of these two changes, the garbage collector code is modular and isolated now, and Epsilon garbage collector code is easy enough to understand, making it possible to experiment with workings of memory allocation and garbage collectors in JDK.

Some time back Aleksey Shipilëv wrote an excellent blog (https://shipilev.net/jvm/diy-gc/) with code samples to explain how we can introduce a garbage collector in Epsilon. I tried using Epsilon and the Garbage Collector code from that post, refactoring it a bit to write a very simple allocator from scratch and adding it to JDK. It is a fun exercise and helped greatly in understanding garbage collection code in JDK.

What is No-Op GC?

Epsilon is called as an No-Op GC. To understand what No-Op means, we need to look at garbage collection interface. (It should really be called as memory management interface). It has two parts, a set of methods for allocating object instances on heap and a set of methods to invoke garbage collection. Epsilon GC implements all the allocation methods, but does not implement anything for collecting garbage objects.

CollectedHeap — The Garbage Collection Interface

Essentially, if we need to implement a garbage collector for JDK, it needs to be subclass of CollectedHeap class, and implement some of its key methods. They key methods to implement are as following

1. virtual jint initialize() = 0;

This is where all of the heap initialization is expected to be done. It involves, actually reserving memory, (which is a call to anonymous mmap system call on Linux). There is a helper method for reserving memory which can be used to allocate heap memory

ReservedSpace heap_rs = Universe::reserve_heap(heap_size_in_bytes, align);

Other kind of initialization, like initializing bitmap memory used in the marking phase of garbage collection, or other configurations happen here.

2. HeapWord* CollectedHeap::allocate_new_tlab(size_t min_size,size_t requested_size, size_t* actual_size)

Most garbage collectors support thread local allocation buffer, where each thread can have its own portion of memory buffer from which to allocate objects.

3. virtual HeapWord* mem_allocate(size_t size, bool* gc_overhead_limit_was_exceeded) = 0;

CollectedHeap has three different allocation methods for objects, arrays and classes.

CollectedHeap::obj_allocate, CollectedHeap::array_allocateCollectedHeap::class_allocate

These three methods allocate memory equal to the size of the object or array to be allocated. We will see how object size is calculated later. All these methods internally call to mem_allocate if the allocation is happening outside TLAB.

ContiguousSpace class

ContinuousSpace is a shared class provided by JDK, which can be used by the GC implementation to allocate objects. Epsilon and all other GCs use this class or override this class to allocate memory. The default implementation has both lock based and lock free versions of memory allocation. Epsilon uses lock free version. The same is used in the sample code in this writeup.

HeapWord* ContiguousSpace::par_allocate(size_t size)

Alignment and Its Importance

When allocating heap space, a lot of attention needs to be given to memory alignment. On linux, page size is 4096 bytes. So the heap size needs to be aligned on 4KB boundary. There are helper methods to make sure we allocate heap on page size boundary.

size_t align = size_t align = HeapAlignment;size_t init_byte_size = align_up(InitialHeapSize, align);size_t max_byte_size = align_up(MaxHeapSize, align);

(The same is true when allocating objects. After the object size is calculated, it’s made sure that the size aligned on the right boundary and if not, necessary padding bytes are added as required.)

This is primarily what needs to be implemented for allocation part of the garbage collection interface.

How is Object size calculated?

One of the interesting things to understand is that the allocation interface takes the size in heap words to be allocated. It’s interesting to see how object size is calculated. When the class file is loaded, it’s parsed to know the structure of the class. The following method in ClassFileParser computes the size of the object for given class and also populates field offsets in the instance from the base address of the object.

void ClassFileParser::layout_fields(ConstantPool* cp, 
const FieldAllocationCount* fac,
const ClassAnnotationCollector* parsed_annotations,
FieldLayoutInfo* info,

Each object always has a header part, which has a pointer to the class object and a generic word. The header looks as following..

Each object has this header as part of it. On 64 bit JVM, a pointer is 8 bytes long. So header, which has two pointers, is 16 bytes. (There is an optimization called compressedOop, where all the pointers, including the Klass pointer in the header is stored as an int (4 bytes), instead of char* (8 bytes), saving 4 bytes for every pointer in the object layout. When compressedOop is enabled, which is by default, the header size is 12 bytes)

Object size is always represented as number of words on the given architecture. wordSize is 8 bytes. So if a given object is 40 bytes in size, it’s size is equivalent to 5 words.

To take a simple example, if we have a class like the following.

Its 5 object references. So the size will be 16 bytes (header) + 40 bytes (5 reference fields * 8 bytes) = 56 bytes (7 words)

If CompressedOOP is enabled it will 12 bytes (header) + 20 (5 reference fields * 4 bytes) = 32 bytes (4 words)

Thread Local Allocation Buffers and Bump the pointer allocation.

For object allocation, each thread is given a local buffer (by default 64Kb), which is used by threads to allocate objects. The advantage is that allocation does not need any locking and can just be implemented by bumping the base pointer by size of an object.

Allocating an object of size 40 bytes (5 words), is just about bumping the top of the heap pointer by 5. The allocation is as simple as following code..

GCArguments -The factory class for creating JVM heap

How to hook in the custom heap class we create to the JVM machinery? We need to create a factory class for our heap. A subclass of GCArguments, something like following

Once we have this class, we can then add our own flag to gcConfig.cpp to hook in our arguments class

Define, SIMPLEGC_ONLY in macros.cpp like following

These changes hook in our GC implementation class to JVM, and we can then run JVM by passing argument -XX:+UseSimpleGC

There is a patch, which has all the changes required


How to implement Garbage Collection

When the heap is full and there is no more space left to allocate new objects, we need to make space by deallocating dead objects. Deallocation can be done just by resetting the top of the heap.

The best explanation of how to implement garbage collection algorithm, can be found at https://shipilev.net/jvm/diy-gc

The algorithm used there is mark-compact garbage collection. It’s important for two reasons. It shows how live objects are marked and more importantly how objects are moved and pointers adjusted, which is a key part of any garbage collector which involves moving objects (Generational Garbage Collectors e.g). I will just summarise key steps and utility methods that JDK makes us available for implementing each of these. All garbage collection algorithms will mostly end up doing following three steps, using the same methods as explained below.

Mark all live objects.

For this we need to find all the object references which are live and then mark them. All subsystems in the JVM which can have object pointers accept a closure and pass object pointers which we can deal with.

class ScanOopClosure: public BasicOopIterateClosure {
void do_oop_work(T* p) {
T o = RawAccess<>::oop_load(p);

We can chose to mark object themselves or have a separate structure like bitmap which makes it very efficient. JDK has utility class called MarkBitmap which allows us to mark the objects
Its as simple as following code

if (!_bitmap->is_marked(obj)) {
_bitmap->mark((HeapWord*) obj);

In this step, we also need to iterate through root objects, the oop header class provides a utility method for doing that


We can pass the same closure it.

Move live objects

Move all live objects to start of the heap and reset top. All the remaining space now becomes free, and there are no holes in the heap.Its done by literally copying object from one location to other. The utility method JVM code provides is as following

Copy::aligned_conjoint_words((HeapWord*) obj, 
(HeapWord*) fwd,

Adjust pointers in all live objects to point to new object locations.

As we saw in the OopClosure, what we get is a pointer to oop pointer. So once we move the objects, we can store those new pointers into places where the object pointers are stored by writing a similar OopClosure, and using following utility method.

RawAccess<>::oop_store(p, fwd);

Overall, implementing your own garbage collector and hooking it into JDK code is a fun exercise, it helps understanding some of the core concepts of garbage collection algorithms at the implementation level.

Thanks to Aleksey Shipilëv for writing a great blog with garbage collector code to explain things.

The code with mercurial patch for all the changes needed is available at https://github.com/unmeshjoshi/simgplegc

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade