Poor old garbage collection. One of the unsung heroes of Java, often blamed, rarely praised. Before Java made garbage collection mainstream, programmers had little choice but to manually track all the memory they’d allocated, and deallocate it once nothing was using it anymore. This is hard. Even with discipline, manual deallocation is a frequent cause of memory leaks (too-late deallocation) and crashes (too-early deallocation).
Java GC is often thought of as a necessary cost, and ‘reduce time spent in GC’ is common performance guidance. However, garbage collectors do more than memory deallocation: they also handle the allocation of memory and the arrangement of objects in memory. A good memory management algorithm can make allocation efficient by reducing fragmentation and contention. It can also boost throughput and lower response times by rearranging objects.
Why does the position of an object in memory affect application performance? A good proportion of a program’s execution time is spent stalled in hardware, waiting for memory access. Heap access is geologically slow compared to instruction processing, so modern computers use caches. When an object is fetched into a processor’s cache, its neighbours are also brought in; if they happen to be accessed next, that access will be fast. Having objects which are used at the same time near each other in memory is called object locality, and it’s a performance win.
The benefits of efficient allocation are more obvious. If the heap is fragmented, then when a program tries to create an object, the allocator may face a long search for a chunk of free memory big enough, and allocation becomes expensive. As an experiment, you can force GC to compact frequently; it will massively increase GC overhead but, depending on the workload, application performance may actually improve.
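To make the fragmentation point concrete, here is a minimal sketch of a toy heap. It is an illustration only, not how any real JVM allocates: the heap is modelled as an array of fixed-size slots, and the names `FragmentationDemo`, `firstFit`, `compact`, and `checkerboardHeap` are all invented for this example. The same amount of memory is free before and after compaction; only its arrangement changes.

```java
import java.util.Arrays;

/** Toy model of a heap as fixed-size slots; an assumption for illustration only. */
final class FragmentationDemo {

    /** First-fit search: index of the first run of {@code size} free slots, or -1 if none. */
    static int firstFit(boolean[] used, int size) {
        int runStart = 0;
        int runLength = 0;
        for (int i = 0; i < used.length; i++) {
            if (used[i]) {
                // A used slot breaks the current free run; restart after it.
                runLength = 0;
                runStart = i + 1;
            } else if (++runLength == size) {
                return runStart;
            }
        }
        return -1;
    }

    /** Slides every used slot to the front, leaving a single contiguous free run. */
    static boolean[] compact(boolean[] used) {
        int live = 0;
        for (boolean slot : used) {
            if (slot) {
                live++;
            }
        }
        boolean[] compacted = new boolean[used.length];
        Arrays.fill(compacted, 0, live, true);
        return compacted;
    }

    /** A 16-slot heap with every other slot in use: 8 slots free, no two adjacent. */
    static boolean[] checkerboardHeap() {
        boolean[] used = new boolean[16];
        for (int i = 0; i < used.length; i += 2) {
            used[i] = true;
        }
        return used;
    }

    public static void main(String[] args) {
        boolean[] fragmented = checkerboardHeap();
        System.out.println("fragmented: " + firstFit(fragmented, 4));
        System.out.println("compacted:  " + firstFit(compact(fragmented), 4));
    }
}
```

In the fragmented heap the search walks every slot and still fails (it returns -1) even though half the heap is free. After compaction the same request succeeds at slot 8.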
Different JVM implementations have varying GC strategies, and even individual JVMs may offer a range of configurable options. JVM defaults are usually a good start, but it is worth understanding some of the mechanics and variations possible.
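A quick way to see which collectors a given JVM is actually running is to ask it. The sketch below uses the standard `java.lang.management` API; the class name `GcInventory` and the method `describeCollectors` are made up for this example, and the collector names it prints depend entirely on the JVM and the options it was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Lists the collectors the running JVM reports, with their counts and accumulated time. */
final class GcInventory {

    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both counters return -1 when the value is undefined for a collector.
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

Running this under different settings shows the variation described above. On HotSpot, for instance, options such as `-XX:+UseSerialGC`, `-XX:+UseParallelGC`, or `-XX:+UseG1GC` select different collectors; other JVMs use their own options.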
Stop-the-world collectors halt all program activity so they can collect safely. Concurrent collectors do their work alongside the running application, in some designs on the application threads themselves, so long global pauses are avoided; instead, each thread experiences tiny delays. Concurrent collectors generally deliver lower throughput than stop-the-world ones, so they’re best reserved for applications where pauses would be noticed (such as music playback, or a GUI).
Collection itself is done by copying or by marking and sweeping. With mark and sweep, the heap is crawled to mark the objects still in use; whatever is left unmarked is free space, and new objects get allocated into those gaps. Copying collectors divide the heap into two areas. All objects are allocated in the ‘new space’. When that space is full, its non-garbage contents are copied to the reserve space and the spaces are swapped.
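The copying scheme can be modelled in a few lines. This is a toy sketch under loud assumptions: `SemispaceDemo` and its `Node` type are hypothetical, position in a list stands in for a memory address, and "copying" simply moves a node from one list to the other. A real collector copies raw memory and fixes up every pointer to the moved object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy two-space copying collector; list position stands in for a memory address. */
final class SemispaceDemo {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    private List<Node> newSpace = new ArrayList<>(); // all allocation happens here
    private List<Node> reserve = new ArrayList<>();  // stays empty between collections

    Node allocate(String name) {
        Node node = new Node(name);
        newSpace.add(node);
        return node;
    }

    /** Moves everything reachable from the roots into the reserve space, then swaps spaces. */
    void collect(List<Node> roots) {
        Set<Node> copied = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        for (Node root : roots) {
            if (copied.add(root)) {
                reserve.add(root);
            }
        }
        // The reserve list doubles as the work queue: scan each survivor's references.
        for (int scan = 0; scan < reserve.size(); scan++) {
            for (Node ref : reserve.get(scan).refs) {
                if (copied.add(ref)) {
                    reserve.add(ref);
                }
            }
        }
        // Garbage is never visited; it disappears when the old space is wiped.
        List<Node> old = newSpace;
        newSpace = reserve;
        old.clear();
        reserve = old;
    }

    /** Names of the objects in the new space, in address order. */
    List<String> layout() {
        List<String> names = new ArrayList<>();
        for (Node node : newSpace) {
            names.add(node.name);
        }
        return names;
    }

    public static void main(String[] args) {
        SemispaceDemo heap = new SemispaceDemo();
        Node a = heap.allocate("a");
        heap.allocate("tmp1");
        Node b = heap.allocate("b");
        heap.allocate("tmp2");
        Node c = heap.allocate("c");
        a.refs.add(c);
        c.refs.add(b);
        System.out.println(heap.layout()); // [a, tmp1, b, tmp2, c]
        heap.collect(Collections.singletonList(a));
        System.out.println(heap.layout()); // [a, c, b]
    }
}
```

Two things are worth noticing. The work done is proportional to the number of survivors, not the amount of garbage, and the survivors end up packed together with objects that reference each other sitting side by side.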
Copying collectors compact the heap, which boosts allocation and object access (fewer cache misses). If most objects die quickly, copying collection will be super fast, because only the survivors need to be copied. However, if objects hang around, the same objects get copied again and again, and collection will be inefficient. Copying collectors are a great fit with immutable objects, and an awful combination with object pooling ‘optimisations’ (usually a bad idea anyway).
When evaluating performance, relate it to business value. Optimise transactions per second, mean service time, or worst-case latency. But don’t try to micro-optimise time spent in GC, because time spent in GC can actually help program speed.