Garbage Collection Is Your Friend
Poor old garbage collection. One of the unsung heroes of Java, often blamed, rarely praised. Before Java made garbage collection mainstream, programmers had little choice but to manually track all the memory they’d allocated, and deallocate it once nothing was using it anymore. This is hard. Even with discipline, manual deallocation is a frequent cause of memory leaks (if done too late) and crashes (if done too early).
Java GC is often thought of as a necessary cost, and ‘reduce time spent in GC’ is common performance guidance. However, modern garbage collection can be faster than malloc/free, and time spent in GC can speed everything up. Why? Garbage collectors do more than memory deallocation: They also handle the allocation of memory and the arrangement of objects in memory. A good memory management algorithm can make allocation efficient by reducing fragmentation and contention. It can also boost throughput and lower response times by rearranging objects.
Why does the location of an object in memory affect application performance? A high proportion of a program’s execution time is spent stalled in hardware, waiting for memory access. Heap access is geologically slow compared to instruction processing, so modern computers use caches. When an object is fetched into a processor’s cache, its neighbours are also brought in; if they happen to be accessed next, that access will be fast. Having objects which are used at the same time near each other in memory is called object locality, and it’s a performance win.
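The effect of locality can be glimpsed with a small experiment. The sketch below is an illustration rather than a rigorous benchmark: it sums the same array twice, once walking it front to back and once in a shuffled order. The class name LocalityDemo, the array size, and the seed are all arbitrary choices made for this example. Both walks do the same arithmetic and produce the same sum; on most hardware the shuffled walk tends to be noticeably slower, because neighbouring elements are no longer used together.

```java
import java.util.Random;

final class LocalityDemo {
    // Sums data[] by visiting its elements in the order given by order[].
    static long sumInOrder(int[] data, int[] order) {
        long sum = 0;
        for (int idx : order) {
            sum += data[idx];
        }
        return sum;
    }

    // Returns the indices 0..n-1 in ascending order (cache-friendly walk).
    static int[] sequentialOrder(int n) {
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        return order;
    }

    // Returns a permutation of 0..n-1 (Fisher-Yates shuffle, fixed seed).
    static int[] shuffledOrder(int n, long seed) {
        int[] order = sequentialOrder(n);
        Random rnd = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
        return order;
    }

    public static void main(String[] args) {
        int n = 1 << 22; // arbitrary size, chosen to exceed typical caches
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = i % 7;
        }
        int[] seq = sequentialOrder(n);
        int[] shuf = shuffledOrder(n, 42L);

        long t0 = System.nanoTime();
        long a = sumInOrder(data, seq);
        long t1 = System.nanoTime();
        long b = sumInOrder(data, shuf);
        long t2 = System.nanoTime();

        System.out.printf("sequential: sum=%d, %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("shuffled:   sum=%d, %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

Timings from a single run like this are noisy (JIT warm-up alone can distort them), so treat the numbers as a hint and use a proper benchmarking harness for anything that matters.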
The benefits of efficient allocation are more obvious. If the heap is fragmented, when a program tries to create an object, it will have a long search to find a chunk of free memory big enough, and allocation becomes expensive. As an experiment, you can force GC to compact more; it will massively increase GC overhead, but often application performance will improve.
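To make the cost of fragmentation concrete, here is a toy model, not how any real JVM allocator works. It assumes a first-fit allocator that walks a list of free chunks until one is large enough; the class FragmentationDemo and the chunk sizes are invented for illustration. The fragmented and compacted heaps below hold the same total amount of free memory, but they behave very differently.

```java
final class FragmentationDemo {
    // Returns how many free chunks a first-fit search examines before it
    // finds one that can hold the request, or -1 if no single chunk fits.
    static int probesForFirstFit(int[] freeChunkSizes, int request) {
        for (int i = 0; i < freeChunkSizes.length; i++) {
            if (freeChunkSizes[i] >= request) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 128 units free in total, scattered across six gaps.
        int[] fragmented = {8, 16, 8, 24, 8, 64};
        // The same 128 units after compaction: one contiguous region.
        int[] compacted = {128};

        System.out.println(probesForFirstFit(fragmented, 48));  // 6
        System.out.println(probesForFirstFit(compacted, 48));   // 1
        // Enough memory is free overall, but no single gap is big enough.
        System.out.println(probesForFirstFit(fragmented, 100)); // -1
        System.out.println(probesForFirstFit(compacted, 100));  // 1
    }
}
```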
GC strategies vary by JVM implementation, and each JVM offers a range of configurable options. JVM defaults are usually a good start, but it is worth understanding some of the mechanics and variations possible. Throughput may be traded off against latency, and workload affects the optimum choice.
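A reasonable first step is simply to find out which collectors a JVM has picked and how much work they have done. The sketch below uses the standard java.lang.management API; the class name GcReport is made up for this example. Collector names and counts differ between JVMs and settings, so no particular output should be expected. On HotSpot, for instance, the collector can be switched with flags such as -XX:+UseParallelGC or -XX:+UseG1GC, and the names reported here will change accordingly.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcReport {
    // Sums the accumulated collection time over all collectors.
    // getCollectionTime() returns -1 when the value is undefined, so
    // only positive values are added.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("Total GC time (ms): " + totalGcMillis());
    }
}
```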
Stop-the-world collectors halt all program activity so they can collect safely. Concurrent collectors offload collection work to application threads, so there are no global pauses; instead, each thread will experience tiny delays. The absence of obvious pauses comes at a price: concurrent collectors are less efficient overall than stop-the-world ones. That makes them best suited to applications where pauses would be noticed (such as music playback, or a GUI).
Collection itself is done by copying or by marking and sweeping. With mark-and-sweep, the heap is crawled to identify free space, and new objects get allocated into those gaps. Copying collectors divide the heap into two areas. Objects are allocated in the ‘new space’. When that space is full, its non-garbage contents are copied to the reserve space and the spaces are swapped. In a typical workload, most objects die young (this is known as the generational hypothesis). With short-lived objects, the copying step will be super fast (there’s nothing to copy!). However, if objects hang around, collection will be inefficient. Copying collectors are great for immutable objects, and a disaster with object pooling ‘optimizations’ (usually a bad idea anyway). As a bonus, copying collectors compact the heap, which allows near-instant object allocation and fast object access (fewer cache misses).
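The copying scheme described above can be sketched in a few lines. This is a toy model under heavy simplifying assumptions, not the code of any real collector: heap objects are modelled as a hypothetical Obj class holding a label and a list of references, and the ‘reserve space’ is just a Java list. The point to notice is that the work done is proportional to the number of live objects; garbage is never visited at all.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class ToyCopyingCollector {
    // A toy heap object: a label plus outgoing references.
    static final class Obj {
        final String label;
        final List<Obj> refs = new ArrayList<>();

        Obj(String label) {
            this.label = label;
        }
    }

    // Copies everything reachable from the roots into a fresh reserve
    // space and returns it. The roots list must be mutable: its entries
    // are updated to point at the copies.
    static List<Obj> collect(List<Obj> roots) {
        List<Obj> reserveSpace = new ArrayList<>();
        Map<Obj, Obj> forwarding = new IdentityHashMap<>();

        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), reserveSpace, forwarding));
        }
        // The reserve space doubles as the work queue: scan each copied
        // object and redirect its references to copies as well.
        for (int scan = 0; scan < reserveSpace.size(); scan++) {
            Obj copied = reserveSpace.get(scan);
            for (int i = 0; i < copied.refs.size(); i++) {
                copied.refs.set(i, copy(copied.refs.get(i), reserveSpace, forwarding));
            }
        }
        return reserveSpace;
    }

    // Copies one object unless it was already copied, in which case the
    // existing copy is returned (this is what keeps cycles from looping).
    private static Obj copy(Obj old, List<Obj> reserveSpace, Map<Obj, Obj> forwarding) {
        Obj existing = forwarding.get(old);
        if (existing != null) {
            return existing;
        }
        Obj fresh = new Obj(old.label);
        fresh.refs.addAll(old.refs); // still old references; fixed up during the scan
        forwarding.put(old, fresh);
        reserveSpace.add(fresh);
        return fresh;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj garbage = new Obj("garbage"); // nothing refers to this one
        a.refs.add(b);
        b.refs.add(a); // a cycle is handled by the forwarding map

        List<Obj> roots = new ArrayList<>();
        roots.add(a);
        List<Obj> survivors = collect(roots);
        System.out.println(survivors.size()); // 2
    }
}
```

If almost everything is garbage, the scan loop has almost nothing to do, which is why short-lived objects make copying collection cheap. Conversely, a large pool of long-lived objects would be copied again on every cycle.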
Performance evaluation should be tied to business value. Optimize transactions per second, mean service time, or worst-case latency. But don’t try to micro-optimize time spent in GC, because time invested in GC can actually help program speed.
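Measuring what users actually experience need not be complicated. The sketch below is one possible starting point, with invented names (LatencyStats, measure, percentile) and a placeholder workload standing in for a real transaction. It times repeated calls and reports the mean, a high percentile, and the worst case, using the nearest-rank method for percentiles.

```java
import java.util.Arrays;

final class LatencyStats {
    // Times `runs` executions of op and returns each duration in nanoseconds.
    static long[] measure(Runnable op, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            op.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    // Arithmetic mean of the samples (samples must be non-empty).
    static double mean(long[] samples) {
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    // Nearest-rank percentile for p in 1..100 (samples must be non-empty).
    // p = 100 gives the worst case.
    static long percentile(long[] samples, int p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (p * sorted.length + 99) / 100; // ceil(p * n / 100)
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute a real transaction here.
        Runnable op = () -> new StringBuilder("hello").reverse().toString();
        long[] samples = measure(op, 10_000);
        System.out.printf("mean=%.0f ns, p99=%d ns, worst=%d ns%n",
                mean(samples), percentile(samples, 99), percentile(samples, 100));
    }
}
```

Comparing these figures before and after a GC tuning change says more about whether the change helped than the GC time alone.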