Explaining garbage collection algorithms by keeping your fridge tidy

Learn some commonly-used GC algorithms, work to manage your leftovers.

In every place I’ve lived, I’ve had the same main responsibility: cleaning out the fridge. This life calling began around middle school and my skills developed in parallel with my ability to assemble a meal consisting only of leftover takeout and the last dregs from condiment jars. By the time I’d made it to college and begun advancing my study of computer science, this ability to conjure sustenance from unlikely combinations made the multi-day marathon stays in the laboratory sustainable (if less tasty). Plus, my living spaces were always pleasantly free of the refrigerator science experiment odors which plagued neighboring dorm kitchens, something which my roommates appreciated. Sure, the process of cleaning things out often got contentious — as mocked by George Carlin’s now famous skit, The Ice Box Man — but, on the macro level, it was mostly me just eating leftovers.

What is garbage collection?

Unexplored by many who consider automatic memory management the purview of compiler implementors and language designers, garbage collection is the process by which a program’s unused memory gets automatically reclaimed. This can happen via either the environment (like the JVM) or by the language itself (like in Lisp). In contrast to programs written in languages like C and C++ — whose source code can often resemble a battleground of malloc, free, new and delete calls — garbage collected languages free programmers from such mundane chores as the allocation and reclamation of memory segments.

Image for post
Image for post

Kitchen as computer, refrigerator as memory

Following in computer science’s longstanding tradition of imperfect metaphors (a monad is like a burrito!), let us introduce the idea that, at a high enough level, both kitchens and computer share many operational semantics. A computer has an operating system kernel while a kitchen has a chef. A computer possesses hardware components (CPU, memory, hard disk, network interface) for computational tasks while a kitchen boasts a variety of appliances (stove, oven, freezer, blender) used for food preparation. As you might guess, we will draw a parallel between a cook’s usage of a refrigerator for food storage and a program’s usage of memory regions for storing data.

Tracing garbage collection

Tracing garbage collection strategies underly such a broad category of algorithms that many refer to tracing and garbage collection interchangeably. In it’s purest form, tracing garbage collection refers to contiguous blocks of memory as objects and involves very few steps:

  1. For any object in memory visited by the traversal, the collector deems it to be reachable and therefore live.
  2. Any object in memory not visited by the traversal must not have any references pointing to it so the collector considers it unreachable and dead.
  3. The collector condemns dead objects and their space become eligible for reclamation.
Image for post
Image for post

Mark and sweep

In 1960, John McCarthy, the father of Lisp and progenitor of garbage collection, broke up collection cycles into two parts: the mark phase and the sweep phase. The mark phase consists solely of using tracing to determine object reachability, leaving behind an actual mark (often just a toggled bit) on every visited (live) object. Then, during the sweep phase, every unmarked (dead) object gets reclaimed. Since traversing reference trees and toggling bits during the mark phase incurs significantly less cost than actually freeing memory, a collector could opt to delay sweeping until convenient.

Reference counting

A reference counting garbage collector keeps a dedicated count of references to an object, incrementing and decrementing the count as references are established and forgotten. When the counter reaches zero, the collector declares the object dead (condemned) and ripe for reclamation. This requires the added step of managing the reference count as part of initializing or destroying a reference which can cause problems. For example, if a given object gets passed around with great frequency, this could result in performance overhead as the counter changes often. In a multi-threaded program, multiple threads continually trying to modify the same counter could require counter synchronization, increasing program complexity and introducing the possibility of lock contention. Not to mention the problems of needing to maintain counters somewhere (itself using more memory) and the possibility of needing to increment counters beyond the size allocated for them.

Generational garbage collectors

Another type of garbage collection algorithm, called generational collection, relies on a principle called the generational hypothesis. This hypothesis relies on the idea that the longer an object stays alive, the more likely it is to survive going forward. Consider the case of buffers used to store data during processing. Such objects get used and then discarded in relatively short order whereas longer-lived objects, like database connections, might stay around for the entire duration of a program’s execution! Taking advantage of this principle, a generational collector could reduce the overhead during scanning phases (whether tracing or something fancier) by focusing more on newer objects.

  1. Reclaim all dead objects.
  2. All objects which have been around for awhile get promoted to the Old generation.
  3. For the next few cycles, the collector ignores the Old generation and focuses on the Young generation.
  4. Every n cycles or so, perform a complete scan.
Image for post
Image for post

Why stop the world?

If you stuck with our chosen metaphor, you may have noticed that all scenarios discussed so far require physically standing in front of the fridge, thus making it impractical or inconvenient for others to access it at the same time. This very problem has led computer scientists to classify such blocking garbage collection strategies as stop the world algorithms since they require the main program itself to halt while garbage collection takes place. Often, the garbage collection process will consume more CPU than the regular execution itself, leading to undesirable resource hogging by the program. The specifics differ from implementation to implementation but, since reference counting allows for immediate reclamation if so desired, many OS kernels use reference counting for internal bookkeeping. Some modern garbage collectors, like the JVM’s new G1 collector, rely on a strategy called partitioning wherein the collector will suspend access to a limited portion of memory which could prevent the program itself from halting entirely.

Conclusion

I may never know why I was born with the gift of cooler cleanliness, but I will likely spend the rest of my life in fate’s chosen household role. It may further remain unclear why more people don’t have an appreciation or interest in learning about garbage collection algorithms. Afterall, it’s as easy as cleaning out your fridge.

Image for post

Learning more

Aside from the obligatory link to the Memory management article at The Wikipedia, if this topic interests you, I highly recommend the resources at MemoryManagement.org which provides great material regarding the differences between types of memory management as well as an exceptional glossary. For the die hards, there’s always The Garbage Collection Book by Jones, Hosking and Moss, which I fully intend to read as soon as they release an e-reader edition. Moreover, from the oldie-but-goodie department, the c2 wiki has some exceptional material on garbage collection, complete with comments from veteran developers on the low-level details. Finally, Paul R. Wilson from the University of Texas has written a very accessible paper [PDF] providing an introduction to various GC topics.

Computer science makes me so very happy. Let's cook something, sip some scotch and watch Star Trek!

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store