Garbage Collection — What Is It and Why Don’t We Have to Worry About It?

Mary Farner
The Startup
Published in
6 min readNov 24, 2019

Memory is something most beginner programmers take for granted. More focused on logic, syntax, and just getting it to work, we usually don’t pay attention to how we make space for our programs and their content, how much space we take up with our program, or if/how we ever free up that space we created once our program stops needing it.

In most modern languages such as Ruby, Java, Python, and Javascript, we have the luxury of never really having to worry about this; the languages take care of it all under the hood. But there are some lower level programming languages, such as C, in which we do have to keep track of our own memory. In either case, whether you are manually managing memory or it’s being done for you, at some point in your programming career it is important to understand how to manage memory. This process is called garbage collection.

The Stack and the Heap

First, lets just talk about computer memory in general. Regardless of the language, when a program runs, memory is “allocated” on our RAM to make space for the variables, computations, etc. that we need throughout our program. This space can be in one of two places, that stack or the heap. In short, the stack holds temporary, local variables and its memory is automatically managed, whereas the heap holds larger, dynamic data objects that are managed manually.

The Stack

Items on the stack are managed by the CPU upon compilation, such that every time a variable is created, it’s pushed onto the stack. Things are the stack are ordered just like the name implies — in LIFO (Last-In-First-Out), so when the program ends and the CPU automatically clears the stack, it removes the last item that was added first. Think of it like a stack of folded shirts: whichever shirt you fold and add to the pile last is the first one you can grab to wear. In the case of the stack, the most recently stored item is the first to be “freed.” The stack is limited in size, and for the most part holds local variables of primitive data types. Thing ints, booleans, nums, etc.

The Heap

The heap is not managed by the CPU, and as the programmer you are responsible for both allocating and deallocating memory for your data. This is the obvious downside to the heap, because you have to be aware of where, why, and how much memory you are working with at all times. Unlike the stack which is just that — a stack — memory has to be distinctly referenced when it’s being created, resized, or freed. The heap is also slower to access than the stack because variables are stored using pointers. That said, the heap doesn’t have a limit to size of variables, and it has more space than the stack.

Manual Garbage Collection

In older programming languages, data on the heap had to be managed manually by the programming. When you create a variable, you have to specify the amount of space you need and allocate it. If you change your variable, you’d have to reallocate space for it. And when you’re done with a variable, you have to free that space up by destroying it in memory. If you happen to create memory for something and then forget to free it, you end up with what is called a memory leak.

In C, we use a handful of methods to dynamically manage memory on the heap: malloc(), calloc(), realloc(), and free(). When we say dynamic memory management, that really just means that we create and destroy memory, and that we can change the size of a piece of data at runtime if it grows (i.e. an array that grows from 4 elements to 6 elements). You can check out this site for more info on these methods, but the gist of it is that we have to create a specific amount of space in memory using either malloc() or calloc(), we can change the amount of space allocated using realloc(), and when we’re done we need to free the memory using free(). It’s a lot to keep track of, and if anything goes awry, we end up with memory leaks — that is, the data is no longer being used by the program, but the space in memory is locked up and unavailable to other processes. No good!

Garbage Collection in Javascript

Programming vets are all pretty comfortable with these concepts of manual memory management, but it sounds pretty daunting if you’ve never dealt with it. Thankfully, most modern languages have their own automatic garbage collection methods. I want to focus on garbage collection in Javascript. First and foremost, it’s done “under the hood,” so none of it requires any intervention from the programmer. Just like any language, JS stores some things on the stack and others on the heap as discussed above. The only thing you’ll really need to worry about regarding memory in JS is if you run out of space on the stack, in which case JS will throw a nice little error. That said, it’s good practice to know what is actually happening so that if you do run into memory issues at some point you know why, and how to troubleshoot.

Reachable Objects

In short, the Javascript garbage collector will rid of (and free up the associated memory) of any object that is not “reachable.” The concept of being reachable is just what it sounds like — it’s really just anything that the program currently has access to and could possible need to use. The following list of items are all, understandably, always reachable. These are called “roots”:

  1. Local variables of the current function
  2. Global variables
  3. Variables and functions located on the current nested call chain (i.e. anything part of a function that called the current function)

In addition, anything referenced by a root (i.e. an object which is referenced by a property of local object) is also reachable.

“Mark And Sweep”

The algorithm for garbage collection in JS is also pretty simple, and is exactly what the name sounds like, first it marks reachable variables, then it “sweeps” away any unreachable variables. Specifically, first the garbage collector marks all roots as reachable. Next, it visits objects referenced by roots and marks them as reachable. Then, it visits all marked objects and marks their references, etc. until there are no more reachable objects. Then, it deletes anything in memory that’s not marked. Viola.

There are a few optimizations that JS might run in order to make the process more efficient. General Collection keeps track of objects that have been around for awhile and doesn’t bother checking them as often, because it can assume that they’re still needed. Incremental Collection breaks up the marking process into pieces if there are tons of objects, because marking all of them at once could take a lot of time. Idle-time Collection attempts to only run garbage collection when the CPU is idle, or not working as hard on the acutal program, in order not to affect execution.

This site gives a really great overview of garbage collection in JS, with some really great visual examples and rationale for these optimizations. Feel free to check it out if you’re interested. Otherwise, just rest easy at night knowing that it’s all done for you, cleanly and automatically.

Resources

--

--