Taking out the trash

An intro to garbage collection

Garbage collection — what does this mean and why is it important?

At a basic level, garbage collection is a process that frees up memory for your application. When variables or objects are no longer needed for your program to run properly, they are removed from memory so that space is available for any new data.

This is accomplished differently depending on which language you are writing your program in. Higher-level languages such as Ruby, JavaScript, and Java have built-in garbage collection, which makes this very easy. However, languages like C and C++ require explicit commands from the developer to release memory.

The memory allocated for a program is called the Resident Set, and it can usually be broken down into three main parts:

  • The Code Segment is the memory that holds the code being executed.
  • The Stack is your program's memory for value types (primitives such as numbers and booleans), pointers that reference objects on the heap, and the flow of operations for your program.
  • The Heap is your program's memory for the objects, closures, and other reference-type data that are needed at that point in the program.

(Diagram source: Understanding Garbage Collection and Hunting Memory Leaks in Node.js, Daniel Khan)

For example, if you were writing in C or C++, you would first declare your variable and the amount of memory you want to allocate to it. At the point where you no longer need the variable, you would explicitly tell the program to release that memory (see below).

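#include <stdlib.h>

int main(void) {
    // declare a pointer variable
    char *newVariable;
    // allocate 42 bytes on the heap with malloc()
    newVariable = (char *) malloc(42);
    if (newVariable == NULL) return 1; // allocation failed
    // do something with newVariable
    // tell the program to release the memory newVariable points to
    free(newVariable);
    return 0;
}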

This seems pretty straightforward until we think about the scale of even a small program. It's likely that you have hundreds or thousands of variables and closures. In languages like C or C++, you need to know when a certain piece of data is no longer needed in your program, and then write a line of code that removes that data from memory! If this never happens, the program's available memory can eventually be exhausted and the program will crash.

There are methods that report these numbers. In Node.js, for example, process.memoryUsage() returns the total amount of allocated memory (the Resident Set, rss), the memory allocated to the Heap (heapTotal), and the memory that is in use in the Heap (heapUsed). Tracking these values over time can help visualize how garbage collection is working in memory.
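As a rough sketch of how such readings might be taken in Node.js, the snippet below calls process.memoryUsage() before and after a large allocation. The snapshot helper, the labels, and the one-million-object workload are assumptions made up for illustration, and the exact numbers will differ from machine to machine.

```javascript
// Sketch: read Resident Set and Heap sizes before and after a big allocation (Node.js).
function snapshot(label) {
  // rss = Resident Set, heapTotal = memory allocated to the heap, heapUsed = heap in use
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(1);
  console.log(
    `${label}: rss=${toMB(rss)} MB, heapTotal=${toMB(heapTotal)} MB, heapUsed=${toMB(heapUsed)} MB`
  );
  return { rss, heapTotal, heapUsed };
}

const before = snapshot("before");

// Hypothetical workload: one million small objects kept alive by `data`.
let data = Array.from({ length: 1_000_000 }, (_, i) => ({ id: i }));
const during = snapshot("during");

// Drop the only reference. The objects are now eligible for collection,
// but the collector decides on its own when to actually reclaim them.
data = null;
```

Calling snapshot on a timer and plotting heapUsed is one simple way to see the sawtooth pattern that garbage collection tends to produce.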

(Figure: Node.js memory consumption over time)

So, if we are writing in JavaScript, which does this for us, how does it work? I most certainly have not been writing lines of code to allocate or free memory, and my programs still work!

JavaScript has built-in garbage collection, which the JavaScript engine performs automatically while your program runs. The garbage collector most commonly uses an algorithm known as Mark-and-Sweep. This process starts by establishing ‘roots’, which are global variables with a reference kept in code. In browser JavaScript, the ‘window’ object is a global variable that can act as a root. The root and anything that can be reached from the root are all considered present and marked ‘active’, so they are not garbage. Any piece of memory that is not marked as active is considered garbage and freed. The memory is then typically compacted to remove gaps in the heap and make it easier to store new data.
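To make the mark and sweep phases concrete, here is a toy simulation. It is only a sketch under simplifying assumptions: the "heap" is a plain Map of made-up ids, the markAndSweep function and every object name are hypothetical, and a real engine's collector is far more sophisticated than this.

```javascript
// Toy heap: each entry lists the ids of the objects it references.
const heap = new Map([
  ["window", { refs: ["user", "settings"] }], // the root
  ["user", { refs: ["address"] }],
  ["address", { refs: [] }],
  ["settings", { refs: [] }],
  ["orphanA", { refs: ["orphanB"] }], // these two reference each other,
  ["orphanB", { refs: ["orphanA"] }], // but nothing reachable points at them
]);

function markAndSweep(heap, roots) {
  // Mark phase: visit everything reachable from the roots.
  const marked = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const id = stack.pop();
    if (marked.has(id) || !heap.has(id)) continue;
    marked.add(id);
    stack.push(...heap.get(id).refs);
  }

  // Sweep phase: remove every entry that was never marked.
  const collected = [];
  for (const id of [...heap.keys()]) {
    if (!marked.has(id)) {
      heap.delete(id);
      collected.push(id);
    }
  }
  return collected;
}

const collected = markAndSweep(heap, ["window"]);
console.log(collected); // [ 'orphanA', 'orphanB' ]
console.log([...heap.keys()]); // [ 'window', 'user', 'address', 'settings' ]
```

Note that the two orphans are collected even though they reference each other. Reachability from a root is what counts, not whether something still holds a reference.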

Abby is finding all the data that isn’t being referenced by the window object and marking them as garbage!!

Any piece of data that remains in memory even though the program no longer needs it is known as a leak. An example of a memory leak would be global variables that do not need to be global:

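function foo() {
  // no var, let, or const: assigns to an undeclared variable
  bar = "something";
}
or
function foo() {
  // in a plain call (non-strict code), `this` is the global object
  this.bar = "something";
}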

Since bar was assigned without a var, let, or const, it becomes a property of the root object, the ‘window’ object (in non-strict code). In the second case, if we were to call foo() as a plain function without binding it, ‘this’ would point to the window, thus creating a property on it called ‘bar’ equal to “something”.

We would now be able to access ‘bar’ by calling:
window.bar
// => "something"

In this case, “something” would never be collected by our garbage collector, because it would always be reachable from the root (the window). Our garbage collector will therefore always mark it as active and never remove it from memory. If it is a small piece of data, this isn’t a big deal. But knowing this provides another important reason to always explicitly declare your variables (and to use strict mode, where both of the accidental assignments above throw an error instead)!
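One way to catch both kinds of accidental global early is JavaScript's strict mode. The sketch below is a minimal illustration, not a recommendation from the original examples: the function names and the undeclaredBar variable are made up, and each function opts into strict mode with its own "use strict" directive so the result does not depend on how the file is loaded.

```javascript
function assignUndeclared() {
  "use strict";
  try {
    undeclaredBar = "something"; // no var, let, or const
    return "assigned";
  } catch (err) {
    return err.name;
  }
}

function assignViaThis() {
  "use strict";
  try {
    this.bar = "something"; // in strict mode, `this` is undefined in a plain call
    return "assigned";
  } catch (err) {
    return err.name;
  }
}

console.log(assignUndeclared()); // ReferenceError
console.log(assignViaThis()); // TypeError
```

Because both assignments fail loudly, nothing is silently attached to the global object, so there is no stray root-level reference for the garbage collector to keep alive.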

Hopefully, garbage collection makes more sense to you now! Thanks for reading!