Garbage Collection: How it’s done

If you are familiar with the basics of the memory allocation in programming languages, you know that there are two parts in the memory defined as Heap and Stack.

The stack memory is used for execution of a thread. When a function is called, a block of memory is allocated in the stack to store the local variables of the function. The allocated memory gets freed when the function is returned. In contrast to the Stack, Heap memory is used for dynamic allocation (usually when creating objects with “new” or “malloc” keyword) and memory deallocation needs to be handled separately.

Object myObject = new Object();

.............


myObject = null;

If at some point of the program, another reference to an object or “null” was assigned to the “myObject” variable, the reference that existed with the already created “Object” will be removed. However, the memory allocated for this “Object” will not be freed even though the Object is not being used. In the older programs such as C or C++, the programmer needs to be concerned about these type of objects allocated in the Heap and delete when they are not in use to free up the memory. Failing to do that can end up in a memory leak. In the other hand, if we mistakenly delete an Object that has a live reference to a variable can cause null pointer exceptions in later parts of the code when we try to access the deleted object using the old reference.

However, in languages like Java and C#, this memory management is handled by a separate entity known as the Garbage Collector.

With a Garbage Collector in place, we can allocate an object in the memory, use it and when there is no longer any reference for that object, the object will be marked for the Garbage Collector to pick up freeing the allocated memory. And a Garbage Collector also guarantees that any live Object that has an active reference will not get removed from the memory.

Reference Counted Garbage Collection

The reference count garbage collection keeps a track of the number of references for a particular object in the memory. Let’s look at the following code segment.

Object a = new Object(); // Reference Count(OB1) starts at 1
Object b = a; // Reference Count (OB1) incremented to 2 as a new reference is added
Object c = new Object();

b.someMethod();
b = null; // Reference Count(OB1) decremented to 1 as reference goes away
a = c; // Reference Count(OB1) decremented to 0

When executing the line Object a = new Object(), a new object (let’s say OB1 ) is created in the memory and the reference count (for OB1) starts at 1.

When the reference for the OB1 in the variable “a” is copied to “b”, the reference counter increases by one as now two variables have the reference for OB1.

When “b” is assigned to null, there reference for OB1 decreases leaving only the variable “a” having a reference for OB1.

When the value of “a” is updated by the value of “c” (which is having a reference for a whole new object), there reference counter for the OB1 becomes zero leaving the OB1 available for garbage collection.

Drawbacks in Reference Counted GC

The main disadvantage in the reference counted garbage collection is its inability to identify circular references. To understand the circular references, let’s have a look into the below code segment.

Consider two classes A and B having each other’s references.

class A {
private B b;

public void setB(B b) {
this.b = b;
}
}

class B {
private A a;

public void setA(A a) {
this.a = a;
}
}

Now in the main method, we can create new objects for both of these classes and assign the references.

public class Main {
public static void main(String[] args) {
A one = new A();
B two = new B();

// Make the objects refer to each other (creates a circular reference)
one.setB(two);
two.setA(one);

// Throw away the references from the main method; the two objects are
// still referring to each other
one = null;
two = null;
}
}

When we assign null values for the two variables one and two, the external references existed with the class objects (“A” and “B”) created at the beginning will be removed. Still, they won’t be eligible for garbage collection as the reference counters of those two objects will not become zero due to object “A” having its reference inside “B” and the object “B” having its reference inside “A”.

Mark and Sweep Garbage Collection

As the name suggests, Mark and Sweep garbage collectors have two phases
1. Mark Phase
2. Sweep Phase

Mark Phase
During the Mark phase, the GC identifies the objects that are still in use and set their “mark bit” to true. The search starts with a root set of references kept in local variables in the stack or global variables. Starting from the root references the GC will conduct a depth search for the objects that have reachable references from the root. Any object that keeps a reference of another object, keeps that object alive.

It is important to keep note that during the Mark phase, the application threads are stopped to avoid the changes that can happen to the object state during the marking phase.

Source: https://app.pluralsight.com/library/courses/understanding-java-vm-memory-management/description

The cyclic references are not an issue for a Mark and Sweep GC. If you observe the above diagram, a cyclic reference exists (shown by the square) but it is unreachable from the root. Hence, those types of references will not be marked as live allowing GC to collect as garbage.

Sweep Phase
In the sweep face, all the unmarked objects from the Mark phase will be removed from the memory freeing up space.

Source: https://app.pluralsight.com/library/courses/understanding-java-vm-memory-management/description

As you can observe from the above diagram, there may exist plenty of free regions after the sweep phase. But, due to this fragmentation, the next memory allocation may fail if it is bigger than all the existing free regions.

To overcome this problem, an additional Compact phase was added.

Mark-Sweep-Compact Garbage Collection

After the sweep phase, all the memory locations are rearranged to provide a more compact memory allocation. The downside of this approach is an increased GC pause duration as it needs to copy all objects to a new place and to update all references to such objects.

Source: https://app.pluralsight.com/library/courses/understanding-java-vm-memory-management/description

Mark and Copy Garbage Collector

This is similar to the Mark and Sweep collector, but the memory space is divided into two. Initially, the objects are allocated to one space (fromspace), and the live objects are marked.

Source: https://app.pluralsight.com/library/courses/understanding-java-vm-memory-management/description

During the copy phase, the marked objects are copied into the other space (tospace) and at the same time compacted. Then, the fromspace is cleared out.

Source: https://app.pluralsight.com/library/courses/understanding-java-vm-memory-management/description

After that, both spaces are swapped resulting any new memory allocation to allocate memory in the “new fromspace” (the old tospace will now become the new fromspace). Finally, when the “new fromspace” becomes full, the whole process happens again.

Generational Garbage Collector

In generational garbage collection, the memory space is divided into different generations (e.g. young generation and old generation). Initially, all the objects would reside on the young generation. However, when a garbage collection cycle happens, objects that survive the garbage collection will be promoted to the older generation.

Source: https://app.pluralsight.com/library/courses/understanding-java-vm-memory-management/description
Source: https://app.pluralsight.com/library/courses/understanding-java-vm-memory-management/description

Now, the objects left in the young generation can be cleared as all the live objects are moved to the old generation.

Garbage collection cycles in the old generation occur less frequent than in the young generation. The key idea behind this approach is that objects that survive the first garbage collection tend to live longer. Thus the frequency of garbage collection can be reduced for objects in the older generations. The number of generations differs with the programming language. For example, in Java there are two generations and in .NET there are three.

It’s always up to the programmer to decide the optimum garbage collector to be used for the application. We can research the types of garbage collectors that have been implemented with the programming language that we use and their properties. However, choosing a garbage collector might require thorough testing as garbage collection is something that affects the performance of an application.

References

[1] https://app.pluralsight.com/library/courses/understanding-java-vm-memory-management/table-of-contents

[2] https://www.geeksforgeeks.org/mark-and-sweep-garbage-collection-algorithm/

[3] https://blogs.msdn.microsoft.com/abhinaba/2009/01/30/back-to-basics-mark-and-sweep-garbage-collection/

[4] https://plumbr.io/handbook/garbage-collection-algorithms