Garbage Collector 101

This is a first topic after I start working for Lazada. Such a busy time for writing something useful :) In this topic, we will discuss about garbage collector in Java, from the very early day we must manually manage memory by hand.

C/C++ Style

Introduction

Firstly, I will discuss how objects are managed under C/C++ environment. In C, we create an object by using keyword malloc and C++ using keyword new.

char* str = (char *) malloc(15);     // C: allocate an array of char
Student* studentPtr = new Student(); // C++: allocate a student obj

When we finish using that object, we must manually free memory. If we don’t do that, heap will be polluted and don’t have enough memory for future allocating. In C we use free and C++ we use delete.

free(str)
delete studentPtr

Here is an illustration for above concept:

illustration for allocating memory
  1. Create new object: an object will be allocated on heap with a reference points to.
  2. Assign variable: such as a = b. Now there are two pointers point to same heap memory. One changes will also affect other.

Problem

Manage memory by hand easily causes runtime exception. (Exception that only happens when running program). There are two exceptions programmer often meets in manage memory: segmentation fault and memory leak.

Segmentation fault: When programmer has freed memory for heap object at some point and this memory is accessed by other pointer in future, programmer might meet exception SEGMENTATION_FAULT. In fact, this behaviour is unidentified because this memory might or might not be deallocated / allocated by OS at that current time.

illustration for segmentation fault
  1. Pointer A frees heap memory it points to.
  2. OS deallocates this memory and allocates somewhere else.
  3. Reading this memory later caused segmentation fault.

Memory leak: When there are no references point to heap object but programmer forgets to free memory, that memory is kept on heap forever without using. Memory leak occurs. Many memory leaks can lead to application crash, slow down program …

illustration for memory leak
  1. Pointer A doesn’t point to heap object any more.
  2. Similarly for pointer B.
  3. No pointer points to heap object. But memory for him still be there ;( LEAK

Automatic Reference Counting

Introduction

Fixing above problems, Objective-C and Swift uses Automatic Reference Counting (ARC) mechanism for managing memory. This is a significant improvement compared to old day.

Each time programer create, ARC works by: when there is a variable assign, it will increase counting to one. ARC will not deallocate an instance as long as at least one active reference to that instance still exists. when deassign, it will decrease to one. when a heap object is zero, system will automatically deallocate that memory for different use.

To make sure that instances don’t disappear while they are still needed, ARC tracks how many properties, constants, and variables are currently referring to each class instance. ARC will not deallocate an instance as long as at least one active reference to that instance still exists.

To make this possible, whenever you assign a class instance to a property, constant, or variable, that property, constant, or variable makes a strong reference to the instance. The reference is called a “strong“ reference because it keeps a firm hold on that instance, and does not allow it to be deallocated for as long as that strong reference remains.


Here is an image for demo:

Problem

However, ARC meets another problem named Reference Circular problem as describe in image below.

Demo for circular reference problem:


Garbage Collector

Introduction

Avoiding reference circular.

Compare with ARC

Reference counting can be overall slower than GC. Apple considered this not to be a problem.

- Reference counting does not handle reference cycles, GC has no problem with that. Apple alleviates the problem with weak references, and it’s up to the developer to use them correctly.

- GC requires more memory than reference counting (up to twice, in some cases). Apple thinks smartphones have too little RAM to waste it for GC (that’s also why Android phones have more memory: they use 1.8~2.0x as much as what they would use if they did not have GC).

- GC may be faster, but when it triggers it prevents your program from running for potentially several milliseconds. This time may not be very short, after all, if you are doing something critical. Reference counting may be slightly slower but execution is smooth. Apple thinks smooth execution is more important than raw speed with hiccups.

So Apple chose reference counting.

Garbage Collector Algorithm

We have gone through how Garbage Collector works. Above concept just stands under concept. For implementing this concept, there are many algorithms for JVM. (Not in Android). Until now, in normal JVM, there are four algorithms:

  1. The serial Garbage Collector: using a single thread to process the heap. When processing heap, both minor or full cycle, Garbage Collector will stop all application threads until finish. So this collector is recommended for client application with small heap size which doesn’t have low-pause time requirement.

Serial collector uses only one thread for garbage collecting so it doesn’t use all power of multi-thread environment. Serial collector can be set by XX:+UseSerialGC flag.

Illustration for serial collector

2. The throughput Collector: Unlike serial collector, the throughput collector uses multi threads for collecting memory in both young generation and old generation phrase. Like serial collector, throughput collector stops all applications thread until finish. Because throughput collector uses many threads for collecting memory, this collector also is called parallel collector.

Illustration for parallel collector

3. The CMS Collector: stands for Concurrent Mark-Sweeper Algorithm, or named Mostly-Concurrently Algorithm. The throughput collector although use multithread for processing heap but still have a long-pause time for full cycle. This algorithm uses multiple threads (“concurrent”) to scan through the heap (“mark”) for unused objects that can be recycled (“sweep”).

There are four steps:

  1. Initial mark:
  2. Concurrent mark:
  3. Remark:
  4. Concurrent sweeping:

4. The G1 Collector: or named Garbage First Collector.

Conclusion

Garbage Collector is a broad topic in Computer Science. There are many algorithms in this field is developed for more than 20 years and still in very active development. In future blog, I will discuss how heap memory in Java or Android organize for increasing garbage collecting speed.

Link