Garbage Collection in Java
To follow this post you need a fair understanding of how the Java memory model works. You can refer to a blog post on the Java memory model or to other online resources.
Garbage collection is a memory management technique that reclaims memory the application is no longer using. It helps avoid memory leaks and keeps a Java program from running short of memory while it runs.
While running a Java program you may have come across the error java.lang.OutOfMemoryError: Java heap space. The JVM is allotted memory that includes the heap and the thread stacks; the heap is where dynamic allocation of memory happens. All objects are allocated on the heap, which is managed by the JVM. This covers almost everything a developer creates, including arrays and Class objects. In current HotSpot JVMs static variables are also stored on the heap, while class metadata and compiled code are kept in separate areas outside it. The heap grows as the program allocates, but it can't exceed the maximum size allowed for the JVM. In such a scenario garbage collection becomes crucial, as it reclaims the unused memory. By reclaiming memory I mean that the JVM takes the memory block back for its own reuse; it does not necessarily return it to the operating system.
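To make the heap limit concrete, here is a minimal sketch that asks the running JVM for its heap figures through the standard java.lang.Runtime API. The class name HeapReport and the snapshot() helper are made up for this illustration; the numbers you see depend on your machine and on options such as -Xmx.

```java
// Illustrative sketch: HeapReport and snapshot() are hypothetical names.
class HeapReport {

    /** Returns {max, total, free} heap sizes in bytes, as reported by the JVM. */
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] {
            rt.maxMemory(),   // upper limit the heap may grow to
            rt.totalMemory(), // heap currently reserved by the JVM
            rt.freeMemory()   // unused part of the reserved heap
        };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        long mb = 1024 * 1024;
        System.out.println("max heap (upper limit): " + s[0] / mb + " MB");
        System.out.println("currently reserved:     " + s[1] / mb + " MB");
        System.out.println("free within reserved:   " + s[2] / mb + " MB");
    }
}
```

Starting the program with a smaller -Xmx value should lower the first figure, which is the limit the heap can't grow past.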
Many people think garbage collection collects and discards dead objects. In reality, Java garbage collection does the opposite: live objects are tracked and everything else is designated garbage. This fundamental misunderstanding can lead to many performance problems.
What are live and dead objects?
An object is considered alive while the application still references it. As soon as the last reference to an object is lost and the application code can no longer reach it, the object is considered dead. Internally, the references between objects form an object tree (strictly speaking a graph, since objects can refer to each other). As simple as this sounds, it raises a question: what is the first reference in the tree?
Every object tree has one or more roots. As long as a root is reachable, the whole tree below it is reachable. So when are the roots reachable? Special objects called garbage-collection roots (GC roots) are always reachable, and so is every object that has one of them as its root.
These roots can be, for example:
- Local variables on a thread's stack. A local variable holds a reference to an object, which in turn can reference other objects, and so on, forming a tree.
- Static variables. They are referenced by their class, which makes a loaded class a GC root by default. When a class is unloaded and garbage collected, the objects referenced only by its static variables can be collected as well.
- Active Java threads.
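The three kinds of roots listed above can be pointed out in ordinary Java code. The sketch below is only an illustration: RootsDemo, REGISTRY and run() are hypothetical names, and the comments describe which root keeps each object reachable, not when the JVM will actually collect anything.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: all names here are made up for the example.
class RootsDemo {

    // Static variable: reachable for as long as the class RootsDemo stays loaded.
    static final List<String> REGISTRY = new ArrayList<>();

    static int run() {
        // Local variable: the StringBuilder is reachable while this stack frame holds it.
        StringBuilder local = new StringBuilder("held by a stack frame");

        // Active thread: while it runs, the thread and everything its task references stay reachable.
        List<String> seenByWorker = new ArrayList<>();
        Thread worker = new Thread(() -> seenByWorker.add("held by a live thread"));
        worker.start();
        try {
            worker.join(); // wait until the worker has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        REGISTRY.add("held by a static variable");

        // Dropping the only reference makes the StringBuilder unreachable,
        // so it becomes eligible for collection (the JVM decides when).
        local = null;

        return seenByWorker.size();
    }
}
```

After run() returns, the string added to REGISTRY is still reachable through the static variable, while the StringBuilder is not reachable from any root.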
So it’s really a matter of taking every starting point, that is every root (local variables, statics, everything in other threads and stack frames), and recursively following all the references to build a list of all the “live” objects: objects which are in use and must not be deleted. Everything else is garbage, waiting to be collected.
Now that we know what a GC root is and how the tree is formed, we can look at the algorithm the GC uses to reclaim memory. The basic algorithm is mark and sweep; the collectors in modern JVMs build on variations of it.
As the name suggests, this algorithm is a two-step process:
- Mark: the JVM runs the algorithm intermittently, traverses the object tree starting from the GC roots, and marks every reachable object as alive.
- Sweep: all of the heap memory that is not occupied by marked objects is reclaimed. It is simply marked as free, essentially swept free of unused objects.
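The two steps can be sketched as a toy simulation. This is not how the JVM implements its collectors; ToyHeap, Node and the method names are invented for this post, and the "heap" is just a list of objects that we mark and sweep by hand.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model of mark and sweep; every name here is hypothetical.
class ToyHeap {

    /** A simulated heap object: a name, a mark bit, and outgoing references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    final List<Node> objects = new ArrayList<>(); // everything allocated so far
    final List<Node> roots = new ArrayList<>();   // the simulated GC roots

    Node allocate(String name) {
        Node n = new Node(name);
        objects.add(n);
        return n;
    }

    /** Mark phase: flag every object reachable from a root. */
    void mark() {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;  // already visited; this also copes with cycles
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    /** Sweep phase: drop unmarked objects and reset marks; returns the reclaimed names. */
    List<String> sweep() {
        List<String> reclaimed = new ArrayList<>();
        Iterator<Node> it = objects.iterator();
        while (it.hasNext()) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;    // reset for the next cycle
            } else {
                reclaimed.add(n.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    List<String> collect() {
        mark();
        return sweep();
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Node a = heap.allocate("a");
        Node b = heap.allocate("b");
        Node c = heap.allocate("c");
        Node d = heap.allocate("d");
        heap.roots.add(a);  // a is a root
        a.refs.add(b);      // a -> b, so b is alive
        c.refs.add(d);      // c and d only reference each other
        d.refs.add(c);
        System.out.println(heap.collect()); // prints [c, d]
    }
}
```

Note that c and d reference each other and are still reclaimed: what matters is reachability from a root, not whether some reference to the object exists.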
It can happen that objects the application no longer uses are still reachable, simply because the developer forgot to clear the references to them. The garbage collector can't reclaim such a logical memory leak, and no software can fix it automatically; the developer has to remove the references.
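A typical shape of such a logical leak is a long-lived collection that only ever grows. The sketch below is a made-up example (LeakyCache, handleRequest() and the 1 KB buffer are assumptions for illustration): every buffer stays reachable through a static list, so the collector has to treat it as alive even though the program never reads it again.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a logical memory leak; all names are hypothetical.
class LeakyCache {

    // A static list stays reachable for as long as the class is loaded.
    private static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest() {
        byte[] buffer = new byte[1024]; // scratch data needed only for this call
        CACHE.add(buffer);              // bug: the reference is kept and never removed
    }

    /** Number of buffers the list still keeps alive. */
    static int retained() {
        return CACHE.size();
    }

    /** The fix has to come from the developer: drop the references explicitly. */
    static void clear() {
        CACHE.clear();
    }
}
```

Only after clear() is called do the buffers become unreachable and therefore eligible for collection.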
Lastly, I would like to mention that although garbage collection has its advantages, it comes at a performance cost: the JVM runs it intermittently, and collection cycles consume CPU time and can pause the application.