Ruby Magic: The Garbage Collector

What is Memory Management

Before getting into the topic we first need to talk about what Memory Management is. Techopedia defines memory management as a broad term that incorporates all processes and methodologies for the effective use, allocation, monitoring and management of computer memory. Memory management allows an underlying computer or operating system (OS) to dynamically distribute memory across all running processes, while ensuring optimal performance.

In hardware, memory management involves components that physically store data, such as RAM, memory caches, and flash-basedSSDs. In the OS, memory management involves the allocation (and constant reallocation) of specific memory blocks to individual programs as user demands change. At the application level, memory management ensures the availability of adequate memory for the objects and data structures of each running program at all times. Application memory management combines two related tasks, known as allocation and recycling.

External Fragmentation happens when a dynamic memory allocation algorithm allocates some memory and a small piece is left over that cannot be effectively used

Fragmentation is an excellent example of why memory management is important. This image specifically shows the waste of memory due to External Fragmentation. If too much external fragmentation occurs, the amount of usable memory can drastically reduced. The total memory space exists to satisfy a request, but it is not contiguous.

The Garbage Collector

So… What does this have to do with the Garbage Collector?!?!?!

The garbage collection is a form of automatic memory management. It attempts to reclaim memory occupied by objects that are no longer in use by the program.

September 4, 1927 — October 24, 2011 (aged 84)

Garbage collection was invented by John McCarthy around 1959 to abstract away manual memory management in Lisp.

Lisp is a computer programming language with a long history and a distinctive, fully parenthesized prefix notation. Lisp is the second-oldest high-level programming language in widespread use today. Only Fortran is older, by one year.

Objects are normally store in the heap. The heap is memory that has been set aside for allocation. There’s no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. Naturally, if a lot of objects are created, a lot of memory is used. Since memory is not infinite, we need to make sure there are no objects lingering around in the heap with no purpose. That’s why the memory which is not used anymore must be collected and recycled. More concretely the memory received through “malloc()” must be returned with “free()”.

However, it would require a lot of efforts if the management of “malloc()” and “free()” were entirely left to programmers. Especially in object oriented programs, because objects are referring each other, it is difficult to tell when to release memory.

This is where the garbage collector comes in. Garbage Collection is a feature to automatically detect and free the memory which has become unnecessary. With garbage collection, we do not need to worry about when to use “free()”.

In the following image we have object pointing to other objects; the black objects are live and currently being used. To put it more logically, the live objects are all objects which can be reached recursively via links from the root objects as the start points. These objects colored black are the necessary objects. The rest of the objects can be released.

Mark and Sweep:

GC (Garbage Collector) was first implemented in Lisp and was called mark&sweep GC. The GC of “ruby” is one type of it.

The Mark& Sweep method works similarly , the the process descried above, the difference is that it put “marks” on the root objects. Setting them as the starting points, but the it also put “marks” on all reachable objects. This is called the mark phase.

When there’s not any reachable object left, the GC check all objects in the object pool, and sweeps all objects that have not marked.

Advantages:

  • There does not need to be any (or almost any) concern for garbage collection outside the implementation of GC.
  • Cycles can also be released.

Disadvantages:

  • In order to sweep, every object must be touched at least once.
  • The load of the GC is concentrated at one point.

Stop & Copy:

Stop and Copy is very similar Mark and Sweep. We have an area where all all objects are placed (figure A), we can call this the “creation box”.

When the GC starts, it follows the links from the roots the same way as mark-and-sweep. The difference is, it then moves the objects to another area. When all the links have been followed, it discards all elements which remain in creating box, and make figure B the active box.

Advantages:

  • Compaction happens at the same time as collecting the memory
  • Since objects that reference each other move closer together, there’s more possibility of hitting the cache.

Disadvantages:

  • The object area needs to be more than twice as big
  • The positions of objects will be changed

Conclusion:

The garbage collection is a powerful tool. It takes care of Dangling pointer , Double free bugs, and Certain kinds of memory leaks. Like many other things in the computer world, it is not perfect. As a downfall to all the work it saves us, as developers, it does in fact take up a lot CPU and can slow your software.

Show your support

Clapping shows how much you appreciated Esmery Corniel’s story.