Garbage Collection: V8’s Orinoco

A prolonged period of time I have been thinking of deep-diving into Orinoco and writing this article, so now it’s time.

A garbage collector is an invisible warrior which fights against an army of unused objects targeting the memory of your program.

GC (garbage collection) is of great importance in long-running, data-intensive servers, where, due to its specific character, memory matters are essential.

Nonetheless, most developers don’t need to think about a GC while developing programs, but understanding some of the internal components can help them to apply optimized memory usage and relevant programming patterns.

This article covers the general theory of garbage collection to fully understand the structure, phases and certain differences and answers the main question — “how does it work?” So, let’s begin

Theory of garbage collection

Before discussing garbage collection, it makes sense to consider different allocation types used in programming languages. Let’s highlight the main 3 types: Static, Stack and, last but not least, Heap allocations. The last point arouses our interest as a garbage collector fits this allocation.

A heap is a tree-based data structure in which all the nodes of the tree are in a specific order

Heap memory has various advantages in comparison to other allocation types, for example:

  • Variable size of data structures
  • Consuming memory on demand
  • Any chunks passing around

Definitely, there arise 2 problems. You must have heard about memory leaks (that is, using more memory than required on the heap when we forget to free needed objects) and dangling pointers (that is, having less memory than required on the heap when we free objects to early) issues.

High-level abstraction programming languages eliminate manual memory management and replace it with an automatic solution which is called the garbage collection.

Garbage collection — is a form of automatic memory management.

From my point of view, the most accurate definition of garbage collection can be found here:

Garbage collection (GC), also known as automatic memory management, is the automatic recycling of dynamically allocated memory (2). Garbage collection is performed by a garbage collector which recycles memory that it can prove will never be used again. Systems and languages which use garbage collection can be described as garbage-collected.

It goes without saying, in different languages, garbage collection has a diverse structure, strategies, features, phases, as well as names. For instance, Golang doesn’t have generations (generations will be described below), in its turn, Rust doesn’t perform garbage collections in a runtime, only at the compile-time (static).

However, it’s possible to highlight the main tasks which garbage collection performs:

  • Identifying and removing unreferenced objects from a heap memory
  • Reusing already allocated memory for new objects
  • Defragmentation memory (optional task)

All these tasks can be performed in sequence or can be arbitrarily interleaved and we will see which option is used in JavaScript.

Also, it should be emphasized, that a garbage collector works only with syntactic garbage (unreachable objects). The difference between syntactic and semantic garbage can be found here.

So, we can make the first conclusion: garbage collection frees the programmer from manually dealing with memory deallocation. As a result, certain categories of bugs are eliminated or substantially reduced:

In other words, garbage collection frees the programmer from manually dealing with memory deallocation

How does the garbage collector know which objects should be deallocated? There are many ways for automatic memory managers to determine which memory is no longer required. In the main, garbage collection relies on determining which blocks are not pointed to by any program variables.

Tracing: determines which objects are unused through tracing which objects are reachable by a chain of references from certain “root” objects and considering the rest as “garbage” and collecting them.

Root object means reachable reference which is held by a register or a variable on the stack.

Algorithms used in implementation:

  • Mark-Sweep (issue — fragmentation and free-list allocator)
  • Mark-Compact (issue — up to 3x heap traversal)
  • Copying GC or Semi-space (issue — reserve half of the heap)

There is another modern garbage collection algorithm, known as the Mark-Region with the implementation — the Immix collector.

Although, tracing is a common type of GC and has various algorithms under the hood, unfortunately, all of them will not be considered as it’s rather time-consuming.

Evacuation phase

V8 uses a ‘semi-space’ design for the young generation. Surviving objects are always evacuated to a new page.

Reference counting: is where each object has a count of the number of references to it. Garbage is identified by having a reference count of zero. An object’s reference count is incremented when a reference to it is created, and decremented when a reference is destroyed. When the count reaches zero, the object’s memory is reclaimed.

The main issues of this collector are the cycle references, which it simply can’t handle producing the memory links and also can be mentioned the overhead on the Mutator, since every assignment and delete operations have to be overloaded to consider the Collector’s work.

Reference counting: The cycle references

The essential point which has not been mentioned yet is that we have been discussing single-threaded blocking collectors, which operate on the whole heap. This can be pretty slow if the heap is large, and there is a lot of garbage.

Due to this reason — the following logical assumption can be made — garbage collectors should definitely have performance optimizations. And it’s reasonable! So many modern production collectors involve different techniques to reduce these GC pauses and make the collectors faster. And the first one is:

The basic idea of a generational collector is to partition the heap and split it into so-called Generations and then to collect each generation independently: the smaller a generation in size, the faster the scan of this heap region.

https://v8.dev/blog/orinoco-parallel-scavenger

So once again:

instead of traversing the whole heap, we traverse only a smaller portion of it

The generational collector has an interesting conception which is called the Weak generational hypothesis, which states that most objects survive for only a short period of time.

It has been known since 1984. All details you can find here.

Most allocated objects die young

Needles to say, this optimization technology has a disadvantage — all “store pointer” operations have to be tracked which may slow down mutator. It’s called “write barrier overhead”.

There is an interesting fact that generational collector includes a moving phase when objects which got a promotion are moved to another heap region. So it can’t be used for languages exposing pointer semantics, for example — Golang. Although, there exists a non-moving generation garbage collection. More information you can find here.

It’s time to turn to the main topic of this article — Orinoco. In the Javascript world, garbage collection has a significant role because the majority of all items are objects — arrays, functions and so on. It seems to imply a certain amount of work, doesn’t it?

Besides, all the above-described collectors have STW (stop-the-word) pause. Pause when the mutator is put on pause when a collector starts its work. all the threads of the Mutator are blocked. Usually, simple STW collectors have only one collection thread and while it might seem not very performant.

In V8 it meant that the event loop was paused, frequently it was for 1/5 second, especially it felt in NodeJS — every request had a delay.

Given these facts, Orinoco’s goal has the meaning:

Free the main thread

Orinoco contains different approaches for reducing the time spent in garbage collection, that is the amount of time when the main thread is paused while GC is performing.

The main thread and helper threads work on the same task simultaneously. This is still a ‘stop-the-world’ approach, but the total pause time is now divided by the number of participating threads (plus some overhead for synchronization)

The example is below

GC tasks happen entirely in the background. The main thread is free to execute JavaScript.

To advantages can relate smaller garbage collection pauses. Mutator can continue working in a garbage collection cycle. Whereas the main issue is Barrier overhead. All pointers have to be tracked which may slow down Mutator

Small chunks of the GC task are interleaved into the main thread execution.

It’s also still a ‘stop-the-world’ approach, but by allowing JavaScript to run intermittently, it enables a garbage collector to continues its tasks and the application can still respond to user input and make progress on an animation.

The garbage collector can be interrupted by Mutator before completion.

If you are interested to know more about Orinoco, please follow the next links:

Conclusion

Finally, it may be concluded, V8 has done a great job. Orinoco has significantly reduced pause time spans, latency and page load. Let’s take a look at the official numbers:

The parallel Scavenger has reduced the main thread young generation garbage collection total time by about 20%–50%, depending on the workload. Idle-time GC can reduce Gmail’s JavaScript heap memory by 45% when it is idle. Concurrent marking and sweeping has reduced pause times in heavy WebGL games by up to 50%.

In the end, I hope this article helps you gain basic holistic knowledge about garbage collection.

Certainly, a lot of details have been skipped. For instance, GC barriers, Tri-color abstractions, Mutator and Allocator, Major and Minor GC at Orinoco, Escape analysis, etc. All of these topics are worth attention and if they arouse your interest, I will create the second part of the article and dive into them.

Senior Software Engineer

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store