Immutable Data Structures — RRB Trees (Part-1)

Abhinav Sharma
7 min read · Jun 17, 2019


Modern, concurrency-friendly JVM languages like Clojure and Scala rely heavily on Immutable Data Structures

On Functional Programming and Immutability

Oh, how simple the world used to be: computers had a single processor, the internet hadn’t yet given birth to YouTube, Facebook or Instagram, and software stayed simple because the hardware forced it to. Alas, it couldn’t last forever!

Circa 2019, software has permeated everything, from wristwatches to self-driving cars. The limits of the hardware have been circumvented with multiple processors, and our need to build faster, more scalable systems has grown exponentially. Software is used by doctors, lawyers, governments and even for dating; clearly there must be a better way to write more reliable programs that still rest on easy-to-understand concepts.

In the last 50 years, computers have evolved exponentially but have we done the same? Probably not!

The mental models which helped us create software for limited hardware aren’t working out so well in this age. To understand the future and to solve problems cleanly, we must look back into the history of computer science. Looking back, we re-discover the programming paradigm of building from simpler blocks that can be tested easily, of using mathematical reasoning to minimize the chances of a program doing the wrong thing, and of helping engineers focus on the right design tools for solving problems reliably.

Clojure and Scala taking Functional Programming mainstream

The success of functional programming languages like Clojure and Scala has proved that the right time for the idea of functional programming has finally arrived. Both of these famous JVM languages rely heavily on purely functional data structures, with immutability as a design philosophy embedded in the DNA of the languages themselves.

Today we shall talk about one such data structure, which has helped these concurrency-friendly languages stay performant as well: Relaxed Radix Balanced Trees (RRB Trees).

RRB Trees: Efficient Immutable Vectors — Phil Bagwell and Tiark Rompf (2012)

We will consider the original paper, written by the dynamic (or, more aptly, statically typed :) ) duo of Phil Bagwell and Tiark Rompf.

The paper grew out of the work on data structures in the Scala programming language at EPFL (École polytechnique fédérale de Lausanne) and can be found in the PapersWeLove GitHub repo.

This data structure matters to Clojure and Scala because both languages offer concurrency-friendly immutable vectors in their standard libraries, used by hundreds of thousands of engineers daily. The purpose of RRB Trees is to improve on the standard immutable vectors by making vector concatenation, insertion and split operations much more performant, without affecting the indexing, updating and iteration speeds of the original immutable vectors.

Let us first consider why the generic array data structure is not ideal for concurrent programming. Arrays have real benefits: elements can be accessed in constant O(1) time rather than linear O(n) time, and disjoint parts of the array, i.e. sub-arrays, can be processed effectively in parallel. The problem is that arrays rely on mutable, in-place element updates, which places the responsibility for safe state management squarely on the programmer. Unfortunately, time and again, humans have proved not to be very disciplined when given this responsibility, which has led to many a million-dollar bug.
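To make the hazard concrete, here is a minimal sketch in Clojure: mutating a Java array in place is visible through every reference to it, exactly the kind of shared, hidden state that breaks concurrent code.

```clojure
;; A mutable Java array: updating it in place changes what every holder
;; of the reference observes.
(def arr  (int-array [1 2 3]))   ; mutable array
(def view arr)                   ; a second reference to the same storage
(aset arr 0 99)
(aget view 0)                    ;=> 99, the "other" reference sees the change too
```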

You wouldn’t want to lose a million bucks, right? Let’s go immutable :)

Back in the immutable world, an efficient counterpart to the array, i.e. an indexable, ordered sequence of values, is a bit tedious to create: a simplistic, direct translation to an immutable version ends up with an unacceptable linear cost for updating individual elements, crippling the performance of the data structure.
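For instance, here is a minimal sketch of what that naive translation looks like: every single-element update rebuilds the entire sequence, so it costs O(n).

```clojure
;; A naive "immutable update": rebuild the whole sequence around the
;; changed element. Correct, but O(n) for every single update.
(defn naive-assoc [coll i x]
  (concat (take i coll) [x] (drop (inc i) coll)))

(naive-assoc [1 2 3 4 5] 2 :changed)
;=> (1 2 :changed 4 5)   ; the original [1 2 3 4 5] is untouched
```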

In Clojure the immutable vector is a critical part of the language, and its creator, Rich Hickey, relied on Hash Array Mapped Tries (HAMT), also called Ideal Hash Tries, to implement not one but two of the core data structures, namely vector and hash-map. The resultant design provides efficient iteration, single-element append in constant time, indexing in O(1/5 log N) time and updates in O(32/5 log N) time.
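As a quick illustration of what these persistent vectors feel like in practice, here is a small Clojure example: every “update” returns a new vector that shares most of its tree with the original, so older versions remain valid and cheap to keep around.

```clojure
;; Clojure's built-in persistent vector: updates return new vectors that
;; structurally share most of their nodes with the original.
(def v  (vec (range 10)))
(def v2 (assoc v 3 :changed))   ; copies only the path to the changed leaf

(nth v 3)    ;=> 3          (the original vector is untouched)
(nth v2 3)   ;=> :changed
(conj v 42)  ;=> a new 11-element vector; v itself never changes
```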

Using 32-wide arrays as tree nodes makes the data structure cache friendly. An indexed update incurs only O(1/5 log N) indirect memory accesses, meaning that, for practical purposes, all operations incur a “constant time” cost. However, parallel processing requires efficient vector concatenation, splits and inserts at a given index, which are not easily supported by this structure.

The work presented in the paper extends the underlying vector structure to support concatenation and inserts in O(log N) rather than linear time, without compromising the O(log N) performance of the existing operations. This new data structure lends itself to more efficient parallelization of common types of comprehensions: a vector can be split into partitions that are then evaluated in parallel. For many common operations, such as filter, the size of the individual partition results is not known a priori. With RRB Trees the resulting sub-vectors can be concatenated into a result vector without linear copying, so the benefits of parallel processing are not lost in re-assembling the sub-vectors.
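To make that split/process/concatenate pattern concrete, here is a hedged sketch using Clojure’s core.rrb-vector library (which we will explore properly in Part-2). I’m assuming its catvec, subvec and vec behave as the paper describes; the partition count of 4 is arbitrary.

```clojure
;; Split a vector into chunks, filter each chunk in parallel, then glue
;; the variable-sized results back together with catvec instead of
;; copying element by element.
(require '[clojure.core.rrb-vector :as fv])

(defn parallel-filter [pred v n-partitions]
  (let [size   (count v)
        step   (long (Math/ceil (/ size (double n-partitions))))
        chunks (for [i (range 0 size step)]
                 (fv/subvec v i (min size (+ i step))))]
    ;; the chunk results have unpredictable sizes, which is exactly where
    ;; O(log N) concatenation pays off
    (reduce fv/catvec (fv/vector) (pmap #(fv/vec (filter pred %)) chunks))))

(parallel-filter even? (fv/vec (range 100)) 4)
;=> an immutable vector of the 50 even numbers
```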

This is the basic vector structure, drawn here with 4-way branching for readability. Notice that in this regular structure every sub-tree, except possibly the rightmost one at each level, is completely full, which is what lets an element be located by simple radix arithmetic on its index.

This relationship is more clearly explained by the diagram below. The figure illustrates the basic structure of an RRB Tree. The tree node A comprises an array of pointers that point to sub-trees C, D and E. Associated with this array is the range array, which contains the accumulated total number of leaf items in these sub-trees. For convenience we will call a pointer and its associated range a slot. In the example, slot 0 points to node C and is said to contain 3 sub-trees, which are leaf items in this case.
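Here is a hypothetical sketch, in plain Clojure data rather than the real implementation, of such a node and of the relaxed radix search that uses the range array to find which slot an index falls into.

```clojure
;; An RRB internal node sketched as a map: child pointers plus a parallel
;; "ranges" array of cumulative leaf counts.
(def node
  {:children [[:a :b :c] [:d :e] [:f :g :h :i]]  ; leaf sub-trees
   :ranges   [3 5 9]})                           ; running totals: 3, 3+2, 3+2+4

;; Find the slot whose cumulative count exceeds i, then index into that
;; child after subtracting the items held by the earlier slots.
(defn lookup [node i]
  (let [slot  (count (take-while #(<= % i) (:ranges node)))
        prior (if (zero? slot) 0 (nth (:ranges node) (dec slot)))]
    (nth (nth (:children node) slot) (- i prior))))

(lookup node 4) ;=> :e
(lookup node 8) ;=> :i
```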

Now you might ask why the diagrams specifically use a 4-way tree. That’s a good question to ask!

The small branching factor is only there to keep the figures readable. In the real implementations the branching factor m is 32: empirical analysis of the trade-off between indexing and update costs, measured at the nanosecond scale (we do care enough about performance to measure that), shows that 32-wide nodes hit the sweet spot, and they are also what make the structure so cache friendly.
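A rough back-of-the-envelope calculation shows the trade-off: a wider node makes the tree shallower (fewer indirections per lookup) but means more words are copied on every path-copying update. The numbers below are illustrative only.

```clojure
;; Tree depth is about log_m N; a lookup touches ~depth nodes, while an
;; update copies ~m words per node along that path.
(defn depth [n m]
  (long (Math/ceil (/ (Math/log n) (Math/log m)))))

(depth 1e6  4)  ;=> 10 levels: cheap per-node copies, but 10 indirections per lookup
(depth 1e6 32)  ;=> 4  levels: only 4 indirections, at ~32 words copied per level
```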

Concatenation

Now let us consider the core of the algorithm, which concentrates on the re-assembly part of the parallel processing pipeline, and see how it improves on the current vector implementations.

The naive approach to concatenating these into one vector requires a “shift left” and copying of all the nodes from the right-hand vector into the appropriate positions in the concatenated vector, a process linear in the size of the right-hand vector. Alternatively, one can traverse the right-hand vector and add its items to the left vector, again a linear process.

The RRB Tree algorithm, however, achieves the same result with a worst-case concatenation cost proportional to O(log N). Instead of copying individual elements, it works with the pointers to sub-trees stored in the nodes one level above, merging the two trees along their adjoining edges in either a bottom-up or a top-down manner.

First we move down the right-hand edge of the left tree and the left-hand edge of the right tree. Then we move up one level. In order to have a well-formed tree, we must rebalance the nodes enclosed in the box so that they conform to the invariant: branch entries are copied from the right to the left until each slot count lies between m-1 and m, while maintaining the sequential order of the items. In the example, we need to copy the item in the 3rd node and the three items in the 4th node into a new node that then meets the invariant. In general, several or even all of the nodes may need to be copied before they meet it.
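As a simplified illustration of the invariant, here is a sketch that repacks a run of border nodes so that every resulting node is (nearly) full. This is deliberately cruder than the real algorithm, which copies only as many nodes as necessary and leaves already-conforming nodes untouched.

```clojure
;; Repack the contents of the nodes along the concatenation seam so that
;; each node holds at most m items (and all but the last hold exactly m),
;; preserving the sequential order of the items.
(defn repack [nodes m]
  (mapv vec (partition-all m (apply concat nodes))))

;; With m = 4, the two under-full nodes are merged into one full node:
(repack [[:a :b :c :d] [:e :f :g :h] [:i] [:j :k :l]] 4)
;=> [[:a :b :c :d] [:e :f :g :h] [:i :j :k :l]]
```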

The original Scala and Clojure vector implementations use internal nodes whose array carries the 32-way branch. In this implementation, that array is grown by one element when a node is converted to an RRB Tree node, and the zeroth element carries a reference to the array containing the range values. There is no overhead on vectors that have never been concatenated, and by using a separate range array, rather than a combined range/pointer object, the speed of iteration, a common use case, is unaffected.

After concatenation, the zeroth element of some of the nodes will point to the array of range values. Since only internal nodes carry this extra element, and there is roughly one internal node per m² leaf items, the extra memory overhead is close to 1 in m², i.e. 1 in 1024, and can be considered negligible.

In Part-2 we will explore RRB-Tree performance using Clojure’s core.rrb-vector library

In Part-1, I’ve glossed over the mathematical details in favor of conceptual understanding. Moving forward, we’ll study the RRB-Tree paper alongside the Clojure implementation of the concept, aptly called core.rrb-vector. I’ve also created an accompanying repository on GitHub to showcase benchmarks between the native Clojure vector and rrb-vector.
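To give a taste of what those benchmarks look like, here is a hedged sketch of the kind of comparison the repository makes (the exact code there may differ); it uses the criterium library to compare a linear concatenation with catvec.

```clojure
;; Concatenating two ~1M element vectors: `into` copies every element of
;; the right-hand vector, while catvec merges the two trees in O(log N).
(require '[clojure.core.rrb-vector :as fv]
         '[criterium.core :as crit])

(def left  (fv/vec (range 1000000)))
(def right (fv/vec (range 1000000)))

(crit/quick-bench (into left right))       ; linear in (count right)
(crit/quick-bench (fv/catvec left right))  ; logarithmic tree merge
```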
