Red-Black Trees in C++

Published in

smucs

4 min readApr 7, 2021

Introduction

I was first introduced to red-black trees by a friend of mine who studies Computer Science at another university. I was curious about how it functioned both on a code level and an efficiency level compared to the other tree data structures that I was familiar with in the past.

History

The origin of the red-black tree as we know it today dates back to 1972, when the B-tree data structure was invented by German Computer Scientist Rudolf Bayer. The B-tree was the first self balancing tree of its kind, and it made searching easier by ensuring that all paths from root to leaf had the same number of nodes. Six years later, Computer Scientists Leonidas J. Guibas and Robert Sedgewick created the first rendition of the red-black tree based on what was referred to at the time as a symmetric binary B-tree. The latest change came in 2008, when Sedgewick proposed the left-leaning red-black tree in order to simplify the insert and delete operations of the tree. The new insertion algorithm was reduced from 46 lines of java code into just 33.

Fun fact: the color red was chosen because it was the best-looking color produced by the color laser printer available to the authors while working at Xerox PARC

How it Works

A Red-black tree is a type of self-balancing binary search tree. It is comprised of nodes that contain its data, a pointer to its parent node, two pointers to its children, and an extra bit that specifies which color it is.

Some of the requirements of the tree include:

Every node has a color that is either red or black
The root node is always black
All null leaves are black
A red node cannot have a red parent or any red children
Every path from a node to any of its descendants NULL nodes has the same number of black nodes.

Below is a visual representation of how a hypothetical tree would look:

When the tree is modified, the tree is rearranged and “repainted” to restore the coloring properties that constrain how unbalanced the tree can become in the worst case. The properties are designed such that this rearranging and recoloring can be performed efficiently. The re-balancing is not perfect, but guarantees searching in O(log n) time, where n is the number of nodes of the tree. The insertion and deletion operations, along with the tree rearrangement and recoloring, are also performed in O(log n) time.

Here is abstracted overview of how the tree rearrangement algorithm works:

Let x be a newly inserted node.
Perform a standard binary search tree insertion and assign the color of newly inserted node to red.
If x is the root, change the color of x to black. If not, do the following if the color of x’s parent is not black and x is not the root.

If x’s uncle is red (Note: the grandparent node must be black)

Change the color of the parent and the uncle to black, and the grandparent to red.
Swap x and x’s grandparent, repeat previous for new x.

If x’s uncle is black, then there can be four configurations for x, for simplification lets set x’s parent = p and x’s grandparent = g:

Left Left Case: (p is left child of g and x is left child of p)
Left Right Case: (p is left child of g and x is the right child of p)
Right Right Case: (See left left case)
Right Left Case: (See left right)

Code Examples

I decided to test a red-black table implementation against a similar data structure that I have a lot of familiarity with, the AVL tree. For my test, I generated random data sets of a fixed sized and timed how long the insertion of every element in that data set would take for each data structure. Using the std::chrono library for timing, here were my results:

From this chart you can see that for all significantly large data sets, red-black trees are more time efficient when it comes to insertion. This aligns with the overall consensus that red-black trees more time efficient when it comes to insertion. The time complexity for this data structure is profoundly fast.

What I’ve learned

From all my research, I have learned about a data structure with such a unique implementation. I never previously thought about how to implement a tree like this other than using the height like an AVL tree does. It is a very great way to store and retrieve giant sets of data. I can now see why the C++ standard library uses a red-black tree implementation in most of their self-balancing BST functions. I greatly enjoyed researching and testing out this data structure and I hope to learn about more data structures like this in the future.

References Cited

Repository for my test

https://github.com/MikeCuzzo/RedBlackTreeTimer/tree/main/RedBlackTreeTimer