Mastering data structures in Ruby — AVL Trees

Binary search trees, or BSTs for short, are data structures designed to perform fast lookups on large datasets. The subject of this post is AVL trees, a special kind of self-balancing BST named after its creators Adelson-Velskii and Landis, where the height difference between the left and right subtrees (the balance factor) is always in the range (-1..1), giving us O(log2 n) lookups.

In AVL trees, nodes are arranged in sorted order, and they can’t contain duplicate keys. These properties are important because most of the tree’s methods rely on them.

Let’s start by looking at how lookups work.

How lookups work

In this data structure, lookups are top-down, recursive descent operations; which means that they start from the root node and move all the way down until they find the key they are looking for or run out of nodes.

Of course, inspecting the whole tree would be prohibitively expensive, but since nodes are guaranteed to be sorted, we don’t have to do that. We start from the top, descending left or right based on how our key compares to the current node’s.

If the key of the current node is higher than the key we are looking for, we move to the left; otherwise, we go to the right. Either way, we repeat this operation until we find the key we are looking for, or we get to the end of the branch (EOB).

An AVL Tree looks something like:

        4
       / \
      /   \
     2     6
    / \   / \
   1   3 5   9
            / \
           7   11

Now let’s say we want to find the number 5.

Starting from the root node, the way to go would be:

1. Is 5 == 4? No.
2. Is 5 < 4? No. Move right.
3. Is 5 == 6? No.
4. Is 5 < 6? Yes. Move left.
5. Is 5 == 5? Yes. We found it!

*Note: To simplify the lookup process I’ve omitted the check for “end of the branch”.

So, to find the number 5 we only need 2 traversals from the root; the same operation on a singly linked list would require 5.
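The descent above can be sketched with a throwaway node struct. This is illustrative scaffolding only; the TreeNode name is made up for this example and is not the Node class we build later in the post:

```ruby
# Illustrative only: a throwaway TreeNode, not the post's actual Node class.
TreeNode = Struct.new(:key, :left, :right)

# Descend from the root, going left when the key is smaller than the
# current node's and right otherwise, counting the traversals.
def lookup(node, key)
  steps = 0
  while node
    return [node, steps] if key == node.key
    node = key < node.key ? node.left : node.right
    steps += 1
  end
  [nil, steps] # reached the end of the branch (EOB)
end

# The example tree from the diagram above.
root = TreeNode.new(4,
  TreeNode.new(2, TreeNode.new(1), TreeNode.new(3)),
  TreeNode.new(6,
    TreeNode.new(5),
    TreeNode.new(9, TreeNode.new(7), TreeNode.new(11))))

node, steps = lookup(root, 5) # steps == 2, just like the walk-through
```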

For a small number of elements it doesn’t look like a big difference; however, what is interesting about AVL trees is that lookups run in O(log2 n) time, which means the gap between them and linear search grows dramatically as the dataset grows.

For instance, looking for an element out of a million on a singly linked list could require up to 1,000,000 traversals. The same operation on AVL trees would require roughly 20!
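That “roughly 20” comes straight from the base-2 logarithm; a quick sanity check:

```ruby
# Worst-case traversals in a balanced tree are bounded by its height, ~log2(n).
avl_worst = Math.log2(1_000_000).ceil # => 20
linked_list_worst = 1_000_000
```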

How inserts work

Inserts and searches are tightly coupled operations. So, now that we know how searches work, let’s take a look at inserts. Or better yet, rotations, which are the interesting bits of inserts.

Rotation is a mechanism to rearrange parts of a tree to restore its balance without breaking these properties:

  • left’s key < parent’s key
  • parent’s key < right’s key

Once the rotation is completed, the balance factor of all nodes must be in the balanced range (-1..1).

Depending on the location of the node that puts the tree into an unbalanced state, we have to apply one of these rotations (LL, LR, RR, RL). Let’s take a look at each one of them.

LL (left-left)

Suppose we want to add the number 1 to the following subtree:

       /
      6
     / \
    3   7
   / \
  2   4
 /
x

By following the rules we use for lookups operations, we descend from top to bottom, moving left or right, until we find an empty spot. In this case, that spot will be the left subtree of the node that contains the number 2. Once we are there, we insert the new node.

       /
      6
     / \
    3   7
   / \
  2   4
 /
1

So far, so good; but now we have a problem. After inserting the new node, the balance factor of the subtree’s root node (6) went to +2, which means that we must balance the tree before going on.

To know which kind of rotation we have to apply, we can follow the path from the unbalanced node to the node we just added. Let’s annotate the tree to visualize this path:

          /
         6
        / \
   (L) 3   7
      / \
 (L) 2   4
    /
   1

In this case, the new node is rooted at the left subtree of the left subtree of the unbalanced node; hence, we have to do a left-left rotation (LL).

LR (left-right)

Now let’s take a look at the left-right rotation (LR), which we have to apply when the new node lies on the right subtree of the left subtree of the node that got unbalanced.

In the following tree, that rotation will take place if we insert the number 5.

       /
      6
     / \
    3   7
   / \
  2   4
       \
        5 (*)

RR (right-right)

The third kind of rotation is the right-right rotation (RR). This rotation happens when the new node is on the right subtree of the right subtree of the unbalanced node. For instance, that will happen if we insert the number 11 into the following tree.

      /
     4
    / \
   3   6
      / \
     5   9
          \
           11 (*)

RL (right-left)

The last kind of rotation is the right-left rotation (RL). A rotation that happens when the new node lies on the left subtree of the right subtree of the node that got unbalanced.

After inserting the number 4 in the following tree, we must perform an RL rotation to restore its balance.

      /
     2
    / \
   1   6
      / \
     5   9
    /
   4 (*)

Interface for AVL trees

The public API for this data structure is straightforward. It only has four methods and no attributes.

Methods

AVL Tree’s public methods.
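As a sketch, the skeleton below names those four methods; the bodies are filled in throughout the rest of the post:

```ruby
# Skeleton of the public API; the real bodies appear in the snippets below.
class AVLTree
  def insert(key, data = nil); end # adds a key (and optional data), rebalancing as needed
  def remove(key); end             # lazily marks the node for the key as deleted
  def search(key); end             # returns the node for the key, or nil
  def print; end                   # dumps an ASCII representation of the tree
end
```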

Implementation details

As opposed to what we did in previous posts, this time we are going to focus on the private methods of the data structure. By doing so, we will cover important details that would otherwise go unnoticed.

Let’s start by looking at how nodes are represented on the AVL tree.

AVL Node’s attributes.
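The attributes boil down to the key, optional data, the two children, the cached height, and the deleted flag used for lazy removal. A minimal sketch (the attribute names match the ones referenced by the private methods below; initializing height to 1 is an assumption consistent with how set_height computes leaf heights):

```ruby
# Minimal sketch of the node used throughout the post.
class Node
  attr_accessor :key, :data, :left, :right, :height, :deleted

  def initialize(key, data = nil)
    @key = key
    @data = data
    @height = 1    # a freshly inserted node is a leaf
    @deleted = false
  end
end
```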

*Note: To make the code easy to follow, I’m not going to reuse the binary search tree we built in the previous post. I did a prototype using that code, and the result wasn’t ideal; AVL trees carry a lot of extra logic that looked clumsy mangled into the plain BST. So I think this time it will be better if we start from scratch.

Insert

Adds a new node to the tree and resets its root. This method relies on insert_and_balance, which takes care of applying the right rotations to keep the tree in balance.

The complexity of this method is O(log2 n).

def insert key, data = nil
  @root = insert_and_balance(@root, key, data)
end

Insert And Balance

This method recursively looks for the right spot for the given key. Once the spot is found and the new node is inserted, it calls the balance method to ensure that the tree remains balanced.

The complexity of this method is O(log2 n).

def insert_and_balance node, key, data = nil
  return Node.new key, data unless node
  if key < node.key
    node.left = insert_and_balance(node.left, key, data)
  elsif key > node.key
    node.right = insert_and_balance(node.right, key, data)
  else
    node.data = data
    node.deleted = false
  end
  balance(node)
end

Balance

This method balances the subtree rooted at the specified node. This method is probably the most complex in this data structure because it’s responsible for selecting the right kind of rotation based on the “heaviness” of the tree.

A tree is considered left-heavy if its balance factor is higher than 1, or right-heavy if it’s lower than -1.

The best way to follow this method is by looking at the ASCII representations of each use-case in previous sections.

Since it only inspects the node and its immediate subtrees and applies at most one rotation, the complexity of this method is O(1).

def balance node
  set_height node
  if height(node.left) - height(node.right) == 2
    if height(node.left.right) > height(node.left.left)
      return rotate_left_right(node)
    end
    return rotate_right(node)
  elsif height(node.right) - height(node.left) == 2
    if height(node.right.left) > height(node.right.right)
      return rotate_right_left(node)
    end
    return rotate_left(node)
  end
  return node
end

Remove

This method finds the node to be removed and marks it as deleted. Performance wise, this is a neat way to handle deletions because the structure of the tree doesn’t change, and so we don’t have to balance it after the node is removed.

This technique is called “lazy” removal because it doesn’t do much work.

The complexity of this method is O(log2 n).

def remove key
  search(key)&.deleted = true
end

Search

This method starts a top-down recursive descent search from the current tree’s root node.

The complexity of this method is O(log2 n).

def search key
  node = search_rec @root, key
  return node unless node&.deleted
end

Search Rec

Searches for a key in the subtree rooted at the provided node, descending recursively left or right based on the keys’ values until it finds the key or reaches the end of the subtree.

The complexity of this method is O(log2 n).

def search_rec node, key
  return nil unless node
  return search_rec(node.left, key) if key < node.key
  return search_rec(node.right, key) if key > node.key
  return node
end

Set height

This method calculates and sets the height for the specified node based on the heights of its left and right subtrees.

The complexity of this method is O(1).

def set_height node
  lh = height node&.left
  rh = height node&.right
  max = lh > rh ? lh : rh
  node.height = 1 + max
end
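Note that set_height leans on a height helper that never appears in the post’s snippets. A reasonable version, assuming missing nodes count as height 0 (so leaves end up with height 1), would be:

```ruby
# Assumed helper: returns 0 for missing nodes so leaves get height 1.
def height(node)
  node ? node.height : 0
end
```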

Rotate right

This method performs a right rotation.

The complexity of this method is O(1).

def rotate_right p
  q = p.left
  p.left = q.right
  q.right = p
  set_height p
  set_height q
  return q
end

Rotate Left

This method performs a left rotation.

The complexity of this method is O(1).

def rotate_left p
  q = p.right
  p.right = q.left
  q.left = p
  set_height p
  set_height q
  return q
end

Rotate Left Right

This method points the left subtree of the given node to the result of rotating that subtree to the left, and then rotates the node itself to the right. The result of that last rotation is also the method’s return value.

The complexity of this method is O(1).

def rotate_left_right node
node.left = rotate_left(node.left)
return rotate_right(node)
end

Rotate Right Left

This method points the right subtree of the given node to the result of rotating that subtree to the right, and then rotates the node itself to the left. The result of that last rotation is also the method’s return value.

The complexity of this method is O(1).

def rotate_right_left node
node.right = rotate_right(node.right)
return rotate_left(node)
end

Print

This method prints the contents of a tree.

Since it has to visit all nodes in the tree, the complexity of this method is O(n).

def print
  print_rec @root, 0
end

def print_rec node, indent
  unless node
    puts "x".rjust(indent * 4, " ")
    return
  end
  puts_key node, indent
  print_rec node.left, indent + 1
  print_rec node.right, indent + 1
end

def puts_key node, indent
  txt = node.key.to_s
  if node.deleted
    txt += " (D)"
    puts txt.rjust(indent * 8, " ")
  else
    puts txt.rjust(indent * 4, " ")
  end
end

When to use binary search trees

Binary search trees work well when we have to handle large datasets in applications where the read/write ratio is 10:1 or higher. If the volume of data is small, or the read/write ratio is close to even, it’s better to use hash tables or even linked lists instead.

I recently used this data structure to implement the internal storage for a spreadsheet component, and it worked like a charm.

This was a rather long post; I hope you enjoyed it!

You can get the source code from here.

Next time, Graphs.

Thanks for reading! Also, don’t forget to clap if you like this post :)

PS: This post is part of a series, and as we move forward, subjects that were thoroughly discussed in previous posts are sometimes glossed over (if mentioned at all). If you want to get the most out of this series, I recommend starting from the first post.