Mastering data structures in Ruby — AVL Trees

Binary search trees, or BSTs for short, are data structures designed to perform fast lookups on large datasets. The subject of this post is AVL trees, a special kind of self-balancing BST named after its creators Adelson-Velskii and Landis, where the height difference between the left and right subtrees (the balance factor) is always in the range (-1..1), giving us O(log2 n) lookups.

In AVL trees, nodes are arranged in sorted order, and they can’t contain duplicate keys. These properties are important because most of the tree’s methods rely on them.

Let’s start by looking at how lookups work.

How lookups work

In this data structure, lookups are top-down, recursive descent operations; which means that they start from the root node and move all the way down until they find the key they are looking for or run out of nodes.

Of course, inspecting the whole tree would be prohibitively expensive, but since nodes are guaranteed to be sorted, we don’t have to do that. We start from the top, descending left or right based on how our key compares to the current node’s.

If the key of the current node is higher than the key we are looking for, we move to the left; otherwise, we go to the right. Either way, we repeat this operation until we find the key we are looking for, or we get to the end of the branch (EOB).

An AVL Tree looks something like:

        4
       / \
      /   \
     2     6
    / \   / \
   1   3 5   9
            / \
           7   11

Now let’s say we want to find the number 5.

Starting from the root node, the way to go would be:

1. Is 5 == 4? No.
2. Is 5 < 4? No. Move right.
3. Is 5 == 6? No.
4. Is 5 < 6? Yes. Move left.
5. Is 5 == 5? Yes. We found it!

*Note: To simplify the lookup process I’ve omitted the check for “end of the branch”.

So, to find the number 5 we only need 2 traversals from the root; the same operation on a singly linked list would require 5.
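The descent above can be sketched with a throwaway node struct. This is illustrative scaffolding only; the TreeNode name is made up for this example and is not the Node class we build later in the post:

```ruby
# Illustrative only: a throwaway TreeNode, not the post's actual Node class.
TreeNode = Struct.new(:key, :left, :right)

# Descend from the root, going left when the key is smaller than the
# current node's and right otherwise, counting the traversals.
def lookup(node, key)
  steps = 0
  while node
    return [node, steps] if key == node.key
    node = key < node.key ? node.left : node.right
    steps += 1
  end
  [nil, steps] # reached the end of the branch (EOB)
end

# The example tree from the diagram above.
root = TreeNode.new(4,
  TreeNode.new(2, TreeNode.new(1), TreeNode.new(3)),
  TreeNode.new(6,
    TreeNode.new(5),
    TreeNode.new(9, TreeNode.new(7), TreeNode.new(11))))

node, steps = lookup(root, 5) # steps == 2, just like the walk-through
```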

For a small number of elements it doesn’t look like a big difference; however, what is interesting about AVL trees is that lookups run in O(log2 n) time, which means the gap between them and linear search grows dramatically as the dataset grows.

For instance, looking for an element out of a million on a singly linked list could require up to 1,000,000 traversals. The same operation on AVL trees would require roughly 20!
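That “roughly 20” comes straight from the base-2 logarithm; a quick sanity check:

```ruby
# Worst-case traversals in a balanced tree are bounded by its height, ~log2(n).
avl_worst = Math.log2(1_000_000).ceil # => 20
linked_list_worst = 1_000_000
```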

How inserts work

Inserts and searches are tightly coupled operations. So, now that we know how searches work, let’s take a look at inserts. Or better yet, rotations, which are the interesting bits of inserts.

Rotation is a mechanism to rearrange parts of a tree to restore its balance without breaking these properties:

  • left’s key < parent’s key
  • parent’s key < right’s key

Once the rotation is completed, the balance factor of all nodes must be in the balanced range (-1..1).

Depending on the location of the node that puts the tree into an unbalanced state, we have to apply one of these rotations (LL, LR, RR, RL). Let’s take a look at each one of them.

LL (left-left)

Suppose we want to add the number 1 to the following subtree:

       /
      6
     / \
    3   7
   / \
  2   4
 /
x

By following the rules we use for lookups operations, we descend from top to bottom, moving left or right, until we find an empty spot. In this case, that spot will be the left subtree of the node that contains the number 2. Once we are there, we insert the new node.

       /
      6
     / \
    3   7
   / \
  2   4
 /
1

So far, so good; but now we have a problem. After inserting the new node, the balance factor of the subtree’s root node (6) went to +2, which means that we must balance the tree before going on.

To know which kind of rotation we have to apply, we can follow the path from the unbalanced node to the node we just added. Let’s annotate the tree to visualize this path:

          /
         6
        / \
   (L) 3   7
      / \
 (L) 2   4
    /
   1

In this case, the new node is rooted at the left subtree of the left subtree of the unbalanced node; hence, we have to do a left-left rotation (LL).

LR (left-right)

Now let’s take a look at the left-right rotation (LR), which we have to apply when the new node lies on the right subtree of the left subtree of the node that got unbalanced.

In the following tree, that rotation will take place if we insert the number 5.

       /
      6
     / \
    3   7
   / \
  2   4
       \
        5 (*)

RR (right-right)

The third kind of rotation is the right-right rotation (RR). This rotation happens when the new node is on the right subtree of the right subtree of the unbalanced node. For instance, that will happen if we insert the number 11 into the following tree.

      /
     4
    / \
   3   6
      / \
     5   9
          \
           11 (*)

RL (right-left)

The last kind of rotation is the right-left rotation (RL). A rotation that happens when the new node lies on the left subtree of the right subtree of the node that got unbalanced.

After inserting the number 4 in the following tree, we must perform an RL rotation to restore its balance.

      /
     2
    / \
   1   6
      / \
     5   9
    /
   4 (*)

Interface for AVL trees

The public API for this data structure is straightforward. It only has four methods and no attributes.

Methods

AVL Tree’s public methods.
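As a sketch, the skeleton below names those four methods; the bodies are filled in throughout the rest of the post:

```ruby
# Skeleton of the public API; the real bodies appear in the snippets below.
class AVLTree
  def insert(key, data = nil); end # adds a key (and optional data), rebalancing as needed
  def remove(key); end             # lazily marks the node for the key as deleted
  def search(key); end             # returns the node for the key, or nil
  def print; end                   # dumps an ASCII representation of the tree
end
```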

Implementation details

As opposed to what we did in previous posts, this time we are going to focus on the private methods of the data structure. By doing so, we will cover important details that would otherwise go unnoticed.

Let’s start by looking at how nodes are represented on the AVL tree.

AVL Node’s attributes.
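The attributes boil down to the key, optional data, the two children, the cached height, and the deleted flag used for lazy removal. A minimal sketch (the attribute names match the ones referenced by the private methods below; initializing height to 1 is an assumption consistent with how set_height computes leaf heights):

```ruby
# Minimal sketch of the node used throughout the post.
class Node
  attr_accessor :key, :data, :left, :right, :height, :deleted

  def initialize(key, data = nil)
    @key = key
    @data = data
    @height = 1    # a freshly inserted node is a leaf
    @deleted = false
  end
end
```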

*Note: To make the code easy to follow, I’m not going to reuse the binary search tree we built in the previous post. I did a prototype using that code, and the result wasn’t ideal; AVL trees carry a lot of extra logic that looked clumsy mangled into the plain BST. So I think this time it will be better if we start from scratch.

Insert

Adds a new node to the tree and resets its root. This method relies on insert_and_balance, which takes care of applying the right rotations to keep the tree in balance.

The complexity of this method is O(log2 n).

def insert key, data = nil
  @root = insert_and_balance(@root, key, data)
end

Insert And Balance

This method recursively looks for the right spot for the given key. Once the spot is found and the new node is inserted, it calls the balance method to ensure that the tree remains balanced.

The complexity of this method is O(log2 n).

def insert_and_balance node, key, data = nil
  return Node.new key, data unless node
  if key < node.key
    node.left = insert_and_balance(node.left, key, data)
  elsif key > node.key
    node.right = insert_and_balance(node.right, key, data)
  else
    node.data = data
    node.deleted = false
  end
  balance(node)
end

Balance

This method balances the subtree rooted at the specified node. This method is probably the most complex in this data structure because it’s responsible for selecting the right kind of rotation based on the “heaviness” of the tree.

A tree is considered left-heavy if its balance factor is higher than 1, or right-heavy if it’s lower than -1.

The best way to follow this method is by looking at the ASCII representations of each use-case in previous sections.

Since it only inspects the node and its immediate subtrees and applies at most one rotation, the complexity of this method is O(1).

def balance node
  set_height node
  if height(node.left) - height(node.right) == 2
    if height(node.left.right) > height(node.left.left)
      return rotate_left_right(node)
    end
    return rotate_right(node)
  elsif height(node.right) - height(node.left) == 2
    if height(node.right.left) > height(node.right.right)
      return rotate_right_left(node)
    end
    return rotate_left(node)
  end
  return node
end

Remove

This method finds the node to be removed and marks it as deleted. Performance wise, this is a neat way to handle deletions because the structure of the tree doesn’t change, and so we don’t have to balance it after the node is removed.

This technique is called “lazy” removal because it doesn’t do much work.

The complexity of this method is O(log2 n).

def remove key
  search(key)&.deleted = true
end

Search

This method starts a top-down recursive descent search from the current tree’s root node.

The complexity of this method is O(log2 n).

def search key
  node = search_rec @root, key
  return node unless node&.deleted
end

Search Rec

Searches for a key in the subtree rooted at the provided node, descending recursively left or right based on the keys’ values until it finds the key or reaches the end of the subtree.

The complexity of this method is O(log2 n).

def search_rec node, key
  return nil unless node
  return search_rec(node.left, key) if key < node.key
  return search_rec(node.right, key) if key > node.key
  return node
end

Set height

This method calculates and sets the height for the specified node based on the heights of its left and right subtrees.

The complexity of this method is O(1).

def set_height node
  lh = height node&.left
  rh = height node&.right
  max = lh > rh ? lh : rh
  node.height = 1 + max
end
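Note that set_height leans on a height helper that never appears in the post’s snippets. A reasonable version, assuming missing nodes count as height 0 (so leaves end up with height 1), would be:

```ruby
# Assumed helper: returns 0 for missing nodes so leaves get height 1.
def height(node)
  node ? node.height : 0
end
```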

Rotate right

This method performs a right rotation.

The complexity of this method is O(1).

def rotate_right p
  q = p.left
  p.left = q.right
  q.right = p
  set_height p
  set_height q
  return q
end

Rotate Left

This method performs a left rotation.

The complexity of this method is O(1).

def rotate_left p
  q = p.right
  p.right = q.left
  q.left = p
  set_height p
  set_height q
  return q
end

Rotate Left Right

This method points the left subtree of the given node to the result of rotating that subtree to the left, and then rotates the node itself to the right. The result of that last rotation is also the method’s return value.

The complexity of this method is O(1).

def rotate_left_right node
node.left = rotate_left(node.left)
return rotate_right(node)
end

Rotate Right Left

This method points the right subtree of the given node to the result of rotating that subtree to the right, and then rotates the node itself to the left. The result of that last rotation is also the method’s return value.

The complexity of this method is O(1).

def rotate_right_left node
node.right = rotate_right(node.right)
return rotate_left(node)
end

Print

This method prints the contents of a tree.

Since it has to visit all nodes in the tree, the complexity of this method is O(n).

def print
  print_rec @root, 0
end

def print_rec node, indent
  unless node
    puts "x".rjust(indent * 4, " ")
    return
  end
  puts_key node, indent
  print_rec node.left, indent + 1
  print_rec node.right, indent + 1
end

def puts_key node, indent
  txt = node.key.to_s
  if node.deleted
    txt += " (D)"
    puts txt.rjust(indent * 8, " ")
  else
    puts txt.rjust(indent * 4, " ")
  end
end

When to use binary search trees

Binary search trees work well when we have to handle large datasets in applications where the read/write ratio is 10:1 or higher. If the volume of data is small, or the read/write ratio is close to even, it’s better to use hash tables or even linked lists instead.

I recently used this data structure to implement the internal storage for a spreadsheet component, and it worked like a charm.

This was a rather long post; I hope you enjoyed it!

You can get the source code from here.

Next time, Graphs.

Thanks for reading! Also, don’t forget to clap if you like this post :)

PS: This post is part of a series, and as we move forward, subjects that were thoroughly discussed in previous posts are sometimes glossed over (if mentioned at all). If you want to get the most out of this series, I recommend starting from the first post.