Mastering data structures in Ruby — AVL Trees

Binary search trees, or BTSs for short, are data structures designed to perform fast lookups on large datasets. The subject of this post is AVL trees, a special kind of self-balancing BST named after its creators Adelson-Velskii and Landis where the height difference between the left and right subtrees (the balance factor) is always in the range (-1..1) giving us O(log2 n) lookups.

On AVL trees nodes arranged in descendant order, and they can’t contain duplicate keys. These properties are important because most of its methods rely on them.

Let’s start by looking at how lookups work.

How lookups work

On this data structure lookups are top-down recursive-descendant operations; which means that they start from the root node and move all the way down until they find the key they are looking for or run out of nodes.

Of course, inspecting the whole tree would be prohibitively expensive, but since nodes are guaranteed to be sorted, we don’t have to do that. We start from the top, descending left or right based on how our key compares to the current node’s.

If the key of the current node is higher than the key we are looking for, we move to the left; otherwise, we go to the right. In any case, we have to repeat this operation until we find the key we are looking for, or we get to the end of the branch (EOB).

An AVL Tree looks something like:

``        4       / \      /   \     2     6    / \   / \   1   3 5   9             /\            7  11``

Now let’s say we want to find the number 5.

Starting from the root node, the way to go would be:

``1. Is 5 == 4?No``
``2. Is 5 < 4?No. Move right.``
``3. Is 5 == 6?No``
``4. Is 5 < 6?Yes. Move left.``
``5. Is 5 == 5Yes. We found it!``

*Note: To simplify the lookup process I’ve omitted the check for “end of the branch”.

So, to find the number 5 we only need 2 traversals from the root; the same operation on a singly linked list would require 5.

For a small number of elements it doesn’t look like a big difference; however, what is interesting about AVL trees is that lookups run O(log2 n) time; which means that they get exponentially better than linear search as the dataset grows.

For instance, looking for an element out of a million on a singly linked list could require up to 1,000,000 traversals. The same operation on AVL trees would require roughly 20!

How inserts work

Inserts and searches are tightly coupled operations. So, now that we know how searches work, let’s take a look at inserts. Or better yet, rotations, which are the interesting bits of inserts.

Rotation is a mechanism to rearrange parts of a tree to restore its balance without breaking these properties:

• left’s key < parent’s key
• parent’s key < right’s key

Once the rotation is completed, the balance factor of all nodes must be in the balanced range (-1..1).

Depending on the location of the node that puts the tree into an unbalanced state, we have to apply one of these rotations (LL, LR, RR, RL). Let’s take a look at each one of them.

LL (left-left)

Suppose we want to add the number 1 to the following subtree:

``         /        6       / \      3   7     / \    2   4   /  x``

By following the rules we use for lookups operations, we descend from top to bottom, moving left or right, until we find an empty spot. In this case, that spot will be the left subtree of the node that contains the number 2. Once we are there, we insert the new node.

``         /        6       / \      3   7     / \    2   4   /  1``

So far, so good; but now we have a problem. After inserting the new node, the balance factor of the subtree’s root node (6) went to +2, which means that we must balance the tree before going on.

To know which kind of rotation we have to apply, we can follow the path from the unbalanced node to the node we just added. Let’s annotate the tree to visualize this path:

``          /         6        / \   (L) 3   7      / \ (L) 2   4    /   1``

In this case, the node is rooted at the left subtree of the left subtree of the unbalance node; hence, we have to do a left-left rotation (LL).

LR (left-right)

Now let’s take a look at a left-right rotation (LR). A rotation we have to apply when the new node lies on the right subtree of the left subtree of the node that got unbalanced.

In the following three that rotation will take place if we have to insert the number 5 into the tree.

``         /        6       / \      3   7     / \    2   4         \          5 (*)``

RR (right-right)

The third kind of rotation is the right-right rotation (RR). This rotation happens when the new node is on the right subtree of the right subtree of the unbalanced node. For instance, that will happen if we insert the number 11 into the following tree.

``        /       4      / \     3   6        / \       5   9            \            11 (*)``

RL (right-left)

The last kind of rotation is the right-left rotation (RL). A rotation that happens when the new node lies on the right subtree of the left subtree of the node that got unbalanced.

After inserting the number 4 in the following tree, we must perform an RL rotation to restore its balance.

``        /       2      / \     1   6        / \       5   9      /     4 (*)``

Interface for AVL trees

The public API for this data structure is straightforward. It only has four methods and no attributes.

Implementation details

As opposed to what we did in previous posts, this time we are going to focus the private methods of the data structure. By doing so, we will be covering important details that otherwise would be unnoticed.

Let’s start by looking at how nodes are represented on the AVL tree.

*Note: To make the code easy to follow I’m not going to use the binary tree we built on the previous post. I did a prototype using that code, and the result wasn’t ideal; AVL trees have much logic that mangled with B-Trees looked a bit clumsy to me. So I think this time it will be better if we start from scratch.

Insert

Adds a new node to the tree and reset its root. This method relies on insert_and_balance that takes care of applying the right rotations to keep the tree in balance.

The complexity of this method is O(log2 n).

``def insert key, data = nil    @root = insert_and_balance(@root, key, data)end``

Insert And Balance

This method looks recursively to find the right spot for the given key. Once the spot is found and the new node is inserted, it calls the balance method to ensure that the tree remains balanced.

The complexity of this method is O(log2 n).

``def insert_and_balance node, key, data = nil    return Node.new key, data unless node``
``    if (key < node.key)         node.left = insert_and_balance(node.left, key, data)    elsif(key > node.key)         node.right = insert_and_balance(node.right, key, data)    else         node.data    = data        node.deleted = false    end``
``    balance(node)end``

Balance

This method balances the subtree rooted at the specified node. This method is probably the most complex in this data structure because it’s responsible for selecting the right kind of rotation based on the “heaviness” of the tree.

A tree is considered left-heavy is its balance factor is higher than 1 or right-heavy if it’s lower than -1.

The best way to follow this method is by looking at the ASCII representations of each use-case in previous sections.

The complexity of this method is O(log2 n).

``def balance node    set_height node``
``    if (height(node.left) - height(node.right) == 2)         if (height(node.left.right) > height(node.left.left))             return rotate_left_right(node.left)        end        return rotate_right(node)    elsif (height(node.right) - height(node.left) == 2)         if (height(node.right.left) > height(node.right.right))             return rotate_right_left(node.right)        end        return rotate_left(node)    end    return nodeend``

Remove

This method finds the node to be removed and marks it as deleted. Performance wise, this is a neat way to handle deletions because the structure of the tree doesn’t change, and so we don’t have to balance it after the node is removed.

This technique is called “lazy” removal because it doesn’t do much work.

The complexity of this method is O(log2 n).

``def remove key    search(key)&.deleted = trueend``

Search

This method starts a top-down recursive descendant search from the current tree’s root node.

The complexity of this method is O(log2 n).

``def search key    node = search_rec @root, key    return node unless node&.deletedend``

Search Rec

Searches for a key in the subtree that starts at the provided node. This method starts the search at a given node and descends (recursively) moving left or right based on the key’s values until it finds the key or gets to the end of the subtree.

The complexity of this method is O(log2 n).

``def search_rec node, key    return nil unless node    return search_rec(node.left, key)  if (key < node.key)    return search_rec(node.right, key) if (key > node.key)    return node        end``

Set height

This method calculates and sets the height for the specified node based on the heights of their left and right subtrees.

The complexity of this method is O(1).

``def set_height node    lh  = height node&.left    rh  = height node&.right    max = lh > rh ? lh : rh``
``    node.height = 1 + maxend``

Rotate right

This method performs a right rotation.

The complexity of this method is O(1).

``def rotate_right p    q       = p.left    p.left  = q.right    q.right = p``
``    set_height p    set_height q``
``    return qend``

Rotate Left

This method performs a left rotation.

The complexity of this method is O(1).

``def rotate_left p    q       = p.right    p.right = q.left    q.left  = p``
``    set_height p    set_height q``
``    return qend``

Rotate Left Right

This method points the left subtree of the given node to the result of rotating that subtree to the left and then rotates the specified node to the right. The last rotation it’s also its return value.

The complexity of this method is O(1).

``def rotate_left_right node    node.left = rotate_left(node.left)    return rotate_right(node)end``

Rotate Right Left

This method points the right subtree of the given node to the result of rotating that subtree to the right and then rotates the specified node to the left. The last rotation it’s also its return value.

The complexity of this method is O(1).

``def rotate_right_left node    node.right = rotate_right(node.right)    return rotate_left(node)end``

Print

This method prints the contents of a tree.

Since it has to visit all nodes in the tree, the complexity of this method is O(n).

``def print    print_rec @root, 0end``
``def print_rec node, indent    unless node        puts "x".rjust(indent * 4, " ")        return    end``
``    puts_key node, indent    print_rec node.left, indent + 1    print_rec node.right, indent + 1end``
``def puts_key node, indent    txt = node.key.to_s    if node.deleted        txt += " (D)"         puts txt.rjust(indent * 8, " ")    else        puts txt.rjust(indent * 4, " ")    endend``

When to use binary search trees

Binary search trees work well when we have to handle large datasets in applications where the read/write ratio is 10 > 1 or more. If the volume of data is small or the read/write ratio is even is better to use hash tables or even linked lists instead.

I recently used this data structure to implement the internal storage for a spreadsheet component, and it worked like a charm.

This was a rather long post that I hope you enjoy it!

You can get the source code from here.

Next time, Graphs.

Thanks for reading! Also, don’t forget to clap if you like this post :)

PS: This post is part of a series, and as we move forward, subjects that were thoroughly discussed on previous posts are sometimes glossed over (if mentioned at all). If you want to get the most of this series, I recommend you to start from the first post.