# Mastering data structures in Ruby — AVL Trees

**Binary search trees**, or **BTS**s for short, are data structures designed to perform **fast lookups on large datasets**. The subject of this post is **AVL trees**, a special kind of **self-balancing BST** named after its creators Adelson-Velskii and Landis where the height difference between the left and right subtrees (the balance factor) is always in the range (-1..1) giving us** O(log2 n) lookups**.

On AVL trees **nodes arranged in descendant order, and they can’t contain duplicate keys**. These properties are important because most of its methods rely on them.

Let’s start by looking at how lookups work.

# How lookups work

On this data structure lookups are ** top-down recursive-descendant operations;** which means that they start from the root node and move all the way down until they find the key they are looking for or run out of nodes.

Of course, inspecting the whole tree would be prohibitively expensive, but since nodes are guaranteed to be sorted, we don’t have to do that. We start from the top, descending left or right based on how our key compares to the current node’s.

If the key of the current node is higher than the key we are looking for, we move to the left; otherwise, we go to the right. In any case, we have to repeat this operation until we find the key we are looking for, or we get to the end of the branch (EOB).

An AVL Tree looks something like:

` 4`

/ \

/ \

2 6

/ \ / \

1 3 5 9

/\

7 11

Now let’s say we want to find the number 5.

Starting from the root node, the way to go would be:

1. Is 5 == 4?

No2. Is 5 < 4?

No. Move right.3. Is 5 == 6?

No4. Is 5 < 6?

Yes. Move left.5. Is 5 == 5

Yes. We found it!

**Note: To simplify the lookup process I’ve omitted the check for “end of the branch”.*

So, to find the number 5 we only need 2 traversals from the root; the same operation on a singly linked list would require 5.

For a small number of elements it doesn’t look like a big difference; however, what is interesting about AVL trees is that lookups run O(log2 n) time; which means that ** they get exponentially better than linear search as the dataset grows**.

For instance, looking for an element out of a million on a singly linked list could require up to **1,000,000** traversals. The same operation on AVL trees would require roughly **20!**

# How inserts work

Inserts and searches are tightly coupled operations. So, now that we know how searches work, let’s take a look at inserts. Or better yet, **rotations**, which are the interesting bits of inserts.

*Rotation is a mechanism to rearrange parts of a tree to restore its balance without breaking these properties:*

- left’s key < parent’s key
- parent’s key < right’s key

Once the rotation is completed, the **balance factor** of all nodes must be in the balanced range (-1..1).

Depending on the location of the node that puts the tree into an unbalanced state, we have to apply one of these rotations (LL, LR, RR, RL). Let’s take a look at each one of them.

## LL (left-left)

Suppose we want to add the number 1 to the following subtree:

` /`

6

/ \

3 7

/ \

2 4

/

x

By following the rules we use for lookups operations, we descend from top to bottom, moving left or right, until we find an empty spot. In this case, that spot will be the left subtree of the node that contains the number 2. Once we are there, we insert the new node.

` /`

6

/ \

3 7

/ \

2 4

/

1

So far, so good; but now we have a problem. After inserting the new node, the balance factor of the subtree’s root node (6) went to +2, which means that we must balance the tree before going on.

To know which kind of rotation we have to apply, we can follow the path from the unbalanced node to the node we just added. Let’s annotate the tree to visualize this path:

` /`

6

/ \

**(L)** 3 7

/ \

**(L)** 2 4

/

1

In this case, the node is rooted at the* left subtree*** of the left subtree** of the unbalance node; hence, we have to do a

**left-left rotation**

**(LL)**.

## LR (left-right)

Now let’s take a look at a **left-right rotation (LR).** A rotation we have to apply when the new node lies on the ** right subtree of the left subtree **of the node that got unbalanced.

In the following three that rotation will take place if we have to insert the number 5 into the tree.

` /`

6

/ \

3 7

/ \

2 4

\

5 (*)

## RR (right-right)

The third kind of rotation is the **right-right rotation (RR)**. This rotation happens when the new node is on ** the right subtree of the right subtree** of the unbalanced node. For instance, that will happen if we insert the number 11 into the following tree.

` /`

4

/ \

3 6

/ \

5 9

\

11 (*)

## RL (right-left)

The last kind of rotation is the **right-left rotation (RL)**. A rotation that happens when the new node lies on ** the right subtree of the left subtree** of the node that got unbalanced.

After inserting the number 4 in the following tree, we must perform an RL rotation to restore its balance.

` /`

2

/ \

1 6

/ \

5 9

/

4 (*)

# Interface for AVL trees

The public API for this data structure is straightforward. It only has four methods and no attributes.

## Methods

# Implementation details

As opposed to what we did in previous posts, this time we are going to focus the ** private** methods of the data structure. By doing so, we will be covering important details that otherwise would be unnoticed.

Let’s start by looking at how nodes are represented on the AVL tree.

*Note: To make the code easy to follow I’m not going to use the binary tree we built on the previous post. I did a prototype using that code, and the result wasn’t ideal; AVL trees have much logic that mangled with B-Trees looked a bit clumsy to me. So I think this time it will be better if we start from scratch.

## Insert

Adds a new node to the tree and reset its root. This method relies on *insert_and_balance* that takes care of applying the right rotations to keep the tree in balance.

**The complexity of this method is O(log2 n).**

`def insert key, data = nil`

@root = insert_and_balance(@root, key, data)

end

## Insert And Balance

This method looks recursively to find the right spot for the given key. Once the spot is found and the new node is inserted, it calls the balance method to ensure that the tree remains balanced.

**The complexity of this method is O(log2 n).**

def insert_and_balance node, key, data = nil

return Node.new key, data unless node if (key < node.key)

node.left = insert_and_balance(node.left, key, data)

elsif(key > node.key)

node.right = insert_and_balance(node.right, key, data)

else

node.data = data

node.deleted = false

end balance(node)

end

## Balance

This method balances the subtree rooted at the specified node. This method is probably the most complex in this data structure because it’s responsible for selecting the right kind of rotation based on the “heaviness” of the tree.

*A tree is considered left-heavy is its balance factor is higher than 1 or right-heavy if it’s lower than -1.*

The best way to follow this method is by looking at the ASCII representations of each use-case in previous sections.

**The complexity of this method is O(log2 n).**

**Update: **The original version of this method had an error that was corrected by Cédric Boutillier. **Thanks, Cédric!**

(I leave a comment with the original version so you can appreciate the differences.)

`def balance node`

set_height node

if (height(node.left) - height(node.right) == 2)

if (height(node.left.right) > height(node.left.left))

#return rotate_left_right(node.left) <- Wrong.

return rotate_left_right(node)

end

return rotate_right(node)

elsif (height(node.right) - height(node.left) == 2)

if (height(node.right.left) > height(node.right.right))

#return rotate_right_left(node.right) <- Also wrong.

return rotate_right_left(node)

end

return rotate_left(node)

end

return node

end

## Remove

This method finds the node to be removed and marks it as deleted. Performance wise, this is a neat way to handle deletions because the structure of the tree doesn’t change, and so we don’t have to balance it after the node is removed.

This technique is called “lazy” removal because it doesn’t do much work.

**The complexity of this method is O(log2 n).**

`def remove key`

search(key)&.deleted = true

end

## Search

This method starts a** top-down recursive descendant search** from the current tree’s root node.

**The complexity of this method is O(log2 n).**

`def search key`

node = search_rec @root, key

return node unless node&.deleted

end

## Search Rec

Searches for a key in the subtree that starts at the provided node. This method starts the search at a given node and descends (recursively) moving left or right based on the key’s values until it finds the key or gets to the end of the subtree.

**The complexity of this method is O(log2 n).**

`def search_rec node, key`

return nil unless node

return search_rec(node.left, key) if (key < node.key)

return search_rec(node.right, key) if (key > node.key)

return node

end

## Set height

This method calculates and sets the height for the specified node based on the heights of their left and right subtrees.

**The complexity of this method is O(1).**

def set_height node

lh = height node&.left

rh = height node&.right

max = lh > rh ? lh : rh node.height = 1 + max

end

## Rotate right

This method performs a right rotation.

**The complexity of this method is O(1).**

def rotate_right p

q = p.left

p.left = q.right

q.right = p set_height p

set_height q return q

end

## Rotate Left

This method performs a left rotation.

**The complexity of this method is O(1).**

def rotate_left p

q = p.right

p.right = q.left

q.left = p set_height p

set_height q return q

end

## Rotate Left Right

This method points the left subtree of the given node to the result of rotating that subtree to the left and then rotates the specified node to the right. The last rotation it’s also its return value.

**The complexity of this method is O(1).**

`def rotate_left_right node`

node.left = rotate_left(node.left)

return rotate_right(node)

end

## Rotate Right Left

This method points the right subtree of the given node to the result of rotating that subtree to the right and then rotates the specified node to the left. The last rotation it’s also its return value.

**The complexity of this method is O(1).**

`def rotate_right_left node`

node.right = rotate_right(node.right)

return rotate_left(node)

end

This method prints the contents of a tree.

Since it has to visit all nodes in the tree, **the complexity of this method is O(n)**.

def print

print_rec @root, 0

enddef print_rec node, indent

unless node

puts "x".rjust(indent * 4, " ")

return

end puts_key node, indent

print_rec node.left, indent + 1

print_rec node.right, indent + 1

enddef puts_key node, indent

txt = node.key.to_s

if node.deleted

txt += " (D)"

puts txt.rjust(indent * 8, " ")

else

puts txt.rjust(indent * 4, " ")

end

end

# When to use binary search trees

Binary search trees work well when we have to handle large datasets in applications where the read/write ratio is 10 > 1 or more. If the volume of data is small or the read/write ratio is even is better to use hash tables or even linked lists instead.

I recently used this data structure to implement the internal storage for a spreadsheet component, and it worked like a charm.

This was a rather long post that I hope you enjoy it!

You can get the source code from here.

Next time, **Graphs**.

**Thanks for reading!** Also, *don’t forget to clap if you like this post :)*

PS: This post is part of a series, and as we move forward, subjects that were thoroughly discussed on previous posts are sometimes glossed over (if mentioned at all). If you want to get the most of this series, I recommend you to start from the first post.

*PS2: If you are a member of the Kindle Unlimited program, you can read a bit polished version of this series **for free** on your Kindle device.*