Binary Search Trees

Sounds familiar? but we’re going to take a closer look.

A binary search tree (BST) is a binary tree in a symmetric order, where each node has a key (and an associated value).

A binary tree means it consists of nodes, and each node has at most two children(left and right child).

While being in a symmetric order is that every node is larger than all the nodes in the left subtree, and smaller than the keys in the right subtree.

It’s used to implement the symbol tables (associative arrays), and it’s the key that would be used to sort the nodes accordingly.

A Binary Search Tree (BST) — algs4.cs.princeton.edu

Binary Search Trees

We are going to implement the symbol tables using binary search tree. So, The main operations of a binary search tree would be the same; get, put, delete, contains & isEmpty.

Implementation

We’ll start by the binary search tree (BST) class. It has a reference variable called “root” points to the root node.

The Key and Value type is any generic type. And, for keys, they are Comparable. So, we can use compareTo() method, to compare whether one key is less than the other, or greater than, or equal.

public class BST<Key extends Comparable<Key>, Value> {
private Node root = null; // root of BST
}

Then, we have an inner class called “Node” to represent each node within the tree. Each node has a key, a value, a reference to the left and right child node, and an integer “count” the number of nodes in the subtree rooted at the node.

The number of nodes in the subtree rooted at every node — algs4.cs.princeton.edu
private class Node {
private Key key; // sorted by key
private Value val; // associated data
private Node left, right; // left and right subtrees
private int count; // number of nodes in subtree

public Node(Key key, Value val, int count) {
this.key = key;
this.val = val;
this.count = count;
}
}

— get

It searches for a node using it’s key recursively, and returns it’s value.

It starts from the root node, if it’s less than, then go left, if it’s greater than, go right, and if it’s equal, we are done; search hit. On the other hand, If the tree is empty, or we reached null link, we have a search miss.

Searching for a node in a BST — algs4.cs.princeton.edu
// searches for a node with a given key, and returns associated val.
public Value get(Key key) {
return get(root, key);
}
// A helper method to search for a node given a starting Node
private Value get(Node current, Key key) {
if (current == null) return null; // search miss

int cmp = key.compareTo(current.key);
if (cmp < 0) return get(current.left, key); // go left
else if (cmp > 0) return get(current.right, key); // go right
else return current.val; // search hit
}
From now on, during our implementation for symbol table using binary search tree, It might be the case when key is null, at this point, you may throw an exception.

— put

Insert a node of a key, and a value in the tree. It searches the tree recursively until it finds a null link, and all that we need to do is replace that link with a new node.

If the key already exists, it updates the value (avoid duplicates). It removes the node from the tree if the value is null.

Inserting a node in a BST — algs4.cs.princeton.edu
public void put(Key key, Value val) {
if (val == null) { // remove key from table if value is null
delete(key);
return;
}
root = put(root, key, val);
}
// A helper method to insert a node given a starting Node
private Node put(Node current, Key key, Value val) {
if (current == null) return new Node(key, val, 1);

// if key exists, overrides the old value (avoid duplicates)
int cmp = key.compareTo(current.key);
if (cmp < 0) current.left = put(current.left, key, val);
else if (cmp > 0) current.right = put(current.right, key, val);
else current.val = val;

// update number of nodes rooted at Node current
current.count = 1 + size(current.left) + size(current.right);
return current;
}

— delete

We’ll skip it for now, but, we’ll come back to it later (see below).

— contains

// Is the given key exists?
public boolean contains(Key key) { return get(key) != null; }

— Is the tree empty?

When the tree is empty, the root node will be null, and hence number of nodes rooted at the root node is 0; no nodes in the tree.

// Is the table empty?
public boolean isEmpty() { return size(root) == 0; }
// A helper method to get number of nodes rooted at Node current
private int size(Node current) {
if (current == null) return 0;
else return current.count;
}

Analysis

Symbol Table Implementations Analysis — algs4.cs.princeton.edu

The cost is the number of compares, which equals to the depth of a node in the tree + 1. The best case for searching is O(1); the root maybe the answer.

The running times of get and put operations equals to the number of compares, which equals to to the depth of a node in the tree + 1.

The depth of the tree depends on the shapes of the trees, which, in turn, depends on the order in which keys are inserted.

  • The worst case when the keys are inserted in descending order; max tree depth is O(N).
  • The best case when the tree is perfectly balanced; max tree depth is O(LogN).

If keys are inserted in randomly order, the number of compares in search or insert operation is ~ (2 LogN) on average case, while the worst case is about ~ (4.311 LogN) (proposed by Reed, 2003). There is a small chance for the height to be ~ (N) in the worst case if the keys are inserted in random order.

The shapes of the BST — algs4.cs.princeton.edu

Unlike Binary Heaps, where all levels are full (almost except the last one).

Balanced binary tree means for every (non leaf) node, the left and right subtrees’ heights differ by at most one, while perfectly balanced means all leaf nodes have the same depth or same level; every path from the root to the null link has the same length.

Ordered Operations

An important reason for why binary search trees are widely used is that they allow us to keep the keys in order.

So, using the binary search tree implementation for symbol tables, where keys are Comparable, we can support extra operations like: min, max, floor, ceiling, rank, select, …etc.

— Min and Max

We keep moving to the left from the root to find the minimum, and to the right for the maximum.

// get the smallest key
public Key min() {
if (isEmpty()) throw new NoSuchElementException("...");
return min(root).key;
}
// A helper method to get the smallest node starting from Node curre
private Node min(Node current) {
if (current.left == null) return current;
else return min(current.left);
}
// get the largest key
public Key max() {
if (isEmpty()) throw new NoSuchElementException("...");
return max(root).key;
}
// A helper method to get the largest node starting from Node curren
private Node max(Node current) {
if (current.right == null) return current;
else return max(current.right);
}

— Floor and Ceiling

For finding the floor, If current node is greater than key k, keep going to left. If current node is smaller than k, go right. Whenever we go right, the current node is the floor, until we go right again.

Computing the floor in a BST — algs4.cs.princeton.edu
public Key floor(Key key) {
if (isEmpty()) throw new NoSuchElementException("...");
Node x = floor(root, key);
if (x == null) return null; // if empty, return null
else return x.key;
}
// A helper method to get the floor of a key starting from Node curr
private Node floor(Node current, Key key) {
if (current == null) return null;

int cmp = key.compareTo(current.key);
if (cmp == 0) return current;
if (cmp < 0) return floor(current.left, key);

Node t = floor(current.right, key);
if (t != null) return t;
else return current;
}

Finding the ceiling is similar, just interchange right and left.

public Key ceiling(Key key) {
if (isEmpty()) throw new NoSuchElementException("...");
Node x = ceiling(root, key);
if (x == null) return null; // if empty, return null
else return x.key;
}
// A helper method to get the ceiling of a key starting from Node cu
private Node ceiling(Node current, Key key) {
if (current == null) return null;

int cmp = key.compareTo(current.key);
if (cmp == 0) return current;
if (cmp > 0) return ceiling(current.right, key);

Node t = ceiling(current.left, key);
if (t != null) return t;
else return current;
}

— Rank and Selection

For rank, suppose that we want the rank of key k …

If current node is greater than k, keep going to left; It means the key we are searching for is before(less than) the current key.

If current node is smaller than k, go right; It means the key we are searching for is after(greater than) the current key, and so, we need to count the current node, and all of the nodes on it’s left subtree.

Remember, rank of key k the number of keys that are less than k. In other words, it returns the index of k in a sorted array, or an ordered tree.
// get the number of keys in the subtree less than given key.
public int rank(Key key) {
return rank(root, key);
}
// A helper method to get the rank of a key starting from Node curre
private int rank(Node current, Key key) {
if (current == null) return 0;

int cmp = key.compareTo(current.key);
if (cmp < 0) return rank(current.left, key);
else if (cmp > 0)
return 1 + size(current.left) + rank(current.right, key);
else return size(current.left);
}

For select, suppose that we want the key at rank k(at index k in a BST) 

If the number of keys t in the left subtree is larger than k, we go left, and if t is smaller than k, we go right, and update k = k-t-1(subtract the current node, and all of the nodes on it’s left subtree).

Selection in a BST — algs4.cs.princeton.edu
// return the key at a given rank k; return kth smallest key
public Key select(int k) {
if (k < 0 || k >= size(root)) { // same as when the key is null
throw new IllegalArgumentException("...");
}
Node x = select(root, k);
return x.key;
}

// A helper method to return key of rank k starting from Node curren
private Node select(Node current, int k) {
if (current == null) return null;

int t = size(current.left);
if (t > k) return select(current.left, k);
else if (t < k) return select(current.right, k-t-1);
else return current;
}

— Range Queries

Finding how many keys fall within a given range [lo, hi] can be done by subtracting rank of lower key lo from rank of higher one hi (adding 1 if the higher key exists in the tree).

public int size(Key lo, Key hi) {
if (lo.compareTo(hi) > 0) return 0;
if (contains(hi)) return rank(hi) - rank(lo) + 1;
else return rank(hi) - rank(lo);
}

While finding which keys fall in a given range can be done by traversing the tree in order; left subtree, current node, then the right subtree recursively.

public Iterable<Key> keys(Key lo, Key hi) {
// create an Iterable object
Queue<Key> queue = new Queue<Key>();

// fill the Iterable object(queue) with keys in range[lo, hi]
keys(root, queue, lo, hi);

// return the Iterable object so we can iterate over the keys

return queue;
}
// A helper method to fill the Iterable object(queue) with the keys
private void keys(Node current, Queue<Key> queue, Key lo, Key hi) {
if(current == null) return;

int cmplo = lo.compareTo(current.key);
int cmphi = hi.compareTo(current.key);
    // if lo is less than current node, go left
if (cmplo < 0) keys(current.left, queue, lo, hi);
// if current node within the range[lo, hi], add to the queue
if (cmplo <= 0 && cmphi >= 0) queue.enqueue(current.key);
// if hi is greater than current node, go right
if (cmphi > 0) keys(current.right, queue, lo, hi);
}
// if no given range is specified, then iterate over all the keys
public Iterable<Key> keys() { return keys(min(), max()); }

Analysis

Symbol Table Implementations Analysis — algs4.cs.princeton.edu
All the operations for the Binary Search Tree takes h(except keys() method), where h is the height(depth) of the tree. The height of the tree is proportional to (LogN), if the keys are inserted in random order.
The keys() method always take O(N) in-spite of the height.

Deletion in BSTs

We skipped the delete method in our implementation (see above). This is because the delete operation needs a careful look.

The lazy approach

To remove a node with a given key, set it’s value to null, and leave the key. This node becomes useless, it’s just there to guide the search through the tree using it’s key.

Deletion in BSTs (The lazy approach) — algs4.cs.princeton.edu

This approach will cost ~ (2 * LogN) per insert, search and delete operation if keys are inserted in random order (where N is the number of nodes including deleted nodes in the BST), but it’s inconvenient because you’ll have large number of useless nodes (memory overload).

Delete Min/Max Key

For delete the minimum, we go left until finding a node that that has a null left link and then replace the link to that node by its right link.

For delete the maximum, it works similar, just interchange right and left.

Deleting the minimum key in a BST — algs4.cs.princeton.edu
The deleted node is then becomes available for the garbage collector.
// delete the smallest node
public void deleteMin() {
if (isEmpty()) return;
root = deleteMin(root);
}
// A helper method to delete the min node starting from Node current
private Node deleteMin(Node current) {
// go to left until finding a node where left is null
// once, we found it, return it's right node
if (current.left == null) return current.right;

// assign the left link of current node
current.left = deleteMin(current.left);

// update the size of each node as you go up
current.count = size(current.left) + size(current.right) + 1;
return current;
}

Delete An Arbitrary Node (Hibbard Method)

We can proceed in a similar manner to delete any node that has one child (or no children), but what can we do to delete a node that has two children? The solution is by replacing the node with its successor.

This replacement preserves order in the tree because there are no keys between the node’s key we want to delete and it’s successor’s key.

The successor node of node x is the leftmost node (the node with the smallest key) in the right subtree of x.
Deletion in a BST — algs4.cs.princeton.edu
// delete a node given it's key
public void delete(Key key) {
root = delete(root, key);
}
// A helper method to delete a node starting from Node current
private Node delete(Node current, Key key) {
if (current == null) return null;

// search for the node you want to delete
int cmp = key.compareTo(current.key);
if (cmp < 0) current.left = delete(current.left, key);
else if (cmp > 0) current.right = delete(current.right, key);
else { // once you found it
        // If no right child, return the left child
if (current.right == null) return current.left;
// If no left(but right child), return the right child.
if (current.left == null) return current.right;

// Otherwise; current node has two children
// 1. Save a link to the node to be deleted in t
Node t = current;
// 2. Set current to point to its successor min(t.right).
current = min(t.right);
        // 3. Set the right link of current to deleteMin(t.right);
// the link to the tree containing all the keys that are
// larger than current.key after the deletion.

current.right = deleteMin(t.right);

// 4. Set the left link of current(which was null) to t.left
// (all the keys that are less than both the deleted key and
// its successor).
current.left = t.left;
}
    // update the size of each node as you go up
current.count = size(current.left) + size(current.right) + 1;
return current;
}
The expected height that results from a sequence of random insertion and Hibbard deletions results in un-balanced tree. The height of the tree becomes sqrt(N) instead of (LogN).

Summary: Symbol Table Implementations

Symbol Table Implementations Analysis — algs4.cs.princeton.edu
In BST, If the keys are in random order, It ensures a probabilistic guarantee for search and insert operations of (2 LogN) in average case, and (4.311 LogN) in the worst case.
In BST, The delete operations takes O(N) in the worst case, and O(sqrt(N)) in average. If delete operations is allowed, then average case for search and insert is also O(sqrt(N)).

— Now, the question is, Can we guarantee (LogN) performance for all the operations in the worst case? Yes, Red-black BSTs can do that.