# Binary Search Trees

## Sounds familiar? but we’re going to take a closer look.

A *binary search tree* (BST) is a binary tree in a symmetric order, where each node has a key (and an associated value).

A binary tree means it consists of nodes, and each node has at most two children(left and right child).

While being in a symmetric order is that every node is larger than all the nodes in the left subtree, and smaller than the keys in the right subtree.

It’s used to implement the **symbol tables (associative arrays)**, and it’s the key that would be used to sort the nodes accordingly.

# Binary Search Trees

We are going to implement the symbol tables using binary search tree. So, The main operations of a binary search tree would be the same; *get, put, delete*, *contains* *& isEmpty.*

## Implementation

We’ll start by the binary search tree (BST) class. It has a reference variable called “root” points to the root node.

The Key and Value type is any generic type. And, for keys, they are *Comparable***. **So, we can use *compareTo()* method, to compare whether one key is less than the other, or greater than, or equal.

`public class BST<Key extends Comparable<Key>, Value> {`

private Node root = null; *// root of BST*

}

Then, we have an inner class called “Node” to represent each node within the tree. Each node has a key, a value, a reference to the left and right child node, and an integer “count” the number of nodes in the subtree rooted at the node.

`private class Node {`

private Key key; *// sorted by key*

private Value val; *// associated data*

private Node left, right; *// left and right subtrees*

private int count; *// number of nodes in subtree*

public Node(Key key, Value val, int count) {

this.key = key;

this.val = val;

this.count = count;

}

}

## — get

It searches for a node using it’s key recursively, and returns it’s value.

It starts from the root node, if it’s less than, then go left, if it’s greater than, go right, and if it’s equal, we are done; search hit. On the other hand, If the tree is empty, or we reached null link, we have a search miss.

// searches for a node with a given key, and returns associated val.public Value get(Key key) {

return get(root, key);

}// A helper method to search for a node given a starting Node

private Value get(Node current, Key key) {

if (current == null) return null;// search miss

int cmp = key.compareTo(current.key);

if (cmp < 0) return get(current.left, key);// go left

else if (cmp > 0) return get(current.right, key);// go right

else return current.val;// search hit

}

From now on, during our implementation for symbol table using binary search tree, It might be the case when key is null, at this point, you may throw an exception.

## — put

Insert a node of a key, and a value in the tree. It searches the tree recursively until it finds a null link, and all that we need to do is replace that link with a new node.

If the key already exists, it updates the value (avoid duplicates). It removes the node from the tree if the value is null.

public void put(Key key, Value val) {

if (val == null) {// remove key from table if value is null

delete(key);

return;

}

root = put(root, key, val);

}// A helper method to insert a node given a starting Node

private Node put(Node current, Key key, Value val) {

if (current == null) return new Node(key, val, 1);

// if key exists, overrides the old value (avoid duplicates)int cmp = key.compareTo(current.key);

if (cmp < 0) current.left = put(current.left, key, val);

else if (cmp > 0) current.right = put(current.right, key, val);

else current.val = val;

// update number of nodes rooted at Node current

current.count = 1 + size(current.left) + size(current.right);

return current;

}

## — delete

We’ll skip it for now, but, we’ll come back to it later (see below).

## — contains

*// Is the given key exists?*

public boolean contains(Key key) { return get(key) != null; }

## — Is the tree empty?

When the tree is empty, the root node will be null, and hence number of nodes rooted at the root node is 0; no nodes in the tree.

// Is the table empty?public boolean isEmpty() { return size(root) == 0; }// A helper method to get number of nodes rooted at Node current

private int size(Node current) {

if (current == null) return 0;

else return current.count;

}

**Analysis**

The cost is the number of compares, which equals to the depth of a node in the tree + 1. The best case for searching is O(1); the root maybe the answer.

The running times of *get *and *put *operations equals to the number of compares, which equals to to the depth of a node in the tree + 1.

The depth of the tree depends on the shapes of the trees, which, in turn, depends on the order in which keys are inserted.

- The
**worst**case when the keys are inserted in descending order; max tree depth is O(N). - The
**best**case when the tree is perfectly balanced; max tree depth is O(LogN).

If keys are inserted in **randomly order**, the number of compares in *search *or *insert *operation is ~ (2 LogN) on **average **case, while the **worst **case is about ~ (4.311 LogN) (proposed by Reed, 2003). There is a small chance for the height to be ~ (N) in the worst case if the keys are inserted in random order.

Unlike Binary Heaps, where all levels are full (almost except the last one).

Balancedbinary tree means for every (non leaf) node, the left and right subtrees’ heights differ by at most one, whileperfectly balancedmeans all leaf nodes have the same depth or same level; every path from the root to the null link has the same length.

# Ordered Operations

An important reason for why binary search trees are widely used is that they allow us to keep the keys in order.

So, using the binary search tree implementation for symbol tables, where keys are *Comparable*, we can support extra operations like: *min*, *max*, *floor*, *ceiling*, *rank*, *select*, …etc.

## — Min and Max

We keep moving to the left from the root to find the minimum, and to the right for the maximum.

// get the smallest keypublic Key min() {

if (isEmpty()) throw new NoSuchElementException("...");

return min(root).key;

}// A helper method to get the smallest node starting from Node curreprivate Node min(Node current) {

if (current.left == null) return current;

else return min(current.left);

}// get the largest keypublic Key max() {

if (isEmpty()) throw new NoSuchElementException("...");

return max(root).key;

}// A helper method to get the largest node starting from Node currenprivate Node max(Node current) {

if (current.right == null) return current;

else return max(current.right);

}

## — Floor and Ceiling

For finding the *floor*, If current node is greater than key* k*, keep going to left. If current node is smaller than *k*, go right. Whenever we go right, the current node is the floor, until we go right again.

public Key floor(Key key) {

if (isEmpty()) throw new NoSuchElementException("...");

Node x = floor(root, key);

if (x == null) return null;// if empty, return null

else return x.key;

}// A helper method to get the floor of a key starting from Node curr

private Node floor(Node current, Key key) {

if (current == null) return null;

int cmp = key.compareTo(current.key);

if (cmp == 0) return current;

if (cmp < 0) return floor(current.left, key);

Node t = floor(current.right, key);

if (t != null) return t;

else return current;}

Finding the *ceiling* is similar, just interchange right and left.

public Key ceiling(Key key) {

if (isEmpty()) throw new NoSuchElementException("...");

Node x = ceiling(root, key);

if (x == null) return null;// if empty, return null

else return x.key;

}// A helper method to get theceilingof a key starting from Node cu

private Node ceiling(Node current, Key key) {

if (current == null) return null;

int cmp = key.compareTo(current.key);

if (cmp == 0) return current;

if (cmp > 0) return ceiling(current.right, key);

Node t = ceiling(current.left, key);

if (t != null) return t;

else return current;

}

## — Rank and Selection

For *rank*, suppose that we want the rank of key* k …*

If current node is greater than *k*, keep going to left; It means the key we are searching for is before(less than) the current key.

If current node is smaller than *k*, go right; It means the key we are searching for is after(greater than) the current key, and so, we need to count the current node, and all of the nodes on it’s left subtree.

Remember, rank of key k the number of keys that are less than k. In other words, it returns the index of k in a sorted array, or an ordered tree.

// get the number of keys in the subtree less than given key.

public int rank(Key key) {

return rank(root, key);

}// A helper method to get the rank of a key starting from Node curreprivate int rank(Node current, Key key) {

if (current == null) return 0;

int cmp = key.compareTo(current.key);

if (cmp < 0) return rank(current.left, key);

else if (cmp > 0)

return 1 + size(current.left) + rank(current.right, key);

else return size(current.left);

}

For *select*, suppose that we want the key at rank *k*(at index *k* in a BST) *…*

If the number of keys *t* in the left subtree is larger than *k*, we go left, and if *t* is smaller than *k*, we go right, and update *k = k-t-1*(subtract the current node, and all of the nodes on it’s left subtree).

*// return the key at a given rank k; return kth smallest key*

public Key select(int k) {

if (k < 0 || k >= size(root)) { *// same as when the key is null*

throw new IllegalArgumentException("...");

}

Node x = select(root, k);

return x.key;

}

*// A helper method to return key of rank k starting from Node curren*

private Node select(Node current, int k) {

if (current == null) return null;

int t = size(current.left);

if (t > k) return select(current.left, k);

else if (t < k) return select(current.right, k-t-1);

else return current;

}

## — Range Queries

Finding how many keys fall within a given range *[lo, hi]* can be done by subtracting rank of lower key *lo* from rank of higher one *hi* (adding 1 if the higher key exists in the tree).

`public int size(Key lo, Key hi) {`

if (lo.compareTo(hi) > 0) return 0;

if (contains(hi)) return rank(hi) - rank(lo) + 1;

else return rank(hi) - rank(lo);

}

While finding which keys fall in a given range can be done by traversing the tree in order; left subtree, current node, then the right subtree recursively.

public Iterable<Key> keys(Key lo, Key hi) {// create an Iterable object

Queue<Key> queue = new Queue<Key>();

// fill the Iterable object(queue) with keys in range[lo, hi]keys(root, queue, lo, hi);

// return the Iterable object so we can iterate over the keys

return queue;

}// A helper method to fill the Iterable object(queue) with the keysprivate void keys(Node current, Queue<Key> queue, Key lo, Key hi) {

if(current == null) return;

int cmplo = lo.compareTo(current.key);

int cmphi = hi.compareTo(current.key);// if lo is less than current node, go leftif (cmplo < 0) keys(current.left, queue, lo, hi);if (cmplo <= 0 && cmphi >= 0) queue.enqueue(current.key);

// if current node within the range[lo, hi], add to the queue// if hi is greater than current node, go rightif (cmphi > 0) keys(current.right, queue, lo, hi);}// if no given range is specified, then iterate over all the keyspublic Iterable<Key> keys() { return keys(min(), max()); }

## Analysis

All the operations for the Binary Search Tree takes h(except keys() method), where h is the height(depth) of the tree. The height of the tree is proportional to (LogN), if the keys are inserted in random order.

The keys() method always take O(N) in-spite of the height.

# Deletion in BSTs

We skipped the *delete *method in our implementation (see above). This is because the delete operation needs a careful look.

## The lazy approach

To remove a node with a given key, set it’s value to null, and leave the key. This node becomes useless, it’s just there to guide the search through the tree using it’s key.

This approach will cost ~ (2 * LogN) per *insert*, *search *and *delete *operation if keys are inserted in random order (where N is the number of nodes including deleted nodes in the BST), but it’s inconvenient because you’ll have large number of useless nodes (memory overload).

## Delete Min/Max Key

For delete the minimum, we go left until finding a node that that has a null left link and then replace the link to that node by its right link.

For delete the maximum, it works similar, just interchange right and left.

The deleted node is then becomes available for the garbage collector.

// delete the smallest node

public void deleteMin() {

if (isEmpty()) return;

root = deleteMin(root);

}// A helper method to delete the min node starting from Node currentprivate Node deleteMin(Node current) {

// go to left until finding a node where left is null

// once, we found it, return it's right node

if (current.left == null) return current.right;

// assign the left link of current node

current.left = deleteMin(current.left);

// update the size of each node as you go up

current.count = size(current.left) + size(current.right) + 1;

return current;

}

## Delete An Arbitrary Node (Hibbard Method)

We can proceed in a similar manner to delete any node that has one child (or no children), but what can we do to delete a node that has two children? The solution is by replacing the node with its *successor.*

This replacement preserves order in the tree because there are no keys between the node’s key we want to delete and it’s successor’s key.

The successor node of node x is the leftmost node (the node with the smallest key) in the right subtree of x.

// delete a node given it's keypublic void delete(Key key) {

root = delete(root, key);

}// A helper method to delete a node starting from Node currentprivate Node delete(Node current, Key key) {

if (current == null) return null;

// search for the node you want to deleteint cmp = key.compareTo(current.key);

if (cmp < 0) current.left = delete(current.left, key);

else if (cmp > 0) current.right = delete(current.right, key);

else {// once you found it// If no right child, return the left child

if (current.right == null) return current.left;// If no left(but right child), return the right child.if (current.left == null) return current.right;

// Otherwise; current node has two children// 1. Save a link to the node to be deleted in tNode t = current;// 2. Set current to point to its successor min(t.right).current = min(t.right);// 3. Set the right link of current to deleteMin(t.right);

// the link to the tree containing all the keys that are

// larger than current.key after the deletion.

current.right = deleteMin(t.right);

// 4. Set the left link of current(which was null) to t.leftcurrent.left = t.left;

// (all the keys that are less than both the deleted key and

// its successor).

}// update the size of each node as you go up

current.count = size(current.left) + size(current.right) + 1;

return current;

}

The expected height that results from a sequence of random insertion and Hibbard deletions results in un-balanced tree. The height of the tree becomes sqrt(N) instead of (LogN).

# Summary: Symbol Table Implementations

In BST, If the keys are in random order, It ensures a probabilistic guarantee for search and insert operations of (2 LogN) in average case, and (4.311 LogN) in the worst case.

In BST, The delete operations takes O(N) in the worst case, and O(sqrt(N)) in average. If delete operations is allowed, then average case for search and insert is also O(sqrt(N)).

— Now, the question is, *Can we guarantee (LogN) performance for all the operations in the worst case?* Yes, Red-black BSTs can do that.