Hashsets vs Binary Search Trees: Speed

Elliot Hurdiss
Quick Code
Published in
4 min readJan 26, 2018

I had been doing a bit of reading on binary search trees and how they were designed for quick searching and It got me wondering. How would the lookup from a simple hashset fare in comparison to one of these trees.

Let’s define this question a little better:

Given a data structure<int> of n length — how long does it take to find a known value in this structure and to deduce that a value does not exist in this structure.

Consider an array of length 5:

[1,2,3,4,5] — How long would it take to find the value 4 in that array and to find out that 6 isn’t in the array.

A good old face-off was the only way to find out*

*Aside from google, books, the internet, colleagues and anyone else who knows how to program.

The contenders:

· The humble array as a control group.

· Hashset<int> — As we are just using integers in this experiment we will use a HashSet (a dictionary where the key == Hashed(value))

· Binary Search tree — Not a built-in data type in .Net relatively easy to create and is known for fast searching

The search tree:

A BST is not a data structure we use in every day programming so here is a quick overview on how I created mine.

A BST is a collection of Nodes (see below). Each node has a value and no more than two other nodes — left and right. Every child node that is smaller than the parent node is stored on the left and every child node that is larger than the parent node is stored on the right.

A BST

The advantages of using a BST is that you don’t have to iterate over an entire data structure to find the value or do some sort of sorting algorithm.

A Node on the tree

Creating the tree is relatively simple with a recursive function:

Which is called like so:

Setting up the tree

It is worth mentioning here that I know how inefficient this code is. We do end up enumerating the whole array anyway and the recursion will also be caning the memory, but remember, we’re here to test the speed of the lookup so we’ll gloss over this for now.

To search for a value on the BST all we do is:

Searching algorithm

So to do the experiment I created a sample data set:

The value in the middle of the array and the value at the end of the array are values we will be searching for along with one variable that doesn’t exist (1000001).

To make this a fair trial I used a Log10(n) scale starting at 200k ints in the array.

The Results:

So I had to record the results in ticks because, it doesn’t matter how hard you try, doing this operation on an 8 core — 4Ghz processor was never going to be slow.

The results have indicated… a draw. How underwhelming.

Predictably the array search times scaled with the size of the data set in an O(n) fashion. However, both the Binary search tree algorithm and the Hashset.Contains() method seemed to take the same amount of time.

Initially I was surprised by the performance of the BST algorithm. But when you think about it, it is not so surprising after all.

The FindValue method on the Node will be called a maximum of Log2(n) with n being the initial sample size (as this is the maximum depth of the BST).

See the below table:

Increasing the number of items in the sample size by 10 fold only requires an extra 3–4 method calls which equates to virtually no time at all. Great result.

This means it is as efficient as the hashset lookup, which is infamously fast!

Before we declare victory for the Binary search tree, this implementation (knowingly far from perfect) held nearly 10 times as much memory as the hashset and required nearly 10 times as long to create. I suspected as much seeming though countless hours of research has gone into refining and perfecting the hashset and about 45 minutes went into creating my binary search tree.

Still:

Hashset 1–0 BST

--

--