Handling Collisions in a Hash Table

Jon SY Chan
3 min readJul 30, 2019

First off let’s talk about hash tables before we get into handling collisions.

A hash table is a data structure that is used to store keys/value pairs. It uses a hash function to compute an index into an array in which an element will be inserted or searched. By using a good hash function, hashing can work well. Under reasonable assumptions, the average time required to search for an element in a hash table is O(1).

https://www.hackerearth.com/practice/data-structures/hash-tables/basics-of-hash-tables/tutorial/

If you know the exact index of the item you want to target, than you can grab your information with a speed of O(1). With a normal array, you usually don’t know the specific index of what you want is. This leads you into iterating over the array one by one with a speed of O(n) (worst case) looking for a match to the data you want. THIS is why hash tables are so much quicker, you can go straight to the address as long as you know the address.

Now how does this key value pair work? Well based on whatever key you use a hashing function is applied which will map your specific key into an index.

https://commons.wikimedia.org/wiki/File:Hash_table_5_0_1_1_1_1_0_SP.svg

A hashing function should ideally be easy to compute, have uniform distribution, and avoid collisions.

What are collisions? If you look in the above chart John Smith and Sandra Dee are indeed colliding. How does this happen? Well in reality there are an unlimited possibilities of keys but a finite amount of space to put these keys in. Eventually there will be some collision where different keys will be hashed to result in the same location. In the image above 152 is a collision.

So how can we resolve this issue? There are many ways to do this and I will list a few below.

Chaining — chaining is the idea of formatting linked lists on all collisions. The issue with this is you get all the overhead that comes with a linked list. If you want to retrieve that specific information you’d have to traverse the linked list. Example below:

https://en.wikipedia.org/wiki/Hash_table#/media/File:Hash_table_5_0_1_1_1_1_1_LL.svg

Rehashing — this is where if your first hash leads you to a spot that has something already you simply re-hash it and hope to land in an open space. This rehash can be the same function or something different. As long as the order of hashing is followed, you will get to your desired entry.

Linear Probing — this is a very simple method where you just add 1 and go to the next position if the first hashed location is taken (collision). The issue with this is with high load factors it can lead to clustering which induces collisions into other collisions.

h(j)=h(k), so the next hash function,
h1 is used. A second collision occurs,
so h2 is used. — https://www.cs.auckland.ac.nz/software/AlgAnim/hash_tables.html

Quadratic Probing — this is similar to linear probing except instead of adding 1 or going to the neighboring location you add a successive values of an arbitrary quadratic polynomial until an open slot is found. This leads to less primary clustering.

--

--