# Hash Tables in Data Structure and Algorithm

Oct 16, 2020 · 8 min read

Hashing is the most important Data Struct that we use every day which is designed to use a special formula to encode the given value with a particular key for faster access of elements. the efficiency of hashing depends on the formula we used. Let's go into detail.

# INTRODUCTION

Suppose we want to design a system for storing users and their passwords. And we want to access to be performed efficiently:

Think about it. We can use the following data structures to maintain it: array with pair element, Linked List, Binary Tree, Trie, …

summary of some methods:

Can we do better?

The above data structures all of these operations can be guaranteed to be in O(Logn) time. So can we perform it with O(1) time? this is why the hash table comes in.

# BUILD HASH FUNCTION

The simplest method to build Hash function is each key, we can perform sum of each key by add all character and then we can use Modulo for M. M is typically a prime number and it is the size of Hash array. I just suppose in a simple case of password but in real life, we must encode password (this is not the purpose of this article and apply a ton of algorithm for encoding password).

`table_size = 7table = [0]*table_sizedef hashing(user):    sum = 0    for char in user:        sum += char    return sum%table_size# Example of USER1user = "USER1"password = "PASSWORD1"table[hashing(user)] = password`

We have many the formula for the Hash function like Robert Sedgewick, Polynomial, Cyclic shift,…

Robert Sedgewick Hashing:

`def RSHash(key):    a = 63689    b = 378551    hash = 0    for i in range(len(key)):        hash = hash * a + ord(key[i])        a = a * breturn hash%table_size`

Polynomial Hashing:

`def polynominalRollingHash(key):     p = 31     m = 1e9+99     power_of_p = 1     hash_val = 0     for s in key:        hash_val = (hash_val + (s - 'a' + 1)*power_of_p) % table_size        power_of_p = (power_of_p * p) % table_size     return hash_val`

Cyclic shift Hashing:

`def cyclicShiftHash(key):     hash_val = 0     for s in key:         hash_val ^= ((hash_val << 5) + s + (hash_val >> 27))     return hash_val & 0x7FFFFFFF;`

`0x7FFFFFFF` is a number in hexadecimal (2,147,483,647 in decimal) that represents the maximum positive value for a 32-bit signed binary integer.

So, what makes a good hash function?

The above hashing functions may be good at different problems, we have to try many problems to aware of it.

# COLLISIONS

Collision = two keys hashing to the same value.

=> can't avoid collisions unless you have a ridiculous amount of money :)) sorry for a little bit of confusion here, I mean the memory

# COLLISION REVOLUTION

How to handle Collisions?

There are mainly two methods to handle collision:

## Separate Chaining

The concept of separate chaining involves a technique in which each index key is built with a linked list. This means that the table’s cells have linked lists governed by the same hash function, So, in place of the collision error which occurred in the figure we used in the last section, the cell now contains a linked list containing the string “Bruce Lee” and “Robert Lee” as seen in this new figure. We can see in this figure how the subsequent strings are loaded using the separate chaining technique.

The below code for Separate Chaining:

`TABLE_SIZE = 1000class Node:    def __init__(self, key=""):        self.key = ""        self.next = Node    class HashTalbe:    def __init__(self):        self.head = Node        self.count = 0# In[]: create a new hashTablehashTable = []# In[]: hashing functiondef hashing(s):    hashValue = 1315423911    a=5    b=2    for i in range(len(s)):        x = hashValue << a        y = hashValue >> b        hashValue ^= (x+ord(s[i]) + y) & 0xFFFFFFFF    return hashValue & 0xFFFFFFFF# In[]: insert a new key into hash tabledef insert(s):    index = hashing(s) % TABLE_SIZE    newNode = Node(s)    if hashTable[index].head == None:        hashTable[index].head = newNode        hashTable[index].count = 1        return    newNode.next = hashTable[index].head    hashTable[index].head = newNode    hashTable[index].count += 1# In[]: search a key in hash tabledef search(s):    index = hashing(s) % TABLE_SIZE    currentNode = hashTable[index].head    while currentNode != None:        if currentNode.key == s:            return True        currentNode = currentNode.next    return False# In[]: delete a key in tabledef deleteKey(s):    index = hashing(s) % TABLE_SIZE    currentNode = hashTable[index].head    prevNode = currentNode    while currentNode != None:        if currentNode.key == s:            if currentNode == hashTable[index].head:                hashTable[index].head = hashTable[index].head.next            else:                prevNode.next = currentNode.next            hashTable[index].count -= 1            del currentNode            return True        prevNode = currentNode        currentNode = currentNode.next`

As the name of it, the concept is instead of create a new linked list to store the key is duplicated, we will move the duplicated key to an empty index. There are 4 mainly way to avoid collision.

Linear probing: when the key is duplicated, this key is moved to the next empty index with linear search.

Implementation:

`TABLE_SIZE = 100current_size = 0class Node:    def __init__(self, key = ""):        self.key = key        self.flag = 0hashTable = [Node()]*TABLE_SIZEdef JSHash(keys):    hashValue = 1315423911    a = 5    b = 2    for s in keys:        x = hashValue << a        y = hashValue >> b        hashValue ^= (x+ord(s)+y)    return hashValue & 0x7FFFFFFFdef insertKey(key):    global current_size    index = JSHash(key) % TABLE_SIZE    if current_size > TABLE_SIZE:        return False        while hashTable[index].flag:        index = (index+1)%TABLE_SIZE            hashTable[index].key = key    hashTable[index].flag = 1    current_size += 1    return Truedef searchKey(key):    index = JSHash(key) % TABLE_SIZE    count = 0    if current_size == 0:        return -1            while hashTable[index].flag != 0 and count <= TABLE_SIZE:        if hashTable[index].key == key:            if hashTable[index].flag == 1:                return index            return -1        index = (index+1) % TABLE_SIZE        count += 1        return -1`

Binary probing: use operation XOR to find the empty index.

Implementation:

`TABLE_SIZE = 100current_size = 0class Node:    def __init__(self, key = ""):        self.key = key        self.flag = 0hashTable = [Node()]*TABLE_SIZEdef JSHash(keys):    hashValue = 1315423911    a = 5    b = 2    for s in keys:        x = hashValue << a        y = hashValue >> b        hashValue ^= (x+ord(s)+y)    return hashValue & 0x7FFFFFFFdef insertKey(key):    global current_size    index = JSHash(key) % TABLE_SIZE    if current_size > TABLE_SIZE:        return False        for i in range(1, TABLE_SIZE+1):        if hashTable[index].flag == 0:            hashTable[index].key = key            hashTable[index].flag = 1            current_size += 1            return True        index = (index ^ i) % TABLE_SIZE    return Falsedef searchKey(key):    index = JSHash(key) % TABLE_SIZE    count = 0    if current_size == 0:        return -1    i = 1    while hashTable[index].flag != 0 and count <= TABLE_SIZE:        if hashTable[index].key == key:            if hashTable[index].flag == 1:                return index            return -1        index = (index ^ 1) % TABLE_SIZE        count += 1        i += 1        return -1`

Quadratic probing: Like Linear probing, the quadratic probing uses the quadratic function to change the index of the duplicated element. Find i-th index smallest such as (index+i*i) mod SIZE is empty then move it to there.

Implementation:

`TABLE_SIZE = 100current_size = 0class Node:    def __init__(self, key = ""):        self.key = key        self.flag = 0hashTable = [Node()]*TABLE_SIZEdef JSHash(keys):    hashValue = 1315423911    a = 5    b = 2    for s in keys:        x = hashValue << a        y = hashValue >> b        hashValue ^= (x+ord(s)+y)    return hashValue & 0x7FFFFFFFdef insertKey(key):    global current_size    index = JSHash(key) % TABLE_SIZE    if current_size >= TABLE_SIZE:        return False        for i in range(1, TABLE_SIZE+1):        if hashTable[index].flag == 0:            hashTable[index].key = key            hashTable[index].flag = 1            current_size += 1            return True        index = (index + i*i) % TABLE_SIZE    return Falsedef searchKey(key):    index = JSHash(key) % TABLE_SIZE    count = 0    if current_size == 0:        return -1    i = 1    while hashTable[index].flag != 0 and count <= TABLE_SIZE:        if hashTable[index].key == key:            if hashTable[index].flag == 1:                return index            return -1        index = (index + i*i) % TABLE_SIZE        count += 1        i += 1        return -1`

Double hashing: When the occurs collision we will use another hashing function. Suppose index_first is the index of the first hashing function, index_second is the index of the second hashing function. and then i is updated by index_first = (index_first + i*index_second) mod SIZE.

Implementation:

`def insertKey(key):    global current_size    index = JSHash(key) % TABLE_SIZE    inde_second = RSHash(key) % TABLE_SIZE    if current_size >= TABLE_SIZE:        return False        for i in range(1, TABLE_SIZE+1):        if hashTable[index].flag == 0:            hashTable[index].key = key            hashTable[index].flag = 1            current_size += 1            return True        index = (index + i*index_second) % TABLE_SIZE    return Falsedef searchKey(key):    index = JSHash(key) % TABLE_SIZE    index_second = RSHash(key) % TABLE_SIZE    count = 0    if current_size == 0:        return -1    i = 1    while hashTable[index].flag != 0 and count <= TABLE_SIZE:        if hashTable[index].key == key:            if hashTable[index].flag == 1:                return index            return -1        index = (index + i*index_second) % TABLE_SIZE        count += 1        i += 1        return -1`

# PRACTICE PROBLEMS

Try some problems are harder:

Written by

Written by