Data Science Fundamentals: Introduction to Hash Functions

A gentle introduction to hash functions from scratch, should you go with the easy or the hard way?

Riade
The Startup
4 min readNov 27, 2020

--

Python data science hash functions
Photo by Chris Ried on Unsplash

The Hard Way (Using array as the back-end bone):

A hash function is any function that can be used to map data of arbitrary size to data of fixed size. One use is a data structure called a hash table, widely used in computer software for rapid data lookup. Hash functions accelerate table or database lookup by detecting duplicated records in a large file.

In English: a hash function take an input, process it and then return a hashed output, this hashed output will be used as a reference to that value if you want to access it, this process will help you to retrieve data faster according to Big O notation, this hash function will be used to build what is none as hash tables.

Hash Tables is a data structure type that make accessing elements faster, for example if you’re using a standard array and you want to get what is stored at N index, the code will run throw all of your list until it gets that value, if this value is at the start this will take O(1) but what if your value is the last one, in this case Big O notation will be O(n) and this where hash tables are more efficient and faster, no matter how long your list is if you want to get specific value all you need is to run it throw hash function and it will return a reference or index on where this was stored in the hash table and immediately get the value without running all the way throw your list and the Big O is O(1).

python data science hash table hash function
Hash Function And Hash Table

You can code this function in any language (C++, java, JavaScript, python …) but for this example I will use python.

First thing to do is define a hash function that will handle all the hashing operations:

hash_table = [[] _ for i in range(30)]
def hash_function(value):
return value % len(hash_table)

This function will return hashed key for every value we provide, next is a function that can index our value with our hashed key:

def insert_to_table(hash_table, key, value):
hashed_key = hash_function(key) # This will return our hashed key
hash_table[hashed_key].append(value)

That is it, this will do for this example, all you need now is to display your hash table:

def display_hash_table(hash_table):
for x in range(len(hash_table)):
print(x)
for y in hash_table[x]:
print("-")
print(y)
print()

This function will do the work, now the entire code will be as follows:

hash_table = [[] _ for i in range(30)]
def hash_function(value):
return value % len(hash_table)
def insert_to_table(hash_table, key, value):
hashed_key = hash_function(key)
hash_table[hashed_key].append(value)
def display_hash_table(hash_table):
for x in range(len(hash_table)):
print(x)
for y in hash_table[x]:
print("-")
print(y)
print()

and you're done, if you want to create hash table all you need is these lines:

insert_to_table(hash_table, 1, 'Some_Value')
display_hash_table (hash_table)

But there is one problem, what if 2 value have the same hashed key? Which one will you keep and which one will you delete? Or you must keep both? This problem is called Collision and it is solved by implementing another data structure called Linked Lists (I will talk about it in details in other articles).

The Easy Way (using dictionary):

If you want to save time and go the easy way, you can just do this:

my_hash_table = {1:"Value1", 2:"Value2", 3:"Value3"}

This may look weird but this is actually a hash table, yes a dictionary can be used as a hash table, let’s say I want the value at the index 2: my_hash_table[2] it looks like a hash table, is it? And it does not have to iterate throw every single value to get your desired value right? No matter what this will take O(1).

Conclusion:

This was a gentle introduction to hash tables, no matter how you choose to do it I always suggest going with the easy choice, maybe if you are using a low level programming language that does not support dictionaries well in that case you may want to create it from scratch but if you're using Python, JavaScript or high level programming language a dictionary will do the trick.

--

--

Riade
The Startup

Hi!, my name is Riade, and I am python programmer, full stack web developer and intermediate data scientist.