What do we mean by the Hash Value of some data? It is a value that uniquely identifies a large data. When we are searching for a value in an array, searching traditionally will take linear time. This can be reduced to logarithmic time if the data is sorted. But most data is not sorted, and sorting it will take more than linear time. There is a ton of data being produced every second. Searching through it linearly could take several days. What can we do to do this search in constant time? We can use Hashing’s idea and save the data’s hash values as key and use the same key to look for the value stored.
A hash function converts a large value of data into a smaller practical value called the hash value. This hash function is said to be good if it computes the key-value efficiently and distributes the key values evenly in a range.
A simple example of a hash function is a Mod Function. Suppose we can store N number of unique entries in a table. We know that remainders range from 0 to N-1 on dividing by N. So we can take N as the divisor and get the mod value as the hash value of data. Generally, we take a prime number as the value of N.
HASH(data) = data % N
This is a simple hash function giving hash values ranging from 0 to 9. There is one issue here. This function will give same values for 17 and 67, as
HASH(17) = 17 % 10 = 7
HASH(67) = 67 % 10 = 7
If we have our data as 17 and 67, they will refer to our table’s same location. This is called Collision. Collision is the problem of two keys resulting in the same result. This issue can be addressed in multiple ways. Here I will discuss two popular methods.
1. Separate Chaining
As chaining suggests, we will make a chain of keys having the same values. The hashed value represents a location in the table; the keys corresponding to the same locations are stored in a linked list. The keys are appended one after another. Each location of the table points to the head of the linked list.
- It can be implemented easily with the help of a linked list.
- We don’t need to worry about the size of the input.
- Keys are stored in a linked list. As the list grows, it becomes difficult to search for a value.
- If the input is not distributed evenly, many locations of the table will have no entries, resulting in wastage of space.
- The pointers in linked lists are an overhead.
2. Open Addressing
In Open addressing, all the entries are stored in the table. In the case of collision, the next free slot is searched, and the value is inserted.
Insertion in Open Addressing
Probing: The process of looking for the next empty slot using a numerical method and hashed value is called probing. There can be multiple types of probing; here, we will discuss only linear and quadratic.
- Linear Probing — To find the next slot, the current slot is probed(increased) by a constant amount. Usually, it’s 1.
- For example, if Hash(key) is full, then, next, we look for Hash(key)+1, then Hash(key)+2, Hash(key)+3, and so on.
- Quadratic Probing — Unlike linear probing, the next slot is probed by the square of numbers starting from 1.
- For example, if Hash(key) is full, then, next, we look for Hash(key)+1², then Hash(key)+2², Hash(key)+3², and so on.
Here is an example of open addressing with linear probing.
After adding 14, when we find the Hash for 41, it comes out to be 41%8 = 1. Cell 1 is not free. According to linear probing, the next cell will be, ((Hash)+1)%8 -> (1+1)%8 = 2, which is also occupied. Similarly, the next cell 3 is occupied too. So we probe again and find cell 4 free. We insert 41 here.
After this, the next three elements get added without probing.
Points to Notice
- To keep these probes inside the range of the table size, the mod is taken by the table size, i.e., the next slot will be (Hash(key)+1)%(size of the table).
- We need quadratic probing because linear probing may cause congestion in one part, leading to longer search time.
Search in Open Addressing
To find an element X, keep probing until the value present in a slot becomes equal to X, or an empty slot is found.
Deletion in Open Addressing
If we just delete the key Y, this slot will be marked as empty. While looking for some element X, if we come to this slot, finding it empty, we will consider X as absent in the table. Instead, we have probed more. To avoid this, we should mark deleted.
Say we want to delete 45, we calculate its hash and see if the value at that location matches 45. If it does, we delete it; otherwise, we probe to the next location. After deletion, we mark the cell as deleted.
- The table will be evenly filled with the proper probing method.
- No overhead of pointers.
- Input size can not be greater than table size.
- A little complex to implement.
This hashing concept is used in Maps in programming languages. This is the process that goes on behind the scenes. Hope you enjoyed learning about this.