STL Unordered Map | HashTable

Saurabh Bhoy
4 min readAug 22, 2020

--

Have you guys used a hash table in a competitive coding challenge for improving the time complexity of your code?

source- https://memegenerator.net/instance/45139121/typical-bender-i-say-use-a-hashtable-and-90-of-the-time-im-right

basically, we use the hash table when insertion and access both need to be fast(o(1)), uniqueness is important and the order of data does not matter.

some common example is counting frequencies of words

Have you ever wondered how this Hash table internally works?

But before we get to its internal implementation, some context.

What is the hash function?

The hash function is like a mathematical formula that takes input and produces an output of a specific length. We can generate hashes of almost any digital content. Generally, We use the hashing technique for checking message integrity.

I have generated md5 checksum of CPP file source code using the md5 terminal command

md5 list.cpp

MD5 (list.cpp) = c24a0d6da90260a3fe56f9aac0893292

What is Hash Table?

Hash Table is like an unordered data structure that stores data as key-value pairs .it internally maintains the array of the bucket and each bucket is mapped to hash code, it like an index to the hash table. The hash code is calculated using the hash function which we have discussed previously.

in simple term, hash function compute index which decides in which bucket this entry can be found

C++ STL provides unordered_map which internally uses a hash table to store key and value pair

syntax- unordered_map<key,value>

Internal Working

Suppose We are designing a service which accepts the country name as input and returns country code as output

We can pass a custom hash function to the unordered map. if the hash function is not passed it uses by default hash function.

It has created 11 buckets and then stored the value in the bucket.

By using find function, we can check whether the key is present or not, if preset it returns the value of the mapped key otherwise the end of the iterator

Average case Time complexity in big o notation — insert(o(1)),delete(o(1)),search(o(1)) basically it looks good on paper but in reality it is depend on hash function and number of buckets

the hash function should evenly distribute keys to the buckets and should be easily computable

otherwise, collision will occur (collision means when a hash function maps two different keys to the same bucket)

in the worst case, suppose the hash function maps each key to the same bucket. In that case, insert, delete and search would take o(n) time.

in our previous example collision is occurred at the 2nd bucket. USA, Germany, and India map to the 2nd bucket

What is the load factor

load factor is the probability of collision in the hash table

=number of keys /number of buckets

when the load factor reaches the max load factor then hash table reconstruction may happen(rehashing).

in unordered map default load factor is 1

rehashing is a very expensive operation that’s why we have to carefully decide the number of bucket and design hash function carefully

in our example, the load factor is 0.5454. we can also manually check the hash function by passing key the hasher function.

Conclusion

We have learned Hash table performance majorly depends on the hash function and number of buckets. if the hash function and the number of buckets are not correctly chosen then we can experience worst-case time complexity and lose all benefits of the Hash table.

Write to me on the comment section if we you guys want to learn about some optimization technique related to the unordered map(Hash table)

until then…

--

--