Re-Implementing Naive Hashing based Maps
The idea behind this article is to understand better certain data structure by reimplementing them
What is a Map
Map is a Data Structure typically used to store Key-Values association as it allows to very efficient random access: O(1) read, insert, delete
What makes Map so efficient
The idea that powers this Data Structure is the hash function which essentially maps the key into a memory location of the underlying storage, which is then assumed to offer random accessible
An example of hashing
Let’s first identify the function domain and co-domain so let’s assume
- key type is string and
- storage is an array so to access its location a valid index, hence an index in [0..Size), is needed
So such a function can be implemented for example by summing the ASCII values of each char in the string and then computing the module with the array size
What about an example of implementation
Sure here it is
What are the know issues of this implementation
This example shows a very simple Map which just detects, but does not manage, a typical issue affecting hash based systems: hash collision.
It happens because typically hash function has no guarantee to be injective which means that 2 different keys might be associated to the same memory slot.
In this example, it is easy to figure out how this can happen: if two words have the same sum of ASCII chars (e.g. like “ciao” and “biap”) then it results in a hash collision
Is it possible to fix it
Sure, otherwise we won’t have map in production :)
We’ll see how in the next episode