The HashMap data structure traces back to around 1953, when hashing was first proposed as a way to store and retrieve data more efficiently. A HashMap makes use of a hash function to keep the basic add, remove, and get operations efficient.
The goal of a HashMap is to store key-value pairs. The key and value are generic types in the HashMap implementation, so any key and value class can be passed into the instantiation. For instance, one could instantiate HashMap<K, V> as HashMap<String, Integer> or HashMap<Vehicle, String>.
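For instance, using Java's built-in java.util.HashMap:

```java
import java.util.HashMap;

public class Instantiation {
    public static void main(String[] args) {
        // Keys are Strings, values are Integers
        HashMap<String, Integer> inventory = new HashMap<>();
        inventory.put("apples", 3);
        inventory.put("banana", 5);
        System.out.println(inventory.get("apples")); // prints 3
    }
}
```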
The Hash Function
The idea of the hash function is to generate an identifier for each key-value pair that makes the entry's location in the backing storage mechanism easy to find. The hash function can be implemented in infinitely many ways, but the best hash functions generate distinct hash codes for distinct objects. For instance, a (deliberately simple) hash function could return the ASCII code of the first letter of the key (assuming the key is a string). So, “apples” would return 97 (the code for 'a') and “banana” would return 98.
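As a sketch, this toy first-letter hash (not suitable for real use, since every key starting with the same letter collides) could look like:

```java
public class FirstLetterHash {
    // Toy hash: the ASCII code of the key's first character
    static int hash(String key) {
        return (int) key.charAt(0);
    }

    public static void main(String[] args) {
        System.out.println(hash("apples")); // prints 97 ('a')
        System.out.println(hash("banana")); // prints 98 ('b')
    }
}
```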
Think of a HashMap data structure as an Amazon Locker Hub. Many HashMaps are backed by a simple array as the storage mechanism, so picture the giant yellow structure as an array. Inside each locker lies data: the value from a key-value pair. This data is associated with a key (from the key-value pair), and each locker can be unlocked with its key. Finally, the hash code represents the locker number, identifying the location of each key-value pair.
Hopefully, now it makes more sense why it’s a good idea to have a unique hash code. Conflicting hash codes would result in multiple items in the same locker and a very confusing experience for an Amazon customer.
However, no matter how sophisticated we make our hash function, there may still be hashing collisions (when multiple key-value pairs have the same hash code). There are multiple strategies that can be employed here.
First, we can use external chaining. Basically, at the location of each hash code, we create a linked list and append each conflicting entry onto this list.
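A minimal sketch of external chaining, using an array of linked lists (class and method names here are hypothetical, with String keys and int values for simplicity):

```java
import java.util.LinkedList;

public class ChainedMap {
    static class Entry {
        String key; int value;
        Entry(String k, int v) { key = k; value = v; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedMap(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    void put(String key, int value) {
        int index = Math.abs(key.hashCode()) % buckets.length;
        for (Entry e : buckets[index]) {
            if (e.key.equals(key)) { e.value = value; return; } // update existing key
        }
        // Conflicting entries are simply appended to the same bucket's list
        buckets[index].add(new Entry(key, value));
    }

    Integer get(String key) {
        int index = Math.abs(key.hashCode()) % buckets.length;
        for (Entry e : buckets[index]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // key not present
    }

    public static void main(String[] args) {
        ChainedMap m = new ChainedMap(4);
        m.put("apples", 3);
        m.put("avocado", 7);
        System.out.println(m.get("apples")); // prints 3
    }
}
```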
Alternatively, we could employ an open-addressing probing strategy (most commonly linear probing, quadratic probing, or double hashing). Essentially, if a hash code conflicts with an occupied spot, we step through the backing array according to the probing sequence and drop the conflicting entry into the first empty location found.
If we iterate through the array one spot at a time, this is called linear probing. If we iterate in a quadratic fashion (i.e. one spot from the original index, then four spots, then nine spots), this is quadratic probing.
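The two probe sequences can be sketched as index arithmetic, assuming `hash` is the entry's starting index and `i` is the attempt number:

```java
public class Probing {
    // Index probed on attempt i (i = 0, 1, 2, ...) in a table of the given capacity
    static int linearProbe(int hash, int i, int capacity) {
        return (hash + i) % capacity;       // step one spot at a time
    }

    static int quadraticProbe(int hash, int i, int capacity) {
        return (hash + i * i) % capacity;   // step 1, 4, 9, ... spots away
    }

    public static void main(String[] args) {
        // Starting from index 3 in a table of size 16:
        for (int i = 0; i < 4; i++) {
            System.out.println(linearProbe(3, i, 16));    // prints 3, 4, 5, 6
        }
        for (int i = 0; i < 4; i++) {
            System.out.println(quadraticProbe(3, i, 16)); // prints 3, 4, 7, 12
        }
    }
}
```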
To access a particular entry using the key, we generate the hash code and look up that location in the array, a simple O(1) operation. But since collisions are possible, as discussed above, we need to verify that the entry at that location has the right key using Java’s .equals() method. If the keys are equal, we have found the entry; otherwise, we continue searching the array according to the collision strategy implemented.
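A minimal sketch of this lookup under linear probing, reusing the toy first-letter hash from earlier (names are hypothetical; resizing and deletion are not handled):

```java
public class ProbingGet {
    static int capacity = 8;
    static String[] keys = new String[capacity];
    static Integer[] values = new Integer[capacity];

    // Toy hash: ASCII of the first character, wrapped to the table size
    static int hash(String key) { return key.charAt(0) % capacity; }

    static void put(String key, int value) {
        int index = hash(key);
        // Linear probing: scan forward until an empty slot or the same key
        for (int i = 0; i < capacity; i++) {
            int probe = (index + i) % capacity;
            if (keys[probe] == null || keys[probe].equals(key)) {
                keys[probe] = key; values[probe] = value; return;
            }
        }
        throw new IllegalStateException("table full");
    }

    static Integer get(String key) {
        int index = hash(key);
        for (int i = 0; i < capacity; i++) {
            int probe = (index + i) % capacity;
            if (keys[probe] == null) return null;              // empty slot: key absent
            if (keys[probe].equals(key)) return values[probe]; // verify with .equals()
        }
        return null; // table full and key not present
    }

    public static void main(String[] args) {
        put("apples", 3);
        put("avocado", 7); // same first letter: collides, probes to the next slot
        System.out.println(get("apples"));  // prints 3
        System.out.println(get("avocado")); // prints 7
    }
}
```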
To delete an entry, we do not completely erase all the data from the particular location. Instead, we put a <DELETED> marker at that location. This way, the HashMap knows the spot is available for new entries, but a search probing past it does not stop prematurely: an entry with a conflicting hash code may have been placed further along the probe sequence, and emptying the slot entirely would make that entry unreachable.
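A sketch of how the marker interacts with lookups under linear probing (simplified: this `put` reuses the first marked slot without first checking whether the key already exists later in the chain; all names are hypothetical):

```java
public class OpenAddressing {
    static int capacity = 8;
    static String[] keys = new String[capacity];
    static Integer[] values = new Integer[capacity];
    // Unique sentinel object, compared by identity (==), never by .equals()
    static final String DELETED = new String("<DELETED>");

    // Toy hash: ASCII of the first character, wrapped to the table size
    static int hash(String key) { return key.charAt(0) % capacity; }

    static void put(String key, int value) {
        int index = hash(key);
        for (int i = 0; i < capacity; i++) {
            int probe = (index + i) % capacity;
            // A DELETED slot can be reused for a new entry
            if (keys[probe] == null || keys[probe] == DELETED || keys[probe].equals(key)) {
                keys[probe] = key; values[probe] = value; return;
            }
        }
    }

    static Integer get(String key) {
        int index = hash(key);
        for (int i = 0; i < capacity; i++) {
            int probe = (index + i) % capacity;
            if (keys[probe] == null) return null; // truly empty: stop searching
            // DELETED slots are skipped but do NOT stop the search
            if (keys[probe] != DELETED && keys[probe].equals(key)) return values[probe];
        }
        return null;
    }

    static void remove(String key) {
        int index = hash(key);
        for (int i = 0; i < capacity; i++) {
            int probe = (index + i) % capacity;
            if (keys[probe] == null) return;
            if (keys[probe] != DELETED && keys[probe].equals(key)) {
                keys[probe] = DELETED; values[probe] = null; // mark, don't erase
                return;
            }
        }
    }

    public static void main(String[] args) {
        put("apples", 3);
        put("avocado", 7); // collides with "apples", probes to the next slot
        remove("apples");
        // The DELETED marker lets the probe walk past slot 1 and still find "avocado"
        System.out.println(get("avocado")); // prints 7
    }
}
```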
When creating a HashMap, we set a constant load factor: a decimal value between 0 and 1 that tells us how full the backing array may get before we resize, so the add operation stays efficient. Say the load factor is 0.67. This means that when the HashMap is about 2/3 full, we need to perform a resize.
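The resize check itself is a one-liner (0.67 here is just the example value above):

```java
public class LoadFactorCheck {
    static final double LOAD_FACTOR = 0.67; // resize threshold from the example

    // True once the fill ratio reaches the load factor
    static boolean needsResize(int size, int capacity) {
        return (double) size / capacity >= LOAD_FACTOR;
    }

    public static void main(String[] args) {
        System.out.println(needsResize(10, 16)); // prints false: 10/16 = 0.625
        System.out.println(needsResize(11, 16)); // prints true: 11/16 = 0.6875
    }
}
```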
When resizing, we create a new storage mechanism (an array most of the time) twice the size of the old one. The growth factor can vary, but doubling is standard. We then recompute each entry’s index against the new capacity and place the entries into the new array.
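A sketch of the resize step for a linear-probing table (hypothetical names; the table-full case is ignored since the new array always has room):

```java
public class ResizeSketch {
    static String[] keys = new String[8];
    static Integer[] values = new Integer[8];

    static int indexFor(String key, int capacity) {
        return Math.abs(key.hashCode()) % capacity;
    }

    // Insert into the given arrays using linear probing
    static void putInto(String[] ks, Integer[] vs, String key, Integer value) {
        int capacity = ks.length;
        int index = indexFor(key, capacity);
        for (int i = 0; i < capacity; i++) {
            int probe = (index + i) % capacity;
            if (ks[probe] == null) { ks[probe] = key; vs[probe] = value; return; }
        }
    }

    static void resize() {
        // New backing array, twice the old capacity
        String[] newKeys = new String[keys.length * 2];
        Integer[] newValues = new Integer[keys.length * 2];
        // Recompute each entry's index against the NEW capacity and reinsert
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] != null) putInto(newKeys, newValues, keys[i], values[i]);
        }
        keys = newKeys; values = newValues;
    }

    public static void main(String[] args) {
        putInto(keys, values, "apples", 3);
        resize();
        System.out.println(keys.length); // prints 16
    }
}
```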
Since calculating the hash code is an O(1) operation and indexing into the array is also an O(1) operation, every basic operation (add, remove, get) is O(1) amortized. The worst-case scenario involves resizing: this is an O(n) operation assuming we use linear probing with a reasonable hash function, and it can even degrade to O(n²) if we have a bad hash function (e.g. one where all entries return the same hash code).