HashMaps Explained

HashMap Example with the keys (names) and the entries (7-digit code) (Source: )

Introduction

The HashMap data structure was invented around 1953 as an out-of-the-box approach to storing data more efficiently. The HashMap data structure makes use of a hash function to ensure optimum efficiency for the basic add, remove, and get operations.

The Goal

The goal of a HashMap is to store key-value pairs. The key and value are generic types in the HashMap implementation, so any key and value class can be passed into the instantiation. For instance, one could instantiate HashMap<K, V> as HashMap<String, Integer> or HashMap<Vehicle, String>.

The Hash Function

The idea of the hash function is to generate a unique id for each key-value pair such that the location of the entry in the backing storage mechanism is easily identifiable. The hash function can be implemented in infinitely many ways, but the best hash functions generate unique hash codes for each object. For instance, a potential hash function could return the ascii code of the first letter of the key (assuming the key is a string). So, “apples” would return 65 and “banana” would return 66.

(Source: )

Think of a HashMap data structure as an Amazon Locker Hub. A lot of HashMaps are backed with a simple array as the storage mechanism. So think of the giant yellow structure as an array. Inside each locker lies data. This data is the value from each key-value pair. This data is associated with a key (from the key-value pair). Each of these lockers can be unlocked with this key. And finally, the hash code represents the locker number, identifying the location of each key-value pair.

Hopefully, now it makes more sense why it’s a good idea to have a unique hash code. Conflicting hash codes would result in multiple items in the same locker and a very confusing experience for an Amazon customer.

Collision Strategies

However, no matter how sophisticated we make our hash function, there may still be hashing collisions (when multiple key-value pairs have the same hash code). There are multiple strategies that can be employed here.

First, we can use external chaining. Basically, at the location of each hash code, we create a linked list and append each conflicting entry onto this list.

An example of a HashMap using external chaining. The red items all have a hash code of 0. Thus, these items are all attached in a linked list at the array’s 0th index. Likewise for the yellow, green, gray, and orange values. (Source: )

Alternatively, we could employ some probing strategies (linear, quadratic, logarithmic). Essentially, here, if there are conflicting hash codes, we just iterate through the backing array and drop the conflicting entry into the first empty location found.

Since 18 already exists in the 5th index, we need to move 44 along the array and drop it off in the first empty spot that we find (Source: )

If we iterate through the array one by one, this is called linear probing. If we iterate in a quadratic fashion (i.e. next spot, four spots from here, nine spots from here), this is quadratic probing.

Accessing

To access a particular entry using the key, we can generate the hash code and look for the entry in the array based on the hash code. This is a simple O(1) operation. But since there are potential collisions like discussed above, we need to verify that the entry at that location has the right key using Java’s .equals() method. If they are equal, then we have found the entry. Otherwise, continue searching in the array depending on the collision strategy implemented.

Deleting

To delete an entry, we do not completely erase all the data from the particular location. Instead, we put a <DELETED> marker at that location. This way, the HashMap is aware that the particular spot is technically empty although an entry with the respective hash code once existed there in case of a resize.

Resize

When creating a HashMap, we set a constant load factor. A load factor is a decimal value from 0 to 1 and this tells us the capacity level of the backing array, indicating when we should resize to maintain an efficient add operation. Say the load factor is 0.67. This means that when the HashMap is 2/3 full, then we need to perform a resize.

When resizing, we create a new storage mechanism (array most of the time) twice the size. The extent to which we increase the storage mechanism can vary but increasing it by twice is standard. We then recalculate the hash of each of the entries and place them into the new array.

Time Complexity

Since calculating the hash code is an O(1) operation and finding the entry existing at that location in an array is also an O(1) operation, every single operation (add, delete, access, remove) is O(1) amortized. The worst-case scenario involves resizing, and this would be an O(n) operation assuming we use linear probing or could even be O(n²) if we have a bad hash function (i.e. all entries return the same hash code).

--

--

--

CS @ Georgia Tech '24

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

REST API

Why most low-code platforms fall short on mobile development

The flat file in webMethods with Demo.

Reduce Cost and Increase Productivity with Value Added IT Services from buzinessware — {link} -

Scalar Uni-variate Function Minimizers

Roadmap of the CRODO project

Network Interworking completed for Klaytn

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Aadit Trivedi

Aadit Trivedi

CS @ Georgia Tech '24

More from Medium

Algorithms

The Big O

Dynamic Programming-Parent Problem 1 (0/1 Knapsack): 3 approaches with practice questions.

knapsack problem image

TrieSort in linear time*