How HashMap Works : A missing piece of hood.

Published in

Zero Equals False

8 min readJan 11, 2018

Hi there, you’ll find a lot of tutorials about HashMaps Implementation but a very few go into the details of the HashMaps. Here we’ll be going to understand the inner workings of a HashMap and understand how it can be improved based upon the user needs. So let’s dive. If you get stuck anywhere, keep reading. Because I’ve solved all the doubts within the article itself. If you still have doubts, please ask them in the comment section. This is my attempt for #100DaysOfCode

Before we begin, I would love for you to try my latest gaming application that implements all of the features that I am going to talking in this series. Any feedback is appreciated. Open this link in your mobile, if possible.
Try it: Gamezop

Let’s do one thing. Open a Cmd (Command Prompt for Windows or a Terminal for IOS) and type in a command:
`javap java.util.HashMap`
And press Enter.

Javap is a Java Class File Disassembler which shows a list of variables, their types, methods, constructors and what class implements or extends. It basically shows the structure of the Java File.
So when you entered the above command, you saw a list of methods name, right?
So forget about that for a minute.

Now what is a HashMap? We’ll come back to this again after a few minutes.

Wonder if we could mix the powers of an Array (the O(1) retrieval) and the linkedlist (the dynamic addition and deletion of nodes), how much powerful could we become?
This is just what a HashMap does.

Let’s suppose A hashmap is a storage place where we want to keep all of the data. So building upon this, we might want to add data, retrieve data or manipulate data, right? We’ll go through all of this by understanding the structure of the HashMap Java Class.

`public class java.util.HashMap<K, V> {}`

What does this mean? It says, that a HashMap (A storage place) will store the data in the form of a key-value pair where K is key and V is value. Why key-value, you’d ask? So this is because it makes the retrieval of data easier.

What a hashmap does is, it takes the data and store the data in a bucket and then label that bucket with the hash of the key. So when we want that data back, we just call a method and pass in that key. And the hashmap will again hash the key, looks up if it has a bucket labeled with the hash of the key.

I told you that the HashMap stores the data in key value pair and that it uses the power of both array and a linkedlist. That’s because it has an array and linked list implemented in it (Duh!).
The array is used to store the hash of the key and the linked list is used to store the data and the key and other stuff.
One thing to note here is that, the HashMap stores the object in the form of Entry Class. And that Entry class is in the form of a linked list.
An Entry class looks like this.

static class Entry<K,V> implements Map.Entry<K,V>
{
  final K key;
  V value;
  Entry<K,V> next;
  final int hash;
  ........
}

So, let’s suppose that each key can correspond to multiple values (Collision — explanation given later), these multiple values will be then stored in the Entry Object one by one and each entry object will be a LinkedList Node.

Collisions in HashMap.

A collision is a case when we try to add different keys with the same hash. To avoid returning back the value and saying that the HashMap can’t add this new value because it already contains this hash of the key, what it does is, it creates a linked list. And keeps on adding the values with the same hash but different key by creating new node in the linked list present at the array position calculated with the hash of that key. Get the idea with this picture.

The array is used to store the hash of the key and the linked list is used to store the data and the key and other stuff.

There are mainly 3 operations to perform:

Adding Data
Retrieving Data
Manipulating Data

Let’s go into detail of each one.

1. Adding Data

To add data in a HashMap, we use the put() method.

`public V put(K key, V value) {…}`

What it’s saying is, to add data in form of key value pair where K can be anything like int, String, char or even an Object. And same goes for a value. And this method returns the old value present for that key. If there’s no value present for that key, this method will return null.

Let’s go into the method put.

public V put(K key, V value) {
  if (key == null)
  return putForNullKey(value);
  int hash = hash(key.hashCode());
  int i = indexFor(hash, table.length);
  for (Entry<K,V> e = table[i]; e != null; e = e.next)
  {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
    {
      V oldValue = e.value;
        e.value = value;
        e.recordAccess(this);
        return oldValue;
    }
   }   return null;
}

Step 1: What it’s saying is, first check if it’s null or not. If it’s null, don’t take the hash of the key as it will produce an error and instead take the data and insert a linked list node at 0th index of the array and put the data and key as an Entry Object. Only one object can be stored at the 0th index.
If the key is not null, then take the hash of the key through the hash method overriden from the Object Class by the developer and then again take a hash through hash method implemented by the HashMap .

Step 2: Now this hash will be a long digit integer. The array size cannot be this huge. The default array size is 16. And it grows exponentially in the power of 2. So what we do is, we take the hash and the array size and through some algorithm, produce an index within the range of the size of the array.

Step 3: Then from the array it takes out each object (an Entry Object) one by one, and checks if the key matches and the hash matches. If it does, it adds the Entry object into linked list and return the old value.
If there is already a value present in the linked list for that key, it adds the new Entry object at the end of that node and it forms a chain like this.

How to prevent the dupicate Key entry? This is done by the equals() method. An equal method as used in the if condition in the above example. If the key is present it just adds the Entry object in the next node at that array location.

2. Retrieving Data

For retrieval, HashMap uses a get() Method.
`public V get(K)`
I am getting these method prototype from the javap command I used earlier.

This method is saying that it gives you the Value (Entry Object), if you add the key in the get method. Let’s dive deeper into the method structure. It’s almost same as the put() method.

public V get(Object key)
{
  if (key == null)
  return getForNullKey();
  int hash = hash(key.hashCode());
  for (Entry<K,V> e = table[indexFor(hash, table.length)];e != null;e = e.next) 
   {
     Object k;
     if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
     return e.value;
   }
  return null;
}

Step 1: Check if the key is null or not. If the key is null, then return the value kept at the 0th index of the array.

Step 2: If the key is not null, take the key and use the developer overridden hashCode() method if it’s there. Then take the hash of the output again.

Step 3: In the for loop, what you see is, you take out the whole linked list on that array location by passing the index (calculated from the indexFor() method)

Step 4: Now we know that Each entry object looks like this.

static class Entry<K,V> implements Map.Entry<K,V>
{
 final K key;
 V value;
 Entry<K,V> next;
 final int hash;
 ........
}

So it means each node of the linked list looks like this. Each node stores the value, the hash, the key and the address of the next Entry Object or the node in the chain.

So when in the for loop, we get the first entry object of the linked List at that array location. Then we match it’s Key and Value and the hash. If it matches we return the value or else we pick up the next node in the linked list. and do the same process again.

Step 4: If we find the value, we return it or else we return null.

3. Manipulating Data

`public V remove(K);
public void clear();
public boolean remove(K,V);
public boolean replace(K, V, V);
public V replace(K, V);`

All of these methods are used for manipulating data. I guess you understand how this works now. Still if you don’t, ask in the comment section.

Performance of the HashMap

The performance of the hashmap is affected by the Initial Capacity and the Load Factor.

Initial capacity is the array size. It’s default value is 16. And it grows in the power of 2.

Load Factor is the value that determines at which point the array needs to be resized. The default value is 0.75. This means if the array has been filled upto INITIAL_CAPACITY*LoadFactor, the array is resized and then it becomes of 32 in length (16*2). And that keeps on repeating.

If you want to set them explicitly, yourself, you can do it like this.
Map<String,String> hashMapWithCapacity=new HashMap<>(32); Map<String,String> hashMapWithCapacityAndLF=new HashMap<>(32, 0.5f);

A high INITIAL_CAPACITY is good for large number of Entries but little iterations.
And a low INITIAL_CAPACITY is good for small number of entries but huge iterations.

New In Java 8

As of java 8, when the Entries in a linked list reaches 8 (MIN_TREEIFY_CAPACITY;), it converts the linked list to a Balanced Tree . This improved the performance a million times.
Earlier, the get method had the complexity of O(n) in worst case scenarios. But that changed in Java 8. It became O(logn). This was because of the Balanced Tree which made the traversal faster.

Now even with a million entries, it takes only 10 nanosecond!

So this is the power of the HashMaps

This was all under the hood knowledge of HashMaps. Thanks for reading. If you find any mistake, I apologize in advance. If you have any doubts, please comment. If you like it, please clap. :)