Java HashMap: Internal Implementation and Working

6 min readSep 3, 2023

The internal implementation of a Java HashMap involves concepts like hash codes, buckets, and collision resolution. Let’s break it down into simple steps and explanations, along with relevant examples.

Hashing:

When you put a key-value pair into a HashMap, the key’s hash code is calculated using the hashCode() method. This hash code is used to determine the index (position) where the value will be stored in an array known as the bucket.

2. Buckets:

A HashMap is an array of “buckets.” Each bucket can hold one or more key-value pairs. The goal is to distribute the key-value pairs across these buckets evenly to improve retrieval performance.

3. Index Calculation:

To convert a hash code into an index, the hashCode() is subjected to a process called "hashing" or "hash function." The hash code is often transformed to fit within the range of the bucket array.

4. Collisions:

Sometimes, different keys can produce the same hash code, leading to collisions. For example, if two keys have the same hash code but are different, they should be stored at the same index in the bucket array.

5. Collision Resolution:

HashMap employs various strategies to handle collisions. One common approach is to use linked lists or other data structures to store multiple key-value pairs in the same bucket. When a collision occurs, new entries are added to the existing list.

6. Load Factor and Rehashing:

As the number of elements in the HashMap grows, it’s possible that collisions increase, impacting performance. To prevent this, a threshold called the “load factor” is used. When the number of elements exceeds the load factor times the current capacity, the HashMap is rehashed (recreated with larger capacity) to reduce collisions.

Example:

Consider a simple example where we put key-value pairs into a HashMap. The internal array of buckets (initially empty) might look like this:

Suppose we add key-value pairs "Alice" -> 25, "Bob" -> 30, and "Charlie" -> 35. The hash codes and indices (using a simple hashing function) might be:

HashCode to index calculation:

Let’s assume we have an array of buckets with a size of 10 (arraySize = 10) for this explanation.

Step 1: Hash Code Calculation: The hash codes are: 831297, 664214, and 788423.

Step 2: Transformation: For the sake of this explanation, we won’t apply any further transformations to the hash codes.

Step 3: Fitting within Array Size (Modulo Arithmetic): The calculated hash code needs to be mapped to an index within the available range of indices, which is determined by the array size (10).

For hash code 831297: index = hash % arraySize = 831297 % 10 = 7
For hash code 664214: index = hash % arraySize = 664214 % 10 = 4
For hash code 788423: index = hash % arraySize = 788423 % 10 = 3

Resulting in the following indexes:

So, the updated HashMap might look like this:

This example demonstrates the basic concept of hashing, indexing and storing key-value pairs in buckets.

Let’s dive into the internal workings of the common operations performed on a HashMap in Java, step by step.

Adding a Key-Value Pair (put Operation):

When you add a key-value pair using the put(key, value) method, the following steps occur:

The hash code of the key is calculated using the hashCode() method.
The hash code is then transformed into an index within the range of the bucket array. This might involve masking or modulo operations to fit the index within the array size.
If there’s no value stored at the calculated index, the key-value pair is simply stored there.
If there’s already a value at the index (a collision), the HashMap uses collision resolution strategies. Commonly, it adds a new entry to the existing linked list in the bucket or may rehash and increase the size of the bucket array if necessary.

2. Retrieving a Value (get Operation):

When you retrieve a value using the get(key) method:

The hash code of the key is calculated.
The index is computed from the hash code using the same hashing function used during the put operation.
The HashMap looks at the calculated index and the linked list (if a collision occurred) to find the entry with the matching key.

3. Removing an Entry (remove Operation):

When you remove an entry using the remove(key) method:

The hash code of the key is calculated.
The index is computed from the hash code.
The HashMap searches for the entry in the bucket at the calculated index. If found, the entry is removed, and the linked list (if any) is updated.

4. Iterating through Entries (entrySet Operation):

When you iterate through the entries using a loop like for (Map.Entry<K, V> entry : map.entrySet()):

The HashMap iterates through all the buckets.
For each bucket, it iterates through the linked list of entries (if present) to provide access to each key-value pair.

5. Load Factor and Rehashing:

As the number of elements in the HashMap increases, collisions become more likely, impacting performance. The load factor is a threshold (usually around 0.75) that determines when the HashMap should be rehashed:

When the number of elements exceeds the load factor times the current capacity, rehashing occurs.
Rehashing involves creating a new, larger bucket array, recalculating the indices, and redistributing the entries.

Handling index collisions in a hashmap

In a HashMap, when the calculated index is already occupied (a collision occurs), the HashMap needs to handle this situation. One common approach is to use linked lists or other data structures to store multiple key-value pairs in the same bucket. This mechanism allows multiple entries with the same index to coexist.

Here’s an example to illustrate how collisions are handled in a HashMap:

import java.util.*;

public class CollisionHandlingExample {
    public static void main(String[] args) {
        // Create a HashMap with a small size to intentionally cause collisions
        HashMap<Integer, String> hashMap = new HashMap<>(5);

        // Adding key-value pairs
        hashMap.put(1, "Alice");
        hashMap.put(6, "Bob");   // Causes a collision with index 1

        // Retrieving values
        String alice = hashMap.get(1);
        String bob = hashMap.get(6);

        // Printing values
        System.out.println("Alice: " + alice);  // Prints "Alice"
        System.out.println("Bob: " + bob);      // Prints "Bob"
    }
}

In this example, we create a HashMap with a small size of 5 to intentionally cause collisions. We add two key-value pairs: 1 -> "Alice" and 6 -> "Bob". Since both keys result in the same index (1) due to the small array size, a collision occurs.

Here’s what happens internally when the collision occurs:

The hash code of key 1 is calculated, and the index is determined as 1.
The key-value pair "Alice" is stored at index 1.

When adding the key-value pair 6 -> "Bob":

The hash code of key 6 is calculated, and the index is also determined as 1.
Since index 1 is already occupied by the key-value pair "Alice", HashMap's collision resolution mechanism comes into play.
The new entry "Bob" is stored in a linked list alongside the existing entry "Alice" in the same bucket (index 1).

When retrieving values:

The get(1) call successfully retrieves "Alice" from index 1.
The get(6) call retrieves "Bob" from the linked list in the same bucket (index 1).

This demonstrates how collisions are managed by storing multiple entries with the same index using linked lists or similar data structures in a HashMap.

References:

Java: The complete reference

Programming with Java

Java OCA 8 Certification Guide

Java HashMap: Internal Implementation and Working

Written by Priya Salvi