Understanding HashMap in Java

Vedant Jadhav · Published in CodeX · Jul 16, 2024

A HashMap is a data structure that maps keys to values. It is widely used due to its average-case time complexity of O(1) for both insertions and lookups. This article delves into the inner workings of HashMap, its benefits, common use cases, and potential pitfalls.

Basic Structure

A HashMap in Java is essentially an array of buckets. Each bucket holds its entries in a linked list, or, from Java 8 onwards, in a red-black tree once the list grows long enough. The elements in each bucket are called nodes, and each node stores a key-value pair.

public class HashMap<K, V> {
    static class Node<K, V> {
        final K key;
        V value;
        final int hash;
        Node<K, V> next;

        // Constructor and other methods...
    }

    Node<K, V>[] table; // Array of nodes
    int size;           // Number of key-value pairs in the map
    // Other fields...
}
 
Map<String, Integer> map = new HashMap<>();

The Node class has four fields: hash, key, value, and next.

  1. key — the element used to look up an entry in the hash map.
  2. value — the value associated with the specific key.
  3. hash — the hash code of the particular key.
  4. next — a reference to the next node in the same bucket, used to chain colliding entries.
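Before looking at the internals, here is the map in everyday use: put stores a pair (replacing the value if the key already exists), and get retrieves the value for a key, or null if the key is absent.

Map<String, Integer> fruits = new HashMap<>();
fruits.put("apple", 3);   // insert a key-value pair
fruits.put("banana", 5);
fruits.put("apple", 4);   // same key: the old value 3 is replaced

System.out.println(fruits.get("apple"));  // 4
System.out.println(fruits.get("cherry")); // null (key absent)
System.out.println(fruits.size());        // 2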

Working of a HashMap

The strength of a hash map lies in its ability to search, insert, and delete elements in O(1) average time.

The most widely used operations on a hash map are get (search) and put (insert). When we insert a pair into a hash map, put passes the key to a hash function, which converts the key into an integer hash code. Using this hash code, we calculate the index of the bucket (a linked list) in the array (the hash table) for that key-value pair.

int index = hashCode % arrayLength; // conceptual; Java's actual index computation is shown later

This index points to a bucket in the array. A bucket can end up holding multiple key-value pairs; this is called a collision, because ideally each bucket would hold exactly one pair.

If a node already exists at that index, put has to iterate through the linked list to check whether the given key is already present. If it is, the value in that node is simply replaced. If it is not, a new node is created with the key, value, hash, and next pointer.

Since it has to iterate over a linked list, the time complexity in this case increases to O(n), where n is the number of nodes in that particular bucket.
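Putting this together, here is a minimal sketch of that put logic against the fields shown earlier, assuming a Node(hash, key, value, next) constructor; the real OpenJDK code additionally handles resizing and treeification, and appends new nodes at the tail of the list rather than the head.

public V put(K key, V value) {
    int hash = (key == null) ? 0 : key.hashCode(); // Java also spreads this hash (see below)
    int index = Math.floorMod(hash, table.length); // non-negative bucket index

    // Walk the bucket's linked list looking for an existing key.
    for (Node<K, V> node = table[index]; node != null; node = node.next) {
        if (node.hash == hash && java.util.Objects.equals(node.key, key)) {
            V old = node.value;
            node.value = value; // key already present: replace the value
            return old;
        }
    }

    // Key not found: prepend a new node to the bucket.
    table[index] = new Node<>(hash, key, value, table[index]);
    size++;
    return null;
}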

How is this optimized?

The load factor is the threshold (0.75 by default in Java) above which the hash table decides to resize and rehash to accommodate more entries. When the number of entries exceeds the product of the load factor and the current capacity, the hash table is resized and rehashed.
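For example, with the default capacity of 16 and the default load factor of 0.75, a resize is triggered once the map holds more than 16 × 0.75 = 12 entries. Both values can be chosen at construction time:

Map<String, Integer> defaults = new HashMap<>();          // capacity 16 -> resize after 12 entries
Map<String, Integer> roomier  = new HashMap<>(64, 0.75f); // capacity 64 -> resize after 48 entries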

The best ways to optimize this are to improve the hash function, to grow the array (hash table) as it fills up, and to replace the linked list (O(n) lookups) with a faster structure. Java does all three:

  1. Improving the hash function

In Java’s HashMap, bit manipulation is used to distribute the hash codes more uniformly across the hash table. This helps in reducing collisions and improving overall performance.

Java’s HashMap uses a secondary hash function to better distribute the initial hash code:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
  • h = key.hashCode(): Computes the initial hash code of the key.
  • h >>> 16: Performs an unsigned right shift of 16 bits on the hash code.
  • h ^ (h >>> 16): Uses the XOR operation to mix the higher bits into the lower bits, reducing the chance of collisions.
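One related detail: because the table capacity is always a power of two, OpenJDK computes the bucket index with a bitwise AND instead of the modulo shown earlier:

int index = (table.length - 1) & hash;
// Equivalent to hash % table.length for power-of-two lengths, and always non-negative:
// e.g. length 16 -> mask 0b1111, which keeps only the low 4 bits of the hash.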

2. Resizing the Hash Table (Dynamic Resizing)

When the number of entries in the HashMap exceeds the product of the load factor and the current capacity, the hash table is resized. During resizing, the capacity is doubled, and all existing entries are rehashed into the new table.
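A minimal sketch of that step, reusing the put method sketched above (OpenJDK's actual resize is cleverer and splits each bucket in place without recomputing hashes):

@SuppressWarnings("unchecked")
void resize() {
    Node<K, V>[] oldTable = table;
    table = (Node<K, V>[]) new Node[oldTable.length * 2]; // double the capacity
    size = 0;

    // Reinsert every entry; bucket indices change because table.length changed.
    for (Node<K, V> head : oldTable) {
        for (Node<K, V> node = head; node != null; node = node.next) {
            put(node.key, node.value);
        }
    }
}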

3. Converting the Linked List to a Red-Black Tree

Another great optimization introduced in Java 8 is the conversion of a bucket's linked list into a red-black tree (a self-balancing binary search tree). If the number of entries in a single bucket exceeds a threshold (8 by default), the linked list is converted into a red-black tree, reducing that bucket's worst-case time complexity from O(N) to O(log N), where N is the number of entries in the bucket. The conversion happens during insertion, and only once the table itself holds at least 64 slots; for smaller tables, Java resizes the table instead, which spreads the colliding entries out.
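The relevant thresholds are declared as constants in OpenJDK's java.util.HashMap:

// From OpenJDK's java.util.HashMap (Java 8+):
static final int TREEIFY_THRESHOLD = 8;     // treeify a bucket once its list grows past this length
static final int UNTREEIFY_THRESHOLD = 6;   // convert back to a list when a split bucket shrinks to this
static final int MIN_TREEIFY_CAPACITY = 64; // minimum table size before any bucket is treeified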

Common use cases

HashMaps are used for many purposes, some of which are mentioned below.

  • Caching: Store frequently accessed data to improve performance.
  • Counting Frequencies: Track occurrences of elements in a collection (see the sketch after this list).
  • Database Indexing: Map keys to database records for quick lookup.
  • Configuration Settings: Store configuration key-value pairs.
  • Session Management: Track user sessions in web applications.
  • Routing Tables: Implement routing tables in networking.
  • Dictionary/Glossary: Store and retrieve definitions or translations.
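For instance, counting frequencies takes a single merge call per element:

Map<String, Integer> counts = new HashMap<>();
for (String word : List.of("a", "b", "a", "c", "a")) {
    counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
}
System.out.println(counts); // e.g. {a=3, b=1, c=1} (iteration order not guaranteed)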

Potential Pitfalls

Nothing is right for every job, and hash maps are no exception. Thankfully, their drawbacks are well understood, and each comes with an established workaround, as listed below.

Thread Safety:

  • Problem: HashMap is not thread-safe, leading to potential data corruption in concurrent environments.
  • Solution: Use Collections.synchronizedMap() for simple synchronization, or ConcurrentHashMap for better performance under high concurrency.
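Both are drop-in replacements, since they implement the same Map interface:

// Coarse-grained: one lock guards every operation.
Map<String, Integer> synced = Collections.synchronizedMap(new HashMap<>());

// Fine-grained: per-bucket synchronization plus atomic compound operations.
ConcurrentMap<String, Integer> concurrent = new ConcurrentHashMap<>();
concurrent.merge("requests", 1, Integer::sum); // atomic increment-or-insert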

High Collision:

  • Problem: A poor hash function can lead to many collisions, degrading performance from O(1) to O(n).
  • Solution: Ensure a good distribution of hash codes by properly implementing the hashCode() (and matching equals()) method for keys.
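As a rule of thumb, derive hashCode() and equals() from the same fields so that equal keys always land in the same bucket. A sketch with a hypothetical Point key:

final class Point {
    final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return p.x == x && p.y == y;
    }

    @Override public int hashCode() {
        return java.util.Objects.hash(x, y); // mixes both fields for a good spread
    }
}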

Memory Usage:

  • Problem: Inefficient memory use with high load factors or large initial capacities can waste memory.
  • Solution: Choose an appropriate initial capacity and load factor based on expected usage. The default load factor of 0.75 is usually a good trade-off between time and space.

Null Keys/Values:

  • Problem: HashMap allows one null key and any number of null values, which can cause a NullPointerException during certain operations (such as unboxing a null returned by get) if not handled properly.
  • Solution: Avoid using null keys and values if possible, or ensure null handling logic is properly implemented.
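For example, getOrDefault sidesteps a null result that would otherwise fail on unboxing:

Map<String, Integer> stock = new HashMap<>();
stock.put("apple", 3);

int apples = stock.getOrDefault("apple", 0); // 3
int pears = stock.getOrDefault("pear", 0);   // 0; a plain get would return null and
                                             // throw a NullPointerException on unboxing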

Iteration Order:

  • Problem: HashMap does not guarantee the order of elements, which can be problematic if order is important.
  • Solution: Use LinkedHashMap if a predictable iteration order (insertion order) is required.
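For example:

Map<String, Integer> ordered = new LinkedHashMap<>();
ordered.put("first", 1);
ordered.put("second", 2);
ordered.put("third", 3);
System.out.println(ordered); // {first=1, second=2, third=3}, always in insertion order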

Performance Overhead:

  • Problem: Treeification introduces complexity for heavily-collided buckets, which can slightly increase the overhead for these buckets.
  • Solution: Accept the small overhead for the benefit of worst-case performance improvement, or consider using an alternative data structure if treeification becomes a frequent issue.

Resize Operations:

  • Problem: Resizing (rehashing) the HashMap can be costly, especially if it occurs frequently.
  • Solution: Set an initial capacity that accommodates your needs to minimize resizing operations. Monitor and adjust load factors if necessary.
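To hold n entries without ever resizing, pick a capacity such that n ≤ capacity × loadFactor, for example:

int expected = 10_000;
// No resize occurs while filling, since expected <= capacity * 0.75:
Map<String, Integer> presized = new HashMap<>((int) Math.ceil(expected / 0.75));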

Memory Leaks:

  • Problem: Holding references to unused keys and values can cause memory leaks.
  • Solution: Remove entries promptly and consider using WeakHashMap for cases where keys should be garbage-collected when no longer in use.
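For example, a WeakHashMap entry becomes eligible for removal once nothing else references its key:

Map<Object, String> cache = new WeakHashMap<>();
Object key = new Object();
cache.put(key, "cached value");

key = null;  // drop the last strong reference to the key
System.gc(); // a hint only; after collection the entry is removed automatically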

I hope you found this article helpful. If you have any doubts or need any help, hit me up on my Twitter; I will be more than happy to help.

Buy me a coffee to support my work, and I will be motivated to write more such articles!

https://www.buymeacoffee.com/VedantJdv
