Understanding HashMaps in DSA

Week 1: Harnessing HashMaps for Success in Data Structures and Algorithms

Nikhil Bajpai
10 min readNov 5, 2023
Understanding hashMap part 1

Hi !! Folks, I’ve recently started my interview preparation through the land of DSA and this week I learned about hashMap in data structure. So in this article, we’ll talk about a bit of introduction to hashing, What is a hash function, and what a is collision. But the more important part is how to implement a hashmap to solve the problem more optimally.

To make this journey engaging and practical, we’ll embark on solving a few challenges from platforms like LeetCode and GeeksForGeeks. Our plan of action is simple yet effective: we’ll start by unraveling the brute force solutions, typically boasting time complexities around O(n²) or O(n³). Then, with the prowess of hashing, we’ll guide you on a quest to refine these solutions, achieving remarkable time complexities of O(n) or even O(1).

Introduction to Hash Table :

A hash table, often referred to as a hash map, is a fundamental data structure that facilitates efficient data storage and retrieval. It works by using a hash function to map keys to specific locations within an array, where associated values are stored. It’s particularly useful when dealing with large datasets because it ensures that data can be accessed in near constant time, O(1), making it one of the fastest data retrieval methods.

Example: Imagine you’re building a contact management application, and you want to store contact information. You could use a hash table, where the keys might be the names of your contacts, and the values are the corresponding phone numbers and email addresses. The hash function would transform each contact’s name into an index within the array, allowing for near-instantaneous O(1) retrieval of their contact information. This efficient data structure is a key asset in managing and organizing data, ensuring fast and reliable access to information.

Okay to summarize we want to have a data structure that would store a key and the value associated with that particular key. Array sounds useful since we can use indices as Keys and the element present at the index would be the value that is associated with that key. For an array accessing and updating any particular key is an O(1) operation.

updating arr["key"] is O(1)
Accessing arr["key"] is O(1)

However, for large data we are talking out 10⁹ range we can’t simply instantiate an array of size 10⁹ and access its key’s associated value directly it will give us a TLE in online judge. Now that’s a problem.

In cases where the keys are large and cannot be used directly as an index, you should use hashing.

In hashing, large keys are converted into small keys by using hash functions. The values are then stored in a data structure called a hash table. The idea of hashing is to distribute entries (key/value pairs) uniformly across an array. Each element is assigned a key (converted key). By using that key you can access the element in O(1) time.

We’ll talk about Hash Functions and collisions in the next article.

Just a side note but the major difference b/w a HashSet and a hashMap is when we want to store only keys we use a HashSet but when we require a key-value pair then we use a hashMap

Now let’s talk about how to implement a hashmap to solve the problem more optimally.

Enough with the overview let’s answer some questions.

Question 1 :

https://practice.geeksforgeeks.org/problems/largest-subarray-with-0-sum/1

Given an array having both positive and negative integers. The task is to compute the length of the largest subarray with a sum = 0.

Example 1:

Input:
N = 8
A[] = {15,-2,2,-8,1,7,10,23}
Output: 5
Explanation: The largest subarray with
sum 0 will be -2 2 -8 1 7.

Thoughts?

Approach 1: Always go with the Brute force or Naive solution.

So the brute force solution would be to go through all possible subArray and then find out their sum and check whether it’s != or = 0.

Try it yourself.

Approach 2: A wise man once said if there is a subarray and sum in a single sentence always think of the prefix sum at least once.

So going with the prefix sum thought process for the above array :

   Arr : [15,-2,2,-8,1,7,10,23]
prefixOfA : [15,
15+(-2),
15+(-2)+2,
15+(-2)+2+(-8),
15+(-2)+2+(-8)+1
15+(-2)+2+(-8)+1+7,
15+(-2)+2+(-8)+1+7+10,
15+(-2)+2+(-8)+1+7+10+23
]
o/p -> [15, 13, 15, 7, 8, 15, 25, 48]

Now how to find the prefix sum between two points (i,j) :
prefixSum(i,j) = prefixSum(j)-prefixSum(i-1)

And it’s given in the question that we need to find the subarray with
Sum = 0
. Meaning prefixSum(i,j) = prefixSum(j)-prefixSum(i-1)= 0

Which can be interpreted asprefixSum(j) = prefixSum(i-1)

Which can again be interpreted as the subArray(i,j) doesn’t add anything to the prefix sum.This can further be interpreted as we need to look for duplicates in the prefix array since : prefixSum(j) = prefixSum(i-1).

Example: To prove the above Interpretations :

arr =[15,-2,2,-8,1,7,10,23]
prefixSum = [15, 13, 15, 7, 8, 15, 25, 48]
i| j|
we have two duplicate for 15 at i = 0 and j = 2 -> sum(arr[1:2]) = 0

So we can say that duplicates in your prefixsum Array will give you subArray with sum as 0 .

But the question is also asking the longest subArray with sum as 0 :
This could also go with our previous interpretation i.e. we need to find the farthest duplicates in the prefix sum subArray since we are talking about the longest distance between two duplicate numbers.

Now how to look for the farthest duplicates for any element :

for the largest subarray, the element will only care about the farthest duplicate only

Here’s what we are gonna do :

1. Initialize the necessary data structures and variables.
2. Loop through the elements of the input list `nums` to
calculate the prefix sum and store it in `prefixSumArray`.
3. Iterate through `prefixSumArray` and keep track of the first occurrence
of each prefix sum in the dictionary `d`. If a prefix sum is already in `d`,
calculate the subarray length with a sum of zero and
update `ans` if it's greater than the current value.
4. `ans` now contains the length of the maximum subarray with a sum of zero.
5. Output `ans`.

Before looking into the code give yourself a couple of minutes to think why it should work.

# Given list of numbers
nums = [15, -2, 2, -8, 1, 7, 10, 23]

# Initialize the prefix sum array and a dictionary to store first occurrences of prefix sums
prefixSumArray = [nums[0]]
d = {0: -1} # The initial prefix sum is 0 with an index of -1

# Initialize a variable to store the maximum subarray length with a sum of zero
ans = 0

# Calculate the prefix sum for each element in the input list
for i in range(1, len(nums)):
prefixSumArray.append(prefixSumArray[-1] + nums[i])

# Iterate through the prefix sum array
for i in range(len(prefixSumArray)):
# Check if the current prefix sum is already in the dictionary
if prefixSumArray[i] not in d:
# If not, add it to the dictionary with the current index as the value
d[prefixSumArray[i]] = i
else:
# If it's already in the dictionary, calculate the subarray length
# with a sum of zero and update 'ans' if it's greater than the current value
ans = max(ans, i - d[prefixSumArray[i]])

# 'ans' now contains the length of the maximum subarray with a sum of zero
print(ans)

Dry Run for the above code with the below code

Dry run of using the above code

Time Complexity is O(n) and space complexity is O(n) to create the hashmap

Question 2 :

https://leetcode.com/problems/longest-consecutive-sequence/

Given an unsorted array of integers, return the length of the longest consecutive element sequence.

You must write an algorithm that runs in O(n) time.

Example 1:

Input: nums = [100,4,200,1,3,2]
Output: 4
Explanation: The longest consecutive elements sequence is [1, 2, 3, 4].
Therefore its length is 4.

Input: nums = [0, -3, 5, -1, 7, -2, -4, 1, 3]
Output: 6
Explanation: There are two longest consecutive sequences of
length 6: [-4, -3, -2, -1, 0, 1] and [-2, -1, 0, 1, 2, 3]
we return 6 as an output.

Thoughts?

Approach 1: When in doubt start with brute force.

The longest consecutive sequence must start from some element in the array. So the basic idea would be to explore each possibility. Pick each element in the input and do a linear search to count the length of the longest consecutive sequence starting from that element.
We also keep track of the longest length of consecutive sequences seen so far during this process.

nums = [0, -3, 5, -1, 7, -2, -4, 1, 3]

what i mean is start from the first element of the nums array and check if it's
i+1(next) element is there or not if it is stop and travese again to find i+2,
element this should go for the whole array

Time complexity for this would be : n* n^2
=(for each element)*(going to each element and trying to form a subsequence)
=n^3

❌-> the element is not present in the array

Dry run :
0->1->2❌ length => 2
-3-> -2-> -1-> 0-> 1-> 2 ❌ length => 5
5-> 4 ❌ length = 1
-1-> 0-> 1-> 2 ❌ length 2
7-> 8❌ length 1
-2-> -1-> 0-> 1-> 2❌ length 4
-4->-3->-2->-1->0->1->2 ❌ length => 6
1->2❌ length = 1
4->5-> 6❌ length = 2
Hence longest consecutive element sequence is 6

Try to write brute force code yourself

Approach 2 :

Let’s build on the brute force approach only which operation is taking more time. That will be searching whether (i+1)th (i+2)…… element exists of the taken “i” that will be in order or O(n²) we can reduce it to O(n) by simply using a HashSet as look up to check whether the next element exists or not in O(1)and for each element that will be O(n).

What I meant is we can drop the searching for the sequence time from O(n²) to O(n) by using HashSet.

Now our updated time complexity is O(n²) i.e. O(n) for checking each element * O(n) for searching sequence.

Can’t we check whether the element in question let's say “i” is a starting point or not? How this would help is we can simply ignore all those elements that are present in the HashSet since they’ll already be part of the sequence in the future. ❌ ✔️

A :[0, -3, 5, -1, 7, -2, -4, 1, 3]
HashSet A : [0, -3, 5, -1, 7, -2, -4, 1, 3]
❌: No the element does not existin the hashSet hence find out the sequence length
✔️: yes the element exist in the hashSet hence ignoring this element
here we are checkingfor every i if i-1 exist in the hashSet or not .
If it exist we can ignore that element if not then find the sequence length
✔️ -1<-0
-4<--3
❌ 4<-5->6 {Do not exist } lenght = 1
✔️ -2<--1
6<-7->8 {Do not exist } length = 1
✔️-3<--2->-1->0->1->2 ❌ length = 5
❌-5<--4->-3->-2->-1->0->1->2❌ length = 6

And we'll repeat this process till we reach the end .

Time complexity: We’ll only touch any element twice once while creating the sequence and other when we check whether it’s a start or not.

Hence our time complexity will : 2*O(N) + N — > O(N)

Okay so our plan of action is :

  1. Create a Set: Convert the input list nums into a set called hashSet. This set helps eliminate duplicate elements and makes it easy to check if a number is present.
  2. Initialize Variables: Set a variable ans to 0. This variable will keep track of the maximum length of a consecutive subsequence.
  3. Iterate Through Numbers:
  • Go through each number in the nums list one by one.
  • For each number, check if the number immediately before it (i.e., nums[i] - 1) is not in the hashSet. If it's not there, it means that the current number can be the start of a consecutive subsequence.

4. Find the Consecutive Subsequence:

  • If the previous number is not in the set, initialize a variable chainLength to 1. This variable will help count the length of the current consecutive subsequence.
  • Start a loop to find all consecutive elements in the subsequence:
  • Increase the current number (x) by 1.
  • Check if x is present in the hashSet. If it is, it means there's a consecutive number in the subsequence.
  • If found, increment chainLength by 1.

5. Update Maximum Length:

  • After finding all consecutive elements, compare chainLength with the current maximum ans.
  • If chainLength is greater, update ans with its value. This keeps track of the longest consecutive subsequence found so far.

Final Result:

  • After processing all the numbers in the list, the ans variable contains the length of the longest consecutive subsequence.

Output Result:

  • Print the value of ans to obtain the length of the longest consecutive subsequence.

In essence, this algorithm uses a set to efficiently check for the presence of numbers and iterates through the list to find consecutive subsequences, updating the maximum length found as it goes.

Think about it for a couple of minutes and do a dry run :

Python code :

nums = [0, -3, 5, -1, 7, -2, -4, 1, 3]

# Step 1: Create a set to efficiently check for number presence
hashSet = set(nums)

# Print the set for reference
print(hashSet)

# Step 2: Initialize a variable to store the maximum consecutive subsequence length
ans = 0

# Step 3: Iterate through the list of numbers
for i in range(len(nums)):
# Step 4: Check if the previous number is not in the set
if nums[i] - 1 not in hashSet:
# Step 5: If the previous number is not present, initialize chainLength to 1
chainLength = 1
x = nums[i] + 1

# Step 6: Find consecutive elements in the subsequence
while x in hashSet:
chainLength += 1
x += 1

# Step 7: Update the maximum length found so far
ans = max(ans, chainLength)

# Step 8: Print the maximum consecutive subsequence length
print(ans)

Here we are using a set for efficient membership checks.

That’s it for today’s sessions this was my understanding of hashMap and hashSet wrt to data structure tomorrow we’ll talk about hash function and collision and 3 interview questions regarding hashing.

For any suggestions or improvements drop me a ping on LinkedIn :

★THANK YOU★

--

--