Find Duplicates Using a HashSet in Java

Kendall Perry
Strategio
Published in
4 min readDec 23, 2022

A HashSet is a collection of unique elements. It is an excellent tool for storing and accessing information when solving problems. Every value in a HashSet must be unique and does not allow for duplicate or null values. Also, one thing to remember is that HashSets don’t necessarily store values in sorted order. Because of their efficiency and ease of setting up, HashSets are one of my favorite data structures to use, and they can come in handy for various problems!

Set example using colorful shapes. A shape that already exists cannot enter the set, another shape that doesn’t exist yet is able to enter the set of unique shapes!

To relate with how we might use a Set in our day-to-day, let’s say you are creating a playlist of songs. You add distinct and unique songs by title to this playlist. You can search for songs by title, add, delete, and clear the playlist.

We’ll specifically look at applying a HashSet to an array-based problem, where we’ll need to find and return all duplicate integers where each integer appears once or twice. Here’s the problem and definition from LeetCode.

Find All Duplicates in an Array

Given an integer array nums of length n where all the integers of nums are in the range [1, n] and each integer appears once or twice, return an array of all the integers that appears twice.

You must write an algorithm that runs in O(n) time and uses only constant extra space.

We could solve this problem brute force with a nested for-loop, keeping track of all duplicates as we go. However, implementing a HashSet is much more efficient and will help us get to that O(n) runtime.

To get started, we’ll want to specify the data type we want to store. For this problem, we’ll use the Integer type:

HashSet<Integer> store = new HashSet<Integer>();

We also want to have a way of keeping track of the duplicate integers, so we can set up an ArrayList with the Integer type that we will eventually return:

ArrayList<Integer> output = new ArrayList<Integer>();

Next, we will iterate through the nums array. If the HashSet store doesn’t contain our current integer using store.contains(current), we will add it to the store store.add(current). Else, if store already contains the current integer, we know it is a duplicate and can immediately add it tooutput. This allows us to do two steps at once in only one iteration through nums!

A few handy methods built into HashSet include:

  • .add() — adds an item to the set.
  • .contains() — checks if an item is in the set.
  • .remove() — removes an item from the set.
  • .clear() — clears all items from the set.
for (int i = 0; i < nums.length; i++) {
int current = nums[i];
if (store.contains(current)) {
output.add(current);
} else {
store.add(current);
}
}

At this point, we’ve worked our way through the entire nums array. We’ve checked each value against the HashSet store. Any value that appears more than once immediately was added to the output. All we have left to do is return our output!

Let’s break this down with an example.

If we are given nums = [1, 7, 2, 5, 7, 3, 3] as our input, our HashSet storewould store the unique values of nums and would look like this: store = {1, 7, 2, 5, 3}. Our ArrayList output would store the duplicate values and would look like this: output = [7, 3].

Let’s see how it all comes together in the code below:

class FindDuplicates {
public List<Integer> findDuplicates(int[] nums) {

HashSet<Integer> store = new HashSet<Integer>();
ArrayList<Integer> output = new ArrayList<Integer>();

for (int i = 0; i < nums.length; i++) {
int current = nums[i];
if (store.contains(current)) {
output.add(current);
} else {
store.add(current);
}
}
return output;
}
}

I hope this helps better understand one way a HashSet can be implemented to solve a problem!

--

--