Java HashSet: What Beginners Need to Know

Alexander Obregon
Jun 18, 2024

Introduction

The Java HashSet is one of the most commonly used data structures in Java. It offers a simple yet powerful way to store unique elements and perform efficient lookups. In this article, we’ll explore what a HashSet is, how to use one, and some common operations and use cases. This article is also a good refresher for those who are already familiar with HashSets.

What is a HashSet?

A HashSet is a part of the Java Collections Framework and implements the Set interface. It is backed by a hash table (actually a HashMap instance), and it doesn’t allow duplicate elements. The primary characteristics of a HashSet are:

  • No Duplicate Elements: HashSets automatically handle duplicates. If you try to add an element that already exists in the set, it will be ignored.
  • Unordered: The elements in a HashSet are not stored in any particular order. The order might change over time as elements are added and removed.
  • Efficient: HashSets provide constant time performance for basic operations like add, remove, contains, and size, assuming the hash function disperses elements properly among the buckets.

How Does a HashSet Work?

HashSets rely on hashing to store and retrieve elements. When you add an element to a HashSet, the following steps occur:

  1. Hash Code Calculation: The hashCode method of the object is called to compute a hash code. This hash code is an integer derived from the object. Classes such as String compute it from their contents, and two objects that are equal according to equals must return the same hash code. It is not guaranteed to be a memory address.
  2. Bucket Index Calculation: The hash code is mapped to one of the buckets (slots) in the hash table. Conceptually, the hash code is divided by the number of buckets and the remainder is used as the bucket index. In practice, the backing HashMap keeps the number of buckets at a power of two and uses a bit mask to the same effect. This makes sure that the hash code is mapped within the range of available buckets.
  3. Collision Handling: If two elements map to the same bucket (a collision), which can happen even when their hash codes differ, they are stored in the same bucket and linked together in a list or tree structure within that bucket.
  4. Element Storage: If an equal element (according to equals) is already in the bucket, the set is left unchanged. Otherwise, the element is stored in the determined bucket, and a reference to the element is maintained.

When you perform operations like contains or remove, the hash code of the element is calculated, and the appropriate bucket is located using the same process. The entries in that bucket are then compared using equals, and the matching element is found or removed.
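
The bucket index step can be sketched in a few lines. This is a simplified model rather than the real HashMap code: the class name BucketIndexSketch, the helper methods, and the bucket count of 16 are assumptions made for illustration. OpenJDK's HashMap also mixes the high bits of the hash code into the low bits before masking, which this sketch leaves out.

```java
import java.util.Objects;

class BucketIndexSketch {
    // Conceptual version: remainder of the hash code divided by the bucket count.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int bucketIndex(Object element, int bucketCount) {
        int hash = Objects.hashCode(element); // 0 for null
        return Math.floorMod(hash, bucketCount);
    }

    // Bit-mask version: gives the same index when bucketCount is a power of two.
    static int maskedIndex(Object element, int bucketCount) {
        int hash = Objects.hashCode(element);
        return (bucketCount - 1) & hash;
    }

    public static void main(String[] args) {
        int bucketCount = 16; // assumed table size for this sketch
        for (String fruit : new String[] {"Avocado", "Peach", "Cherry"}) {
            System.out.println(fruit + " -> hashCode " + fruit.hashCode()
                    + " -> bucket " + bucketIndex(fruit, bucketCount)
                    + " (masked: " + maskedIndex(fruit, bucketCount) + ")");
        }
    }
}
```

Two different strings can land in the same bucket here even though their hash codes differ, which is the collision case described in step 3.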

Why Use a HashSet?

HashSets offer several advantages that make them suitable for various scenarios:

  1. Eliminating Duplicates: HashSets automatically make sure that each element is unique. This is useful when you need to store a collection of items where duplicates are not allowed, such as a list of unique usernames or product IDs.
  2. Efficient Lookups: HashSets provide constant time performance for basic operations like adding, removing, and checking for the presence of elements. This makes them ideal for use cases where fast lookups are critical, such as in caching mechanisms or membership testing.
  3. Memory Trade-off: Since HashSets are backed by hash tables, each element costs one entry object plus a share of the bucket array. This overhead is comparable to other node-based structures like trees or linked lists, and it buys constant-time lookups that those structures do not offer.
  4. Set Operations: HashSets can be used to perform mathematical set operations like union, intersection, and difference. These operations are useful in scenarios like filtering data, finding common elements between collections, and excluding specific items from a set.

Creating a HashSet

Creating a HashSet in Java is straightforward. Here’s a simple example:

import java.util.HashSet;

public class HashSetExample {
    public static void main(String[] args) {
        // Creating a HashSet
        HashSet<String> set = new HashSet<>();

        // Adding elements to the HashSet
        set.add("Avocado");
        set.add("Peach");
        set.add("Cherry");

        // Displaying the HashSet
        System.out.println("HashSet: " + set);
    }
}

In this example, we create a HashSet of strings and add three elements to it. The System.out.println statement prints the elements in the set. Note that the order of elements is not guaranteed.
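
A HashSet can also be built directly from an existing collection, and it is common to declare the variable with the Set interface type. The following is a small sketch of that style; the class name HashSetCreationSketch and the helper fromValues are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class HashSetCreationSketch {
    // Builds a HashSet from the given values; repeated values collapse into one.
    static Set<String> fromValues(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        // Declared as Set so the implementation can be swapped later
        Set<String> fruits = fromValues("Avocado", "Peach", "Cherry", "Peach");

        System.out.println("Number of unique fruits: " + fruits.size());
        System.out.println("Fruits: " + fruits);
    }
}
```

Declaring the variable as Set rather than HashSet is a design choice, not a requirement. It keeps the rest of the code independent of the concrete implementation.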

Internal Working of HashSet

To delve deeper into the internal workings, consider the following points:

  1. Hash Function: The hashCode method of an object is crucial. A well-designed hash function distributes elements uniformly across the buckets, reducing the chances of collisions and maintaining performance.
  2. Bucket Structure: Each bucket in the HashSet is essentially a linked list. In Java 8 and later, a bucket that accumulates many colliding elements is converted into a balanced tree. When multiple elements hash to the same bucket, they are stored in this list/tree.
  3. Load Factor: The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. The default load factor of 0.75 strikes a good balance between time and space costs.
  4. Rehashing: When the number of elements exceeds the product of the load factor and the current capacity, the hash table is resized (typically doubled in size) and the elements are rehashed to the new buckets. This makes sure that the HashSet maintains its performance characteristics.

By understanding these internal mechanisms, you can better appreciate the efficiency and flexibility of HashSets in Java.
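
The load factor and rehashing points come down to simple arithmetic. The sketch below assumes the documented defaults of the no-argument constructor (initial capacity 16, load factor 0.75) and a table that doubles on each resize. The class name ResizeThresholdSketch and the method threshold are hypothetical.

```java
class ResizeThresholdSketch {
    // Number of elements the table can hold before a resize is triggered.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f; // default load factor
        int capacity = 16;        // default initial capacity

        // Walk through a few doublings and show the threshold at each size
        for (int i = 0; i < 4; i++) {
            System.out.println("capacity " + capacity
                    + " -> resize once size exceeds " + threshold(capacity, loadFactor));
            capacity *= 2;
        }
    }
}
```

With these defaults the first resize happens when the 13th element is added, because 16 × 0.75 = 12.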

Basic Operations on HashSets

Adding and Removing Elements

Adding elements to a HashSet is done using the add method. If an element already exists in the HashSet, it will not be added again. Removing elements can be done using the remove method. Here's an example demonstrating these operations:

import java.util.HashSet;

public class HashSetOperations {
    public static void main(String[] args) {
        // Creating a HashSet
        HashSet<String> set = new HashSet<>();

        // Adding elements to the HashSet
        set.add("Avocado");
        set.add("Peach");
        set.add("Cherry");
        set.add("Avocado"); // Duplicate element, won't be added

        System.out.println("HashSet after adding elements: " + set);

        // Removing an element
        set.remove("Peach");

        System.out.println("HashSet after removing Peach: " + set);
    }
}

In this example, we create a HashSet of strings and add four elements to it, including a duplicate “Avocado”. The duplicate is ignored. We then remove “Peach” from the set.
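
Both add and remove return a boolean that reports whether the set actually changed, which can be handy for spotting duplicates as they arrive. Here is a minimal sketch of that idea; the class name AddReturnValueSketch and the helper findDuplicates are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AddReturnValueSketch {
    // Returns the values that occur more than once in the input.
    static <T> Set<T> findDuplicates(List<T> items) {
        Set<T> seen = new HashSet<>();
        Set<T> duplicates = new HashSet<>();
        for (T item : items) {
            // add returns false when the element was already present
            if (!seen.add(item)) {
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("Avocado"));    // true: newly added
        System.out.println(set.add("Avocado"));    // false: already present
        System.out.println(set.remove("Avocado")); // true: it was present
        System.out.println(set.remove("Avocado")); // false: nothing to remove

        List<String> fruits = Arrays.asList("Avocado", "Peach", "Avocado", "Cherry");
        System.out.println("Duplicates: " + findDuplicates(fruits));
    }
}
```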

Checking for Elements

To check if a HashSet contains a particular element, you can use the contains method. This method returns true if the element is present and false otherwise.

import java.util.HashSet;

public class HashSetContains {
    public static void main(String[] args) {
        // Creating a HashSet
        HashSet<String> set = new HashSet<>();
        set.add("Avocado");
        set.add("Peach");
        set.add("Cherry");

        // Checking for elements
        boolean containsAvocado = set.contains("Avocado");
        boolean containsGrape = set.contains("Grape");

        System.out.println("Contains Avocado: " + containsAvocado);
        System.out.println("Contains Grape: " + containsGrape);
    }
}

In this example, we check if the HashSet contains the elements “Avocado” and “Grape”. The output shows that “Avocado” is present, while “Grape” is not.

Iterating Over Elements

You can iterate over the elements in a HashSet using a for-each loop or an iterator. Iteration order is not guaranteed because HashSet does not maintain any specific order of its elements.

Using For-Each Loop

import java.util.HashSet;

public class HashSetIteration {
    public static void main(String[] args) {
        // Creating a HashSet
        HashSet<String> set = new HashSet<>();
        set.add("Avocado");
        set.add("Peach");
        set.add("Cherry");

        // Using for-each loop
        System.out.println("Using for-each loop:");
        for (String fruit : set) {
            System.out.println(fruit);
        }
    }
}

Using Iterator

import java.util.HashSet;
import java.util.Iterator;

public class HashSetIterator {
    public static void main(String[] args) {
        // Creating a HashSet
        HashSet<String> set = new HashSet<>();
        set.add("Avocado");
        set.add("Peach");
        set.add("Cherry");

        // Using iterator
        System.out.println("Using iterator:");
        Iterator<String> iterator = set.iterator();
        while (iterator.hasNext()) {
            System.out.println(iterator.next());
        }
    }
}

In these examples, both the for-each loop and the iterator are used to iterate over the elements in the HashSet. The elements are printed to the console, but the order of iteration is not specified.
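
One practical difference between the two styles is removal during iteration. Calling set.remove inside a for-each loop will typically throw a ConcurrentModificationException, while the iterator's own remove method is safe. Below is a minimal sketch; the class name RemoveWhileIteratingSketch and the helper removeStartingWith are made up for illustration.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class RemoveWhileIteratingSketch {
    // Removes every element that starts with the given prefix.
    static void removeStartingWith(Set<String> set, String prefix) {
        Iterator<String> iterator = set.iterator();
        while (iterator.hasNext()) {
            if (iterator.next().startsWith(prefix)) {
                iterator.remove(); // safe: removal goes through the iterator
            }
        }
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("Avocado");
        set.add("Peach");
        set.add("Cherry");
        set.add("Pear");

        removeStartingWith(set, "P");
        System.out.println("After removing P-fruits: " + set);

        // The same filtering can be written more briefly with removeIf
        set.removeIf(fruit -> fruit.startsWith("C"));
        System.out.println("After removing C-fruits: " + set);
    }
}
```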

Size of the HashSet

To get the number of elements in a HashSet, use the size method. This method returns an integer representing the number of elements currently in the set.

import java.util.HashSet;

public class HashSetSize {
    public static void main(String[] args) {
        // Creating a HashSet
        HashSet<String> set = new HashSet<>();
        set.add("Avocado");
        set.add("Peach");
        set.add("Cherry");

        // Getting the size of the HashSet
        int size = set.size();

        System.out.println("Size of the HashSet: " + size);
    }
}

In this example, the size method is used to determine the number of elements in the HashSet, which is then printed to the console.

Clearing the HashSet

If you want to remove all elements from a HashSet, you can use the clear method. This method removes all elements, effectively emptying the set.

import java.util.HashSet;

public class HashSetClear {
    public static void main(String[] args) {
        // Creating a HashSet
        HashSet<String> set = new HashSet<>();
        set.add("Avocado");
        set.add("Peach");
        set.add("Cherry");

        // Clearing the HashSet
        set.clear();

        System.out.println("HashSet after clearing: " + set);
    }
}

In this example, the clear method is used to remove all elements from the HashSet, resulting in an empty set.

Is HashSet Empty?

To check if a HashSet is empty, use the isEmpty method. This method returns true if the set contains no elements and false otherwise.

import java.util.HashSet;

public class HashSetIsEmpty {
    public static void main(String[] args) {
        // Creating a HashSet
        HashSet<String> set = new HashSet<>();

        // Checking if the HashSet is empty
        boolean isEmpty = set.isEmpty();

        System.out.println("Is HashSet empty? " + isEmpty);

        // Adding an element and checking again
        set.add("Avocado");
        isEmpty = set.isEmpty();

        System.out.println("Is HashSet empty after adding an element? " + isEmpty);
    }
}

In this example, we check if the HashSet is empty before and after adding an element. The output reflects the change in the set’s state.

Advanced Usage and Tips

Custom Objects in HashSet

HashSets can store custom objects, but it’s essential to properly override the equals and hashCode methods in these objects. This makes sure that the HashSet can correctly determine if two objects are equal and handle hashing effectively. Here's an example with a custom Person class:

import java.util.HashSet;
import java.util.Objects;

class Person {
    private String name;
    private int age;

    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Person person = (Person) o;
        return age == person.age && Objects.equals(name, person.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }

    @Override
    public String toString() {
        return name + " (" + age + ")";
    }
}

public class HashSetCustomObjects {
    public static void main(String[] args) {
        HashSet<Person> set = new HashSet<>();
        set.add(new Person("Alice", 30));
        set.add(new Person("Bob", 25));
        set.add(new Person("Alice", 30)); // Duplicate, won't be added

        System.out.println("HashSet: " + set);
    }
}

In this example, the Person class overrides equals and hashCode methods to make sure that two Person objects with the same name and age are considered equal. The HashSet uses these methods to manage elements properly.
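
To see why the overrides matter, it can help to compare a class that keeps the inherited, identity-based equals and hashCode with one that overrides them. The sketch below uses two hypothetical classes, PlainPoint and ValuePoint, that are not part of the example above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeSketch {
    // Hypothetical class that keeps the inherited, identity-based equals/hashCode.
    static final class PlainPoint {
        final int x;
        final int y;

        PlainPoint(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Same fields, but equality and hashing are based on the field values.
    static final class ValuePoint {
        final int x;
        final int y;

        ValuePoint(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValuePoint)) return false;
            ValuePoint other = (ValuePoint) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Adds two points with identical coordinates and reports the set size.
    static int countPlain() {
        Set<PlainPoint> set = new HashSet<>();
        set.add(new PlainPoint(1, 2));
        set.add(new PlainPoint(1, 2));
        return set.size();
    }

    static int countValue() {
        Set<ValuePoint> set = new HashSet<>();
        set.add(new ValuePoint(1, 2));
        set.add(new ValuePoint(1, 2));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("Without overrides: " + countPlain()); // 2, both objects are kept
        System.out.println("With overrides: " + countValue());    // 1, the second add is ignored
    }
}
```

Without the overrides, the set treats the two points as different elements because they are different objects, even though their fields match.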

Performance Considerations

While HashSets offer constant time performance for basic operations, several factors can affect their efficiency:

  • Hash Function: A good hash function is critical for performance. It should distribute elements uniformly across the buckets to minimize collisions. Poorly implemented hashCode methods can lead to many collisions, degrading performance.
  • Initial Capacity and Load Factor: The initial capacity of a HashSet and its load factor (default is 0.75) determine when rehashing occurs. Rehashing involves resizing the hash table and redistributing the elements, which can be an expensive operation. Adjusting these parameters based on the expected number of elements can optimize performance.
  • Collision Handling: When collisions occur, elements are stored in a linked list or a balanced tree within the same bucket. Frequent collisions can degrade performance, so it’s essential to ensure a good distribution of hash codes.
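
When the expected number of elements is known up front, the initial capacity can be chosen so that no rehashing is needed while filling the set. The following is a rough sketch of that calculation, assuming the load factor of 0.75 mentioned above. The class name PresizeSketch and the helper capacityFor are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class PresizeSketch {
    // Smallest capacity whose resize threshold (capacity * loadFactor)
    // is at least the expected number of elements.
    static int capacityFor(int expectedElements, float loadFactor) {
        return (int) Math.ceil(expectedElements / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 10_000;
        float loadFactor = 0.75f;

        // Pre-sized so that adding the expected elements should not trigger a resize
        Set<Integer> ids = new HashSet<>(capacityFor(expected, loadFactor), loadFactor);
        for (int i = 0; i < expected; i++) {
            ids.add(i);
        }

        System.out.println("Stored " + ids.size() + " ids");
    }
}
```

On Java 19 and later, the static factory HashSet.newHashSet(int) is intended for the same purpose and takes the expected number of elements directly.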

Common Use Cases

HashSets are versatile and can be used in various scenarios:

  • Removing Duplicates: HashSets are ideal for filtering out duplicate elements from a collection. For example, to get a list of unique usernames from a larger collection, you can use a HashSet.
import java.util.HashSet;
import java.util.List;
import java.util.ArrayList;

public class RemoveDuplicates {
    public static void main(String[] args) {
        List<String> usernames = new ArrayList<>();
        usernames.add("Alice");
        usernames.add("Bob");
        usernames.add("Alice"); // Duplicate

        HashSet<String> uniqueUsernames = new HashSet<>(usernames);

        System.out.println("Unique Usernames: " + uniqueUsernames);
    }
}
  • Membership Testing: HashSets provide an efficient way to check if an element is part of a set. This is useful for scenarios like checking if a user has a certain role or if a product is available in inventory.
import java.util.HashSet;

public class MembershipTesting {
    public static void main(String[] args) {
        HashSet<String> roles = new HashSet<>();
        roles.add("Admin");
        roles.add("User");
        roles.add("Guest");

        String roleToCheck = "Admin";
        if (roles.contains(roleToCheck)) {
            System.out.println(roleToCheck + " role exists.");
        } else {
            System.out.println(roleToCheck + " role does not exist.");
        }
    }
}
  • Set Operations: HashSets can perform mathematical set operations like union, intersection, and difference. These operations are useful in scenarios such as filtering data, finding common elements between collections, and excluding specific items from a set.
import java.util.HashSet;

public class SetOperations {
    public static void main(String[] args) {
        HashSet<String> set1 = new HashSet<>();
        set1.add("Avocado");
        set1.add("Peach");
        set1.add("Cherry");

        HashSet<String> set2 = new HashSet<>();
        set2.add("Peach");
        set2.add("Dragonfruit");
        set2.add("Elderberry");

        // Union
        HashSet<String> union = new HashSet<>(set1);
        union.addAll(set2);
        System.out.println("Union: " + union);

        // Intersection
        HashSet<String> intersection = new HashSet<>(set1);
        intersection.retainAll(set2);
        System.out.println("Intersection: " + intersection);

        // Difference
        HashSet<String> difference = new HashSet<>(set1);
        difference.removeAll(set2);
        System.out.println("Difference: " + difference);
    }
}

In these examples, we demonstrate how to use HashSets for removing duplicates, performing membership testing, and executing set operations like union, intersection, and difference.
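
Because a HashSet does not keep insertion order, the duplicate-removal example can reorder the usernames. If the original order matters, one option is LinkedHashSet, a subclass of HashSet in java.util that iterates in insertion order. This is a sketch of that alternative rather than part of the examples above, and the class name OrderedDedupSketch and the helper dedupePreservingOrder are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class OrderedDedupSketch {
    // Removes duplicates while keeping the first occurrence of each item in place.
    static <T> List<T> dedupePreservingOrder(List<T> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    public static void main(String[] args) {
        List<String> usernames = Arrays.asList("Alice", "Bob", "Alice", "Carol", "Bob");

        System.out.println("Unique usernames in order: " + dedupePreservingOrder(usernames));
    }
}
```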

Conclusion

HashSets in Java are a powerful tool for managing collections of unique elements. They offer efficient operations for adding, removing, and checking for elements, making them ideal for scenarios where performance and uniqueness are important. By understanding the basics, advanced usage, and common use cases of HashSets, you can use this flexible data structure to write more effective and efficient Java programs. Whether you’re filtering duplicates, performing set operations, or managing custom objects, HashSets provide a strong solution for your collection needs.
