Data Structures in Kotlin: Set — [PartIV]

Daniely Murua
wearejaya
Published in
7 min readJul 14, 2023

The Set is a Kotlin data structure that stores a collection of unique elements, meaning there are no duplicate elements.

Let’s imagine a contact list. In this list, you have various contacts, and none of them are repeated. Now, imagine that you want to add a new contact but want to ensure that it is not duplicated. Essentially, the Set can guarantee that for you. If you try to insert a contact that already exists in the list, the Set will ignore that operation and keep only one occurrence of it.

The Set has various operations such as addition, removal, union (combining two sets), intersection (finding common elements in two sets), difference, among others.

It is highly efficient when we need to check if a particular element is present in the Set.

The Set uses other types of data structures to store elements efficiently. Therefore, its performance and characteristics depend on the chosen data structure.

Let’s see an example of Set implementation in Kotlin:

val mySet = setOf("apple", "banana", "orange")
println(mySet) // Output: [apple, banana, orange]

Usage on Kotlin

As we have seen in the previous articles, just like List, Set also has its hierarchy of interfaces, starting with the Collection interface. Let’s briefly discuss each one.

  • Collection: It is the root interface of the hierarchy. It represents the common behavior of a read-only collection, such as retrieving the size and checking the existence of elements.
  • Set: Is the base interface for Set data structures in Kotlin. It defines the basic structure of an immutable Set, which contains a collection of unique elements without a specific order. To create a set in Kotlin, you can use the function setOf().
 val fruits = setOf("apple", "banana", "orange")
  • MutableSet: The MutableSet interface extends the Set interface and adds modification operations, allowing for adding, removing, and updating elements in the set. To create a set in Kotlin, you can use the function mutableSetOf().
val fruits = mutableSetOf("apple", "banana", "orange")
  • HashSet: HashSet is an implementation of the MutableSet interface that uses a hash table to store elements, as we discussed in the Map article, remember? The difference here is that instead of storing values in the format of key-value pairs, only the value of the element is associated with the hash code. To create a set in Kotlin, you can use the function hashSetOf().
val fruits = hashSetOf("apple", "banana", "orange")
  • LinkedHashSet: LinkedHashSet also implements the MutableSet interface and offers the same functionalities as HashSet. The difference is that LinkedHashSet maintains the insertion order of the elements, unlike HashSet which does not guarantee the order of the elements. It is implemented as a combination of a hash table and a linked list. To create a LinkedHashSet in Kotlin, you can use the function linkedSetOf().
val fruits = linkedSetOf("apple", "banana", "orange")

Despite not inheriting from MutableSet, there is another tree-based Set structure called TreeSet, which we will discuss further when we cover tree structures. For now, let’s talk about HashSet and LinkedHashSet.

HashSet

HashSet, as the name suggests, is based on the hash table structure, similar to what we saw in the topic about HashMap. It carries all the peculiarities of a hash table. The difference between HashMap and HashSet lies precisely in the differences between Map and Set. While HashMap uses key-value pairs to store its elements, HashSet stores only individual elements, and each element is unique with no duplicates.

Let’s analyze an example where we want to create a HashSet to store all the tenants of a building:

val tenants = hashSetOf("Rachel", "Monica", "Phoebe")

In this example, we do not want to duplicate tenants, and the order does not matter. Therefore, HashSet can be suitable. It is efficient for addition, removal, and membership checking operations.

As mentioned earlier, in a hash table, each element will be processed by a hash function to generate a unique value known as a hash code. This hash code will be used to determine the storage position of the element in the hash table.

The resulting hash table may look like this:

In a HashSet, the data structure is designed to avoid the presence of duplicate elements. When an element is added to the HashSet, the hashing algorithm is used to calculate the hash code of the element. Then, the HashSet checks if there is already any element with the same hash code. If there isn’t, the element is added to the set.

If there is an element with the same hash code, the HashSet performs an additional comparison using the equals() method to check if the elements are equal. If the element already exists in the set, it will not be added again, thus avoiding duplicates.

This approach ensures that each element in the HashSet is unique because the calculation of the hash code and equality comparison are used to guarantee the uniqueness of the elements.

Characteristics

  • Iteration Order: Does not guarantee a specific iteration order of the set; the order may change over time.
  • Null Element Support: Allows the null element to be stored.
  • Parameters for Performance: HashSet has parameters for initial capacity and load factor.
  • Synchronization: Not synchronized by default. If multiple threads access a HashSet concurrently and at least one thread modifies the set, external synchronization is required.
  • Fail-Fast Iterators: Fail-fast behavior is not guaranteed in the presence of unsynchronized concurrent modification. So, if the set is modified during iteration, except through the iterator’s own remove method, a ConcurrentModificationException will be thrown.

Performance considerations

  • 🚀 Constant-Time Operations: HashSet offers constant-time performance (0(1)) for basic operations such as add, remove, contains, and size. This means that these operations have a consistent and efficient performance regardless of the size of the set.
  • 🚀/🔻 Performance Impact of Hash Collisions: In cases of hash code collisions, where multiple elements generate the same hash code, the performance of HashSet can be affected, potentially leading to longer execution times.
  • 🚀/🔻Load Factor Impact: The performance of HashSet can be influenced by the load factor, which is the ratio of the number of elements stored to the total capacity. A higher load factor can lead to more collisions and slightly slower performance.
  • 🔻 Memory Consumption: HashSet consumes a considerable amount of memory as it needs to store elements in a hash table structure. The memory usage depends on the number of elements present, initial capacity, and load factor. Generally, HashSet tends to consume more memory compared to simple linear data structures like a list.

LinkedHashSet

I think from this point onwards, as you already understand most of the data structures, just by reading their names, it becomes somewhat more obvious, right?

LinkedHashSet is a combination of using the Set structure with a hash table and a doubly-linked list running through all of its entries.

Therefore, the main difference between HashSet and LinkedHashSet is how the elements are stored. While HashSet does not guarantee a specific order of elements, LinkedHashSet maintains the insertion order of the elements. This means that when iterating over the LinkedHashSet, the elements are returned in the same order in which they were added.

Characteristics

  • Predictable Iteration Order: It maintains a predictable iteration order based on the order of element insertion (insertion-order).
  • Consistent Insertion Order: The insertion order remains unchanged if an element is re-inserted into the set.
  • Supports All Set Operations: LinkedHashSet provides all the optional Set operations and allows null elements.
  • Parameters for Performance: LinkedHashSet has parameters for initial capacity and load factor, defined similarly to HashSet.
  • Not Synchronized: The implementation is not synchronized by default. External synchronization is needed for concurrent access to a linked hash set.
  • Fail-Fast Iterators: Iterators returned by LinkedHashSet’s iterator method are fail-fast, throwing ConcurrentModificationException if the set is modified (except through the iterator’s own remove method).

Performance considerations

  • 🚀 Constant-Time Performance for Basic Operations: LinkedHashSet offers constant-time performance (0(1)) for basic operations such as add, contains, and remove, assuming the hash function disperses elements properly among the buckets.
  • 🚀/🔻 Iteration Performance: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity.
  • 🔻 Memory Consumption: LinkedHashSet consumes slightly more memory than HashSet due to the need to store additional information to maintain the order of elements. This can be considered a negative point in terms of memory consumption.

--

--

Daniely Murua
wearejaya

Mobile engineer at Jaya Tech and gaming enthusiast. 📱🎮