Data Structures in Kotlin: Map — [PartIII]

Daniely Murua
wearejaya
Published in
12 min readJul 14, 2023

Hi everyone! Continuing our series of articles on data structures in Kotlin, in this article, we will talk about Maps! So without further ado, let’s get started.

What is a map?

A map (or dictionary) is a data structure where data is stored in a format consisting of a unique key and its corresponding value (key-Value pairs).

Imagine you have a storage box containing several smaller boxes, each containing a different type of fruit. Each smaller box is labeled with a unique number among the other boxes, making it easier to identify later. The next time you want to find the box of bananas, you can search for the label instead of opening each box until you find the one with bananas.

In the context of a map, the labels on the smaller boxes would be the keys, and the fruits inside the box would be the values. When you want to retrieve a box based on its label, you simply use the corresponding key, and the map will return the associated value.

Let’s see an example in code, where the key is the label number, and the value is what’s inside the box:

val storage = mapOf(
"234" to "Banana",
"382" to "Apple",
"453" to "Orange"
)

In this example, we have a Map called storage that associates each box’s content with its corresponding labeled box. We can access the values of the Map using the corresponding keys, such as storage["234"].

This allows us to quickly retrieve the value Banana by providing the key 234. Similarly, we can access the values Apple and Orange by using their respective keys 382 and 453.

Maps are a powerful tool for organizing and retrieving data based on unique keys. They provide efficient access to values and are commonly used in various programming scenarios.

Usage in Kotlin

In Kotlin, we have a hierarchy of interfaces for maps that is similar to the one we discussed for lists (in the previous article). However, unlike lists, the Map interface does not directly implement the Collection interface. Nevertheless, it is still considered a part of the Kotlin collections.

Let's take a look at the image below representing this hierarchy:

  • Map: The Map interface represents an immutable map. This means that once a map is created, it cannot be modified. To create such a structure in Kotlin, you can use the mapOf() function.
val fruitMap = mapOf(
"apple" to 5,
"banana" to 3,
"orange" to 7
)
  • MutableMap: The MutableMap interface extends the Map interface and represents a mutable map, which allows modifications to the map's structure and values. To create such a structure in Kotlin, you can use the mutableMapOf() function.
val fruitMap = mutableMapOf(
"apple" to 5,
"banana" to 3,
"orange" to 7
)
  • HashMap: The HashMap class is a mutable implementation of the MutableMap interface. It provides a hash table-based map that allows you to add, remove, and modify key-value pairs. It provides efficient lookup and modification operations based on hash codes.
val hashMap = hashMapOf(
"apple" to 5,
"banana" to 3,
"orange" to 7
)
  • LinkedHashMap: The LinkedHashMap class is a mutable implementation of the MutableMap interface. It preserves the insertion order of key-value pairs, making it suitable for scenarios where maintaining order is important.
val linkedHashMap = linkedMapOf(
"apple" to 5,
"banana" to 3,
"orange" to 7
)

You can choose between an immutable or mutable map, as well as different map implementations like HashMap, LinkedHashMap or TreeMap based on the requirements of your code.

For now, let’s talk about HashMap and LinkedHashMap in Kotlin.

HashMap

As mentioned earlier, the HashMap class in Kotlin is a mutable implementation of the MutableMap interface. It provides an efficient and flexible way to store and retrieve key-value pairs. The HashMap internally uses a hash table, also known as a hash map, to store the elements.

In a HashMap, each key-value pair is associated with a unique hash code that is calculated using hashing.

Hashing is a process that involves applying a hash function to an input value to generate a unique identifier, known as a hash code. This hash code is then used as an index in the hash table, allowing for quick access to the corresponding value in the hash table. That means if the same input is provided, it will always produce the same hash code.

In a hash table, key-value pairs are stored in buckets based on the value returned by the hash function applied to the key. This allows for efficient retrieval, as the hash function typically produces a unique value for each key. When a value needs to be retrieved based on a key, the hash function is applied to the key again, finding the corresponding bucket and enabling direct access to the stored value.

In the example below, we have a hash map called tenants that stores information about tenants. The keys are of type Int and represent the apartment numbers, while the values are of type String and represent the names of the tenants.

val tenants = hashMapOf<Int, String>()
tenants[20] = "Rachel and Monica"
tenants[14] = "Phoebe"
tenants[201] = "Ross"
tenants[19] = "Chandler and Joey"

Now let’s take a closer look at the hash table representation for this example.

In the case of the line tenants[20] = "Rachel and Monica", the key 20 is passed through the hash function, which generates a hash code 0 . This hash code is used to determine the index in the hash table where the key-value pair will be stored. In this case, the value "Rachel and Monica" will be associated with the key 20 in the hash table where the hash code is 0 .

However, it’s possible for two different elements to have the same hash value, which is known as a collision. The hash function is designed to evenly distribute the elements throughout the table, reducing the chances of collisions. However, collisions can still occur in some cases, and we need to handle them appropriately.

To handle collisions, different techniques are employed. One common approach is called chaining, where each position in the hash table contains a linked list of elements that have the same hash value. This way, if two elements produce the same hash code, they can be stored in the same position, avoiding conflicts. This linked list structure is often referred to as a bucket. Each bucket contains all the elements that share the same hash value.

Another technique is known as linear probing. In this method, if a collision occurs, the algorithm searches for the next available position in the table by sequentially moving forward until an empty slot is found. This ensures that all elements are eventually stored, even if collisions occur. Each position in the table, including the initial position and subsequent positions during probing, can be seen as a bucket that can store elements.

Characteristics

  • Unique Keys: HashMaps ensure that each key is unique within the map.
  • Duplicated Values: HashMaps can contain duplicate values, but not duplicate keys.
  • Hash Function: The hash function should distribute keys uniformly across the table to minimize collisions. Be cautious when overriding the default hash function and choose one that distributes elements effectively.
  • Collision Resolution: Handle collisions with techniques like chaining or linear probing.
  • Null Values: HashMaps can handle null values for both keys and values.
  • Flexible Size: HashMaps can grow or shrink dynamically based on the number of entries.
  • Transformation and Manipulation: HashMaps offer operations to add, update, or remove entries, merge maps, and filter entries based on specific criteria.
  • Synchronization: The HashMap implementation is not synchronized, so if multiple threads concurrently access and modify the map, external synchronization is required. Synchronization can be achieved by wrapping the map using the Collections.synchronizedMap method.
  • Fail-fast iterators: The iterators returned by HashMap’s “collection view methods” are designed to quickly detect concurrent modifications during iteration and throw a ConcurrentModificationException. However, their behavior cannot be guaranteed in the presence of unsynchronized concurrent modifications. It is important to use proper synchronization practices when accessing and modifying a HashMap in a multithreaded environment.

Performance considerations

HashMap has two parameters that affect its performance: initial capacity and load factor.

When creating a HashMap, it is important to set an initial capacity for the buckets that the HashMap should have. This initial capacity affects the performance of the HashMap. A higher capacity allows the HashMap to have more buckets, which helps distribute the elements and reduces collisions. However, the extra capacity, in addition to the elements, can result in frequent resizing or rehashing operations. These operations are necessary to maintain the performance of the HashMap, but if they occur too frequently, they can impact the overall performance of the HashMap.

Resizing refers to changing the capacity of a hash table, which involves allocating a new underlying array and moving the elements from the old array to the new one. This can be can be an expensive operation in terms of time and memory.

Rehashing, on the other hand, is the process of recalculating the hash codes and redistributing the elements within the hash table after resizing. It ensures that the elements are placed in new positions corresponding to the updated capacity and hash function.

Therefore, it is important to find a balance in choosing the initial capacity to avoid excessive resizing or rehashing operations.

The load factor of a hash table represents the ratio of occupied buckets to the total number of buckets in the table. It indicates how full the hash table is. Maintaining an appropriate load factor is crucial for efficient performance. If the load factor becomes too high, it means the table is becoming crowded, which can result in increased collisions and slower performance.

Let’s see an example of creating a HashMap with a capacity of 16 and a load factor of 0.75.

When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed, which means internal data structures are rebuilt to approximately double the number of buckets. Minimizing rehash operations can be achieved by setting an appropriate initial capacity.

In such cases, resizing the hash table by increasing its capacity can help maintain a balanced load factor and ensure efficient operation.

So, it’s important to choose the right initial capacity and load factor to get the best performance out of your HashMap. Let’s explore some additional performance characteristics:

  • 🚀 Constant-time performance: HashMap provides constant-time access (O(1)) for basic operations like get and put.
  • 🚀 Efficient Key Lookup: Maps provide fast retrieval of values based on their associated keys.
  • 🚀 Widely Used: Hash tables are used in databases, caches, compilers, and indexing systems due to their fast and efficient data retrieval.
  • 🚀/🔻 Iteration performance: Iterating over collection views of a HashMap requires time proportional to the capacity of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Setting the initial capacity too high or the load factor too low can impact iteration performance.
  • 🔻 Collisions: In cases of frequent collisions, the performance can degrade, increasing access time to O(n).
  • 🔻 Memory Usage: Hash tables provide fast access but may consume more memory compared to other data structures.

Let's see an example in Kotlin:

// Creating a HashMap with fruits and their quantities using the hashMapOf function
val fruitMap = hashMapOf(
"apple" to 5,
"banana" to 10,
"orange" to 3
)

// Accessing elements in the HashMap
println(fruitMap["apple"]) // Output: 5
println(fruitMap["banana"]) // Output: 10
println(fruitMap["orange"]) // Output: 3

// Modifying the quantity of a fruit
fruitMap["banana"] = 15
println(fruitMap["banana"]) // Output: 15

// Iterating over the elements of the HashMap
for ((fruit, quantity) in fruitMap) {
println("$fruit: $quantity")
}

//Output:
//banana: 15
//orange: 3
//apple: 5

LinkedHashMap

The LinkedHashMap class is a component of the Java Collections Framework that combines the functionalities of a hash table and a linked list. It is similar to HashMap in terms of storing key-value pairs, but it also maintains the order of insertion and maintains a doubly-linked list running through all of its entries.

Characteristics

  • Combination of Hash Table and Linked List: LinkedHashMap internally uses a hash table to store the elements and maintains a doubly-linked list running through all of its entries.
  • Support for Null Elements: LinkedHashMap permits null elements as keys or values.
  • Mutable Operations: LinkedHashMap supports all the mutable operations provided by the MutableMap interface. You can add, remove, or modify key-value pairs in the map.
  • Predictable Iteration Order: The linked list defines the iteration ordering of the map, ensuring that the order of keys remains consistent over time.
  • Access-Ordered and Insertion-Ordered: LinkedHashMap offers the option to maintain the order of entries based on their access time (access-order) or insertion time (insertion-order). The default value for the accessOrder parameter is false. Let's see an example:
  • Suitable for LRU Caches: LinkedHashMap with access-order is particularly useful for building LRU (Least Recently Used) caches.
  • Not Synchronized: Like other map implementations, LinkedHashMap is not synchronized. External synchronization is required when multiple threads access the map concurrently and at least one thread modifies it structurally.
  • Fail-Fast Iterators: The iterators returned by LinkedHashMap’s collection view methods are fail-fast, throwing a ConcurrentModificationException if the map is structurally modified during iteration, except through the iterator’s own remove method.

Performance considerations

Both HashMap and LinkedHashMap have initial capacity and load factor as parameters that affect performance. In the case of HashMap, if the initial capacity is too high, there may be unnecessary memory allocation and waste.

However, in the case of LinkedHashMap, although the initial capacity can impact performance in terms of memory consumption, the factor that truly affects iteration is the structure of the linked list and the order of insertion of elements. In other words, the initial capacity does not have a direct effect on the iteration times of LinkedHashMap.

LinkedHashMap has similar performance to HashMap for basic operations like adding, accessing, and removing elements. Let’s take a look at some aspects:

  • 🚀 Efficient Performance: LinkedHashMap provides constant-time performance for basic operations like get, put, and remove, similar to HashMap.
  • 🚀/🔻Iteration performance: Iteration over the collection views of a LinkedHashMap requires time proportional to the size of the map.
  • 🔻 HashMap comparison: The performance of LinkedHashMap is slightly lower than that of HashMap due to the additional overhead of maintaining the linked list.

Here’s an example of using LinkedHashMap in Kotlin:

// Creating a LinkedHashMap with fruits and their quantities using the linkedMapOf function
val fruitMap = linkedMapOf(
"apple" to 5,
"banana" to 10,
"orange" to 3
)

// Accessing elements in the LinkedHashMap
println(fruitMap["apple"]) // Output: 5
println(fruitMap["banana"]) // Output: 10
println(fruitMap["orange"]) // Output: 3

// Modifying the quantity of a fruit
fruitMap["banana"] = 15
println(fruitMap["banana"]) // Output: 15

// Iterating over the elements of the LinkedHashMap
for ((fruit, quantity) in fruitMap) {
println("$fruit: $quantity")
}

// Output:
// apple: 5
// banana: 15
// orange: 3

And that’s it! I hope this article has been helpful to you. A great tip that has helped me a lot in understanding these data structures is to analyze the code behind each of these classes. Try to dive into the class, look at the constructors, examine the functions, and see the inheritance hierarchy. All of this will make everything much clearer.

In the next article, we will discuss the data structure Set. See you there! 🙋🏼‍♀️

References

Skiena, Steven S. The Algorithm Design Manual. 2nd ed., Springer, 2008.

--

--

Daniely Murua
wearejaya

Mobile engineer at Jaya Tech and gaming enthusiast. 📱🎮