What Is Hashable in Swift?

A deep dive into the Hashable protocol and its broader background

Yong Cui, Ph.D.
Jan 2 · 13 min read
Image by Carlos Alberto Teixeira from Pixabay

When we learn the basic data types in Swift, the official documentation mentions the Hashable protocol several times.

For example, a key in the Dictionary data type needs to be Hashable. For another example, the elements in the Set data type need to be Hashable, too. As shown below, the declarations for Dictionary and Set clearly specify such requirements:

@frozen struct Dictionary<Key, Value> where Key : Hashable@frozen struct Set<Element> where Element : Hashable

The documentation lists several Hashable data types, including booleans, integers, and strings, so that they can be used as the keys for a Dictionary and as the elements for a Set.

However, it’s unclear to us what makes these data types Hashable, and how we can make our own data types Hashable.

If we step back, we may also wonder why we bother making certain data types Hashable in the first place. Do they have particular advantages? We can ask many more questions on this topic.

In this article, I’m trying to provide an in-depth overview of the Hashable protocol in Swift and its common use cases. Specifically, we’re going to address the following questions/topics.


Hashing: What and Why

When we talk about hash, many terms have been frequently used in various settings, including hash, hashing, hasher, hash code, hash value, hash table, and hash function.

Even if we are not familiar with all of these terms, we do use them a lot in various programming languages as well as our everyday life (e.g., password encryption), at least indirectly.

Basic concepts

Generally speaking, hashing is the process of applying an algorithm to convert a data item to a value. The data item can be as simple as an integer, a string, or as complex as an object with multiple properties.

The algorithm is termed the hash function or hasher. The converted value is termed the hash value, hash code, or simply hash.

Diagram of the Hashing Process

The above diagram shows you a simplified example of a hashing process. Essentially, we start with four kinds of fruit (i.e., data items) on the left side.

The algorithm (i.e., hash function) can convert these four names as input to four integers (i.e., hash values) as output on the right side, which look like random numbers with no obvious connection with their original inputs.

There are a few aspects to be noted.

Use cases

Hashing has been widely used in various aspects of our daily life.

In this section, I will use the following examples to show you three common use cases of hashing in database operations, cryptography, and data structures in programming.

Database operation: searching

Almost all websites and mobile apps have a search function somewhere in their applications. The implementation of the search function involves the use of hashing.

Let’s see the example below. Suppose that you have a list of local small businesses in a database table. One day, you want to have your car serviced, and you have the impression that one of your friends recommended a store called John’s Mechanic.

When you search in this table using the store’s name, the database needs to compare this 15-character string with the business name field for each of the records.

Imagine that the number of records can be hundreds of thousands such that this process can be very lengthy and lack efficiency. In computing jargon, the time complexity of this search function is O(n), which means that the time needed is linearly proportional to the size of the data size.

Somehow, the database owner learned the trick of hashing, and used a hash function to create unique hash values for each of the businesses and used them as an index for all of these businesses.

Now, when you make the search request using the store name John’s Mechanic, the database will first use the hash function to calculate the hash value for this search term.

As mentioned previously, the hash function is expected to produce the same hash value for the same store, such that we’ll just look up the index and find out where the data record is located.

On average, the time complexity of the search function with the use of hashing is O(1) — a constant amount of time independent of the data size.

Cryptography: password

To provide a personalized experience, it makes sense that many websites and apps require the users to create their own accounts.

During the sign-up process, we usually provide our email address and a password for the new account on the website. These two pieces of text will be sent over the internet before they arrive at the destination server.

Somehow, the website developers didn’t know anything about hashing, and they thought cybersecurity wasn’t a concern to them.

So, the sign-up request was designed to be sent via plain text. It has a significant data security risk. If the request is intercepted, the password is stolen very easily as it is just plain text!

When it’s used, the password will be hashed to a seemingly random text. If the sign-up request is intercepted, only this hashed password will be exposed.

As mentioned before, we usually design hash functions that make it almost impossible to invert hash values to their original data items. Thus, it is a safer approach. The website’s database can just store this hashed password.

When we’re trying to log in, the website will just compare the supplied password (again, hashed) with the saved hashed password, because they are expected to be the same as the input (i.e., the correct password in plain text) is identical.

Data structures in programming: Dictionary

Almost all modern programming languages have the Dictionary data type, although they may use a different name like associative array, map, hashmap, hash, or object.

However, one thing in common is that it is an unordered collection of key-value pairs. In plain language, the Dictionary stores data in pairs and each pair has a key and a value.

Several functions pertain to the use of the Dictionary data type, including data retrieval, insertion, and deletion. For the sake of the current tutorial in Swift, I’ll just use the Swift language to show you these functions.

Other languages involve differential syntax, but the concept should be very similar.

Behind the scenes, a Dictionary is a type of hash table, providing fast access to the entries it contains. For each key, the hash function will calculate a hash value as an index to store that data item, and some people refer to it as a bucket (i.e., place to store the data).

So, when we retrieve a data item, the program will calculate the hash value for the key provided, and look up the index if a key with the same hash value exists.

If it does, the corresponding data item is returned. If it doesn’t, nil will be returned. This is why the data type returned from a dictionary is Optional, as the availability isn’t guaranteed.

When we insert a new key-value pair, the hash function will just compute the hash value for the key as the index for that data item, while we remove a key-value pair, we’ll simply remove the index that is computed from the hash value.


Hashable Protocol

After having a general understanding of hashing, we can now narrow down our scope by focusing on the Hashable protocol in the Swift standard library.

When we refer to the Hashable protocol documentation page, we find out that the definition of Hashable is that:

“A type that can be hashed into a Hasher to produce an integer hash value.”

The definition is very clear, and we can identify three keywords in this definition: type, Hasher, and integer. I’ll explain each of them in detail below.

Type

Take a set as an example. As mentioned before, all elements in a set have to be hashable. On the surface, it merely means that each individual element is hashable.

However, it actually means that the concrete type of this generic Set data type needs to conform to the Hashable protocol. In other words, just like other protocols, the Hashable protocol is implemented at the type (e.g., class, struct) level, so that all its instances are hashable.

In addition, the type doesn’t only include the standard data types, like integers, strings, but it also includes custom types, for which I’ll show you an example later in this article on the implementation of Hashable protocol.

Hasher

Hasher is defined as the universal hash function used by Set and Dictionary as per the official documentation. Behind the scenes, Hasher is actually implemented as a struct, so we can conveniently instantiate a hasher with empty state by calling var hasher = Hasher().

The standard universal hash function uses a per-execution random 128-bit seed value, making hash values less predictable.

More specifically, the universal hash function used in the Swift standard library is SipHash, which was originally designed by Jean-Philippe Aumasson and Daniel J. Bernstein in 2012.

As of now, the SipHash-1-3 with 320 bits of state is implemented in the current library. It should be noted that Apple does warn that the universal hash function and its implementation may change in any new release.

With this Hasher instance, we can feed data (e.g., integers, strings) to it with the mutating combine(bytes:) or combine(_:) method multiple times as desired.

For the combine(bytes:) method, we feed the Hasher with new bytes of data as they are available in a single contiguous region in memory, mixing its bits with the Hasher state.

The combine(_:) method is just a convenience operation that takes in a Hashable value, such as an integer or string. Behind the scenes, this combine(_:) method calls the hash(into:) method to mix its value into the Hasher state.

After you’re done with these combine methods to update the Hasher state, we’ll just call the finalize method, which will finalize the Hasher state and return the hash value.

Notably, calling this method will consume the Hasher, and thus it’s advised that you not finalize a Hasher you don’t own or perform operations on a finalized Hasher.

Another important aspect of the finalize method is that the finalization process is computationally expensive. In SipHash-1-3, it costs three times as much as a single 64-bit combine operation.

Thus, we need to keep in mind that we want to reduce the use of finalize if it’s at all possible when we implement the Hashable protocol in our custom types to make our code more efficient.

Integer

This can’t be more straightforward. The above-mentioned Hasher will calculate the hash value for a given instance, and more specifically, the hash value is an integer.

But, it should be noted that the hash values don’t always have to be positive due to integer overflow, which can produce negative hash values from time to time.

Because of different seed values by the hash function, the generated hash values will be different in each execution of a Swift program so that you don’t want to save hash values for future reference, which will result in unpredictable behaviors in your programs.

Another thing to mention is that the Hashable protocol inherits from the Equatable protocol, so that if a data type conforms to Hashable, it has to conform to Equatable as well.

What will happen if two instances have the same hashValue? This phenomenon is called collision. There are a few ways to resolve the collision.

One common approach is called chaining. Instead of storing a single item at each bucket, the hash table can chain multiple items together so that each hash key has a reference to a linked list.

The other common approach is called linear probing or open addressing, and this one is also implemented internally by Swift’s Dictionary type for collision handling.

Basically, this method solves collision by using the next available slot (or bucket). If it reaches the end, it will go back to the beginning until it finds a bucket for the item.


Conformation to the Hashable Protocol

In our development, we avoidably need to create some custom data types from time to time. Sometimes, we take a step further and want to use these types as dictionary keys or elements in a set.

As mentioned at the beginning of this article, to achieve that, your custom data types should conform to the Hashable protocol. In this section, I’m showing you how it can be done.

Simple data structure

First, we create a custom struct called Student with two String properties, firstName and lastName. Suppose that we want to use a Dictionary to track the final exam scores for the student.

So, we instantiate a Dictionary var scores = [Student: Int](). However, Xcode gives us an error: Type ‘Student’ does not conform to protocol ‘Hashable’.

Please note that while I’m writing this article, I’m using Xcode 11.3 and Swift 5.1.

struct Student {
var firstName: String
var lastName: String
}

To remove this error, we need to implement the hash(into:) method required by the Hashable protocol as below. The universal hash function hasher uses the combine method to hash the string values for the student’s first and last names.

extension Student : Hashable {
func hash(into hasher: inout Hasher) {
hasher.combine(firstName)
hasher.combine(lastName)
}
}

It’s important to note that we don’t call the finalize method in the hash(into:) method, because the compiler will automatically complete the calculation by calling the finalize method using the code below when the hash value is needed.

var hashValue: Int {
var hasher = Hasher()
self.hash(into: &hasher)
return hasher.finalize()
}

After making our custom type Student conform to the Hashable protocol, the above-mentioned error is gone, as expected. From now on, a Student instance can be a Dictionary key or a Set element.

Compound data structure

At school, the teacher decides to pair the students up so that they can work on their group project. To help the teacher manage group assignment and tracking, we can create a new struct called Dyad as below.

struct Dyad {
var leader: Student
var teammate: Student
}

Similarly, we can create a dictionary var dyadScores = [Dyad: Int]() to help the teachers track the scores for the dyads.

Again, Xcode will tell us that the Dyad type doesn’t conform to the Hashable protocol. Some developers who have prior experience with implementing the Hashable protocol may do this:

extension Dyad: Hashable {
func hash(into hasher: inout Hasher) {
hasher.combine(leader.hashValue)
hasher.combine(teammate.hashValue)
}
}

It’s certainly not wrong, but as mentioned previously, the finalization process is computationally expensive. So, we can estimate how much time we need to calculate the hashValue for a Dyad instance.

Assume that the time needed to run a combine method is a unit time of t, so the time needed for running a finalize method is 3t. To calculate the hashValue for a Student instance, it requires 5t (i.e., two calls of combine and one call of finalize).

So, with the above code, the time needed to calculate the Dyad’s hashValue will be 15t (5t for the leader, 5t for the teammate, and 5t for the Dyad itself) in total. Can we make if more efficient?

The answer is yes. The following code minimizes the use of the finalize method, which drastically reduce the time for calculating the Dyad’s hashValue.

With this improved version, the time needed is only 7t, which is even less than half of the computation time needed when using the original version.

So, one of the take-home messages is that when you make your custom type conform to the Hashable protocol, write the hash(into:) function with minimal usage of calling the finalize method.

extension Dyad: Hashable {
func hash(into hasher: inout Hasher) {
hasher.combine(leader.firstName)
hasher.combine(leader.lastName)
hasher.combine(teammate.firstName)
hasher.combine(teammate.lastName)
}
}

Automatic synthesis

However, in fact, since Swift 4.1, automatic synthesis of Hashable conformance has been supported by the compiler for certain types, including struct.

Thus, if we simply declare the Student to conform to the Hashable protocol without using an extension. As both of its store properties (i.e., firstName and lastName) are Hashable, the struct itself becomes automatically Hashable.

You don’t need to write your own hash(into:) or hashValue implementations. Especially for the latter, Swift 5 has deprecated the hashValue in favor of the hash(into:) method, although it can still compile your earlier code that implements the hashValue in your custom types to conform to the Hashable protocol.

struct Student : Hashable {
var firstName: String
var lastName: String
}

In addition, the automatic synthesis is pretty powerful. As per the code below, as long as its stored properties are Hashable, the compiler can make your custom type automatically Hashable, even if some properties are of custom types.

struct Dyad : Hashable {
var leader: Student
var teammate: Student
}

Takeaways

After reading this article, I hope that you have a better understanding of the Hashable protocol in Swift and its broader background.

With the implementation of a universal hash function and automatic synthesis feature in the current release of Swift, it has become so convenient to implement the Hashable protocol.

Better Programming

Advice for programmers.

Yong Cui, Ph.D.

Written by

Addiction Scientist, iOS/Android Developer, Entrepreneur

Better Programming

Advice for programmers.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade