3 Essential Questions About Hashable in Python

From general discussion to specific implementation

Yong Cui
May 19 · 6 min read
Photo by Yeshi Kangrang on Unsplash

As a general-purpose programming language, Python provides a good range of built-in data types for various use cases.

While learning these basics, you have probably come across the term hashable at some point. For example, you may have read that the keys in a dict need to be hashable.
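A trivial illustration of the dict-key requirement might look like the sketch below. The dictionary name and its contents are made up for this example, and the exact wording of the error message can vary between Python versions.

```python
# A str is hashable, so it works as a dictionary key
scores = {"alice": 90}

# A list is unhashable, so using it as a key raises TypeError
try:
    scores[["bob", "smith"]] = 85
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The message mentions that the list type is unhashable
print(error_message)
```

The failed assignment leaves the dictionary unchanged.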

As another example, you may have read that the elements in a set need to be hashable.

You may wonder: What does hashable mean exactly? Which objects are hashable and which are not? What will happen if we use unhashable objects as keys for dictionaries? Many related questions can be asked.

In this article, we’ll go over some key points about hashability so that you’ll learn how to address these questions. In the end, you’ll probably find that these questions aren’t hard at all, unlike what you may have thought initially.

Which Objects Are Hashable and Which Are Not?

Before we begin any mechanistic explanation, the first question that we want to address is which objects are hashable and which are not.

Because Python explicitly requires the elements in a set to be hashable, we can test an object’s hashability by simply trying to add the object to a set. A successful insertion indicates that the object is hashable, and a failed one indicates that it isn’t.

>>> # Create an empty set object
>>> elements = set()
>>>
>>> # The objects to insert into the set, one at a time
>>> items = [1, 0.1, 'ab', (2, 3), {'a': 1}, [1, 2], {2, 4}, None]

As shown in the above code, I created a set variable called elements and a list variable called items, which includes the most commonly used built-in data types: int, float, str, tuple, dict, list, set, and NoneType.

The experiment that I’ll run is to add each of the items to elements. I won’t use a plain for loop in this case, because the first unhandled TypeError would stop the iteration. Instead, I’ll just retrieve individual items using indexing.
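As an alternative to indexing one item at a time, the same experiment can be sketched with a loop that catches the TypeError so that the iteration continues. This is not the indexing approach described above, and the names hashable_types and unhashable_types are introduced only for this sketch.

```python
elements = set()
items = [1, 0.1, 'ab', (2, 3), {'a': 1}, [1, 2], {2, 4}, None]

hashable_types = []
unhashable_types = []

for item in items:
    try:
        # set.add() raises TypeError for unhashable objects
        elements.add(item)
    except TypeError:
        unhashable_types.append(type(item).__name__)
    else:
        hashable_types.append(type(item).__name__)

print("Hashable:", hashable_types)
# Hashable: ['int', 'float', 'str', 'tuple', 'NoneType']
print("Unhashable:", unhashable_types)
# Unhashable: ['dict', 'list', 'set']
```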

Here’s a quick summary of the experiment’s results.

Answer to the section’s question

  • Hashable data types: int, float, str, tuple, and NoneType.
  • Unhashable data types: dict, list, and set.

Even if you’re completely new to Python programming, you may have noticed that the three unhashable data types are all mutable, while the five hashable data types are all immutable.

In essence, mutable objects are those whose values can be changed after their creation, while the values of immutable objects can’t be changed once they’re created.

Data mutability is a standalone topic that I have covered previously in my other article.
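One caveat to the "immutable means hashable" rule of thumb is worth a quick sketch: a tuple is hashable only if all of its elements are hashable. The helper is_hashable below is a hypothetical convenience function written for this illustration, not a built-in.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((2, 3)))             # True: every element is hashable
print(is_hashable((2, [3, 4])))        # False: the tuple contains a list
print(is_hashable(frozenset({2, 4})))  # True: frozenset is the immutable set type
```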

What Does Hashable Mean?

You now have some ideas about which objects are hashable and which are not, but what does hashable mean, exactly?

You may have heard of several computing terms related to hashable, such as hash value, hashing, hash table, and hashmap. At their core, they share the same fundamental procedure: hashing.

General process of hashing (Wikipedia, Public Domain)

The above diagram shows you the general process of hashing. We start with some raw data values (termed keys in the figure).

A hash function, which is sometimes termed a hasher, will carry out specific computations and output the hash values (termed hashes in the figure) for the raw data values.

Hashing and its related concepts would take a whole book to cover properly, which is beyond the scope of the current article. However, some important aspects have been discussed briefly in my previous article.

Here, I’ll just highlight some key points that are relevant to the present discussion.

  1. The hash function should spread its outputs widely, so that different objects have different hash values as often as possible. When two different objects do end up with the same hash value, a collision occurs (as shown in the figure above) and must be handled.
  2. The hash function should be consistent, so that the same object always produces the same hash value. In Python, this holds for the lifetime of the object within a single interpreter session; str hashes, for instance, can differ from one run to the next.

Python has a built-in hashing mechanism that produces hash values for its objects. Specifically, we can retrieve an object’s hash value by using the built-in hash() function. The following code shows you some examples.
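A minimal sketch of such examples follows. The helper try_hash is hypothetical and exists only to keep the script running when hash() raises a TypeError; the actual hash values depend on the interpreter, so none are listed here.

```python
def try_hash(obj):
    """Return hash(obj), or None when the object is unhashable."""
    try:
        return hash(obj)
    except TypeError:
        return None

int_hash = try_hash(42)
tuple_hash = try_hash((2, 3))
list_hash = try_hash([1, 2])
dict_hash = try_hash({'a': 1})

# The int and the tuple produce integers
print(type(int_hash).__name__, type(tuple_hash).__name__)

# The list and the dict have no hash value
print(list_hash, dict_hash)
```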

As shown above, we were able to get the hash values, which are integers, for the int and tuple objects.

However, neither the list object nor the dict object had a hash value. These results are consistent with the distinction between hashable and unhashable objects that we made in the previous section.

Answer to the section’s question

  • Hashable: An object is hashable if it has a hash value that doesn’t change during its lifetime and it can be compared with other objects for equality. Hashable objects can serve as keys in a dictionary or as elements in a set.

How Can We Customize Hashability?

Much of Python’s flexibility as a general-purpose programming language comes from its support for custom classes. With your own classes, related data and operations can be grouped in a much more meaningful and readable way.

Importantly, instances of custom classes are hashable by default: unless we define our own equality logic, Python derives each instance’s hash value from its identity.

Consider the following example. We created a custom class, Person, which would allow us to create instances by specifying a person’s name and social security number.
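A minimal sketch of such a class might look like this. The attribute names name and ssn and the sample values are assumptions made for illustration.

```python
class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn

    def __repr__(self):
        # A readable representation built with an f-string
        return f"Person({self.name!r}, {self.ssn!r})"


person0 = Person("John Smith", "012345678")

# Instances of custom classes are hashable by default
person0_hash = hash(person0)
print(isinstance(person0_hash, int))  # True

# So they can be stored in a set
persons = {person0}
print(persons)  # {Person('John Smith', '012345678')}
```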

Notably, we overrode the default __repr__() method using an f-string, which allows us to display the object with more readable information, as shown in the last line of the code snippet.

As shown in the above code, we can find out the hash value for the created object person0 by using the built-in hash() function. Importantly, we’re able to include the person0 object as an element in a set object, which is good.

However, what will happen if we want to add more Person instances to the set? A more complicated, but probable scenario is that we construct multiple Person objects of the same person and try to add them to the set object.

See the following code. I created another Person instance, person1, which has the same name and social security number — essentially the same natural person.
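The scenario can be sketched as follows, again assuming a Person class with name and ssn attributes and made-up sample values.

```python
class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn

    def __repr__(self):
        return f"Person({self.name!r}, {self.ssn!r})"


person0 = Person("John Smith", "012345678")
person1 = Person("John Smith", "012345678")  # same data, separate object

persons = {person0}
persons.add(person1)

print(len(persons))        # 2: both objects are kept
print(person0 == person1)  # False: the default comparison is by identity
```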

However, when we added this person to the set object, persons, both Person objects ended up in the set, which is not what we want, because by design the set should store unique natural persons. Consistent with both persons being included in the set object, we found that these two Person instances indeed compare as different.

Next, I’ll show you how we can make the custom class Person smarter, so that it knows which persons are the same and which are different.
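One way to sketch the updated class is shown below. The calls list is a hypothetical log added only to make the method invocations visible, and hashing the (name, ssn) tuple is one reasonable choice among several rather than the only option.

```python
calls = []  # hypothetical log of which special methods get invoked


class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn

    def __repr__(self):
        return f"Person({self.name!r}, {self.ssn!r})"

    def __hash__(self):
        calls.append("__hash__")
        # Hash the same fields that __eq__ compares
        return hash((self.name, self.ssn))

    def __eq__(self, other):
        calls.append("__eq__")
        if not isinstance(other, Person):
            return NotImplemented
        return (self.name, self.ssn) == (other.name, other.ssn)


person0 = Person("John Smith", "012345678")
person1 = Person("John Smith", "012345678")

persons = {person0, person1}
print(len(persons))  # 1: the two equal persons collapse into one element
print(calls)         # shows that both __hash__ and __eq__ were invoked
```

Keeping __hash__ and __eq__ based on the same fields is what guarantees that equal objects share a hash value.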

In the above code, we updated the custom class Person by overriding the __hash__() and __eq__() methods.

The __hash__() method is what the built-in hash() function calls to calculate an object’s hash value. The __eq__() method compares the object with another object for equality. Python also requires that objects that compare equal have the same hash value.

By default, instances of custom classes are compared by identity, the value returned by the built-in id() function, so two separately created instances never compare equal (learn more about the id() function by referring to this article).

With the updated implementation, when we create a set object from the two Person objects, the __hash__() method is called for each of them, and the set keeps only one of the two equal objects.

Another thing to note is that matching hash values alone don’t make two elements duplicates. When two objects have the same hash value, Python also calls the __eq__() method, and it treats them as the same element only if they compare equal.
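To see the role of __eq__() in isolation, consider a sketch in which every object deliberately shares one hash value. The Token class is hypothetical, and its constant hash is intentionally poor; it is used here only to force collisions.

```python
class Token:
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        # Deliberately poor: every Token collides with every other Token
        return 1

    def __eq__(self, other):
        if not isinstance(other, Token):
            return NotImplemented
        return self.value == other.value


tokens = {Token("a"), Token("b"), Token("a")}

# Same hash but unequal values: "a" and "b" are both kept.
# Same hash and equal values: the second "a" is dropped.
print(len(tokens))  # 2
```

A constant hash still yields correct results, but it makes every lookup degrade into a chain of equality checks, which is why real hash methods should vary with the object’s data.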

Answer to the section’s question

  • Customization: To customize how hashability and equality behave, we implement the __hash__() and __eq__() methods in our custom classes, making sure that objects that compare equal also have the same hash value.
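One related pitfall is that the two methods need to be defined together. In Python 3, a class that defines __eq__() without __hash__() has its __hash__ set to None, which makes its instances unhashable. The class name PersonEqOnly below is hypothetical and used only for this sketch.

```python
class PersonEqOnly:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn

    def __eq__(self, other):
        if not isinstance(other, PersonEqOnly):
            return NotImplemented
        return (self.name, self.ssn) == (other.name, other.ssn)


# Defining __eq__ alone disables the default hash
print(PersonEqOnly.__hash__)  # None

try:
    persons = {PersonEqOnly("John Smith", "012345678")}
    added_to_set = True
except TypeError:
    added_to_set = False

print(added_to_set)  # False
```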

Conclusion

In this article, we reviewed the concepts of hashable/hashability in Python.

Specifically, by addressing the three important questions, I hope that you have a better understanding of hashability in Python. When it’s applicable, you can implement tailored hashability behaviors for your own custom classes.

Better Programming

Advice for programmers.

Thanks to Zack Shapiro

Yong Cui

Written by

Yong Cui

Work at the nexus of biomedicine, data science & mobile dev. Love to write on these technological topics. Follow me @ycui01 on Twitter to get latest articles.
