Python: How Key lookup works in Dictionary and how to customize it?
Disclaimer: While this article touches upon a few concepts related to dictionary objects in Python, it is not a comprehensive guide. The aim of this article is to provide the readers a basic understanding of how 'keys' get looked up in Python dictionary objects. The example code snippets have been developed & tested in Python 3.6.0
This article(data used for name and emails in code snippets) is a work of fiction. Names, characters, businesses, places, events, locales, and incidents are either the products of the author’s imagination or used in a fictitious manner. Any resemblance to actual persons, living or dead, or actual events is purely coincidental.
(Better safe than sorry 😝)
Note: There is a list of references at the end of this article for more detailed explanation of each individual concepts.
What is a dictionary in Python?
A dictionary is a very useful & easy to use data structure that comes with Python. Even if one hasn’t come across dictionaries yet, this article will cover the essentials needed to understand the key(pun intended?) topic in discussion here.
A dictionary in Python is an unordered collection of items where each item is stored as a key:value pair. Basically a mapping object that maps individual values with unique keys
# Sample Dictionary
friends = {
"Steven": {
"email": "steven_adams@gmail.com",
"movies_watched": ["Batman: Arkham Knight", "Avengers: Infinity War"]
},
"Claus": {
"email": "claus_miller@yahoo.com",
"movies_watched": ["The Shawshank Redemption", "Harry Potter And the Goblet of Fire"]
},
"Bridget": {
"email": "bridget_edu@hotmail.com",
"movies_watched": ["Inside Out", "How to Train Your Dragon"]
},
"I": {
"email": "adam_kane@adk.com",
}
}
# Fetching data from Dictionary# Fetching by key(like an index)
claus_watched = friends[’Claus’][’movies_watched’]
print(f’Claus Watched: {claus_watched}’)# .get allows us to define default values, in case key is missing
you_watched = friends[’I’].get(’movies_watched’, ["And Justice For All"])
print(f’You Watched: {you_watched}’)# Output
Claus Watched: ['The Shawshank Redemption', 'Harry Potter And the Goblet of Fire']
You Watched: ['And Justice For All']
Fetching values based on keys from a dictionary, like we did in the above example is known as key look up. The primary intent of this article is to provide a basic explanation of how Python dictionaries handle the lookup operation.
Now what if one has two friends with same name? Keys must be unique within a dictionary.
# Python supports different data types of keys
# In below example, the data-type used for key is 'tuple'.friends[("Claus", "claus_dual@gmail.com")] = {
"movies_watched": ["Star Wars", "Kung Fu Panda"]
}
print(friends)# Output - Cut out 'Steven' and 'Claus' to reduce space taken
{
'Bridget': {
'email': 'bridget_edu@hotmail.com',
'movies_watched': ['Inside Out', 'How to Train Your Dragon']
},
'I': {
'email': 'adam_kane@adk.com'
},
('Claus', 'claus_dual@gmail.com'): {
'movies_watched': ['Star Wars', 'Kung Fu Panda']
}
}
Cool, so it looks like Python not only allows different types of keys but also different types of keys within a same dictionary.
[I should mention that the examples here are just to showcase Python’s ability to handle different data types for keys. In real world applications, how to store data should be carefully planned out.]
But can we use any valid Python data type as a dictionary key?
# Let's try with list data-type as key
friends[["Steven", "steven_even@gmail.com"]] = {
"movies_watched": ["A Star Is Born"]
}Traceback (most recent call last): File "E:/Articles/Dictionary/code/medium.py", line 37, in <module>
friends[["Steven", "steven_even@gmail.com"]] = {"movies_watched": ["A Star Is Born"]}
TypeError: unhashable type: 'list'
Wait… What?!? Why?!? It looks like Python requires keys in a dictionary to be hashable (values can be any valid Python object and do not share this limitation)
Errr, hashable?
A very simplified definition of Hash: A fixed size integer which is computed using the data stored in the object. Hash values have the following properties:
- Same data will have same hash
- Any change in data can(ideally should) change the hash value as well
- Since data can be anything(actual content and size), it is possible for two different objects(containing different data) to have the same hash value. This is known as Hash Collision. A good hashing algorithm should try to minimize chances of collision.
- In Python, any object which has implementation for __hash__() is considered to be hashable.
What can be done?
In the last example, we had tried to use a list object as a Python key. Let’s create our own class(because in general we should never alter behaviors of in-built Python data types even if some hacks allow us to do that)
# Simple class
class Friend:
def __init__(self, name, email):
self.name = name
self.email = email
Now let’s try to use an instance of this class as a key.
first_user_1 = Friend(name="Steven", email="steven_adams@gmail.com")
sample_dictionary = {
first_user_1: {"movies_watched": ["Batman: Arkham Knight", "Avengers: Infinity War"]}
}
print(sample_dictionary[first_user_1]["movies_watched"])# Output
['Batman: Arkham Knight', 'Avengers: Infinity War']
Currently, we haven’t defined anything special(__hash__()) in our class which can inform Python whether it is hashable or not. All classes inherit from the base Python object class and that’s where the information is coming from. Let’s verify it
# In Python, all class by default inherits from 'object', which is the base built-in class. Can be verified by:
print(Friend.__bases__)
# dir - method tries to return a list of all attributes of the object
print(dir(Friend))# Output
(<class 'object'>,) -- Output of print(Friend.__bases__)['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__'] -- Output of print(dir(Friend))
In the list of attributes, __hash__ is present. Python checks for this attribute in a method and if it is not set, the object is considered to be unhashable.
Let’s try another lookup
Note: For future operations, do not modify the content of the dictionary and update code ‘in-place’ The snippets do not contain the dictionary definition and other statements to reduce size of this article
# Let's have another instance with same values for name and emailfirst_user_2 = Friend(name="Steven", email="steven_adams@gmail.com")
print(sample_dictionary[first_user_2]["movies_watched"])# Output
Traceback (most recent call last):
File "E:/Articles/Dictionary/code/medium.py", line 68, in <module>
print(sample_dictionary[first_user_2]["movies_watched"])
KeyError: <__main__.Friend object at 0x001DA3F0>
Why did it fail when we tried looking up using first_user_2? The default hash method that comes from object class computes the hash value based on the identity(id) of each object(Each object has its own unique ID in Python. However the whole concept of ‘each object’ is complex and beyond the scope of this article) Hence, even though first_user_1 and first_user_2 contain same data, dictionary lookup using first_user_2 will fail. Let’s check the IDs and hash values of the two objects
print(f"ID for first_user_1 {id(first_user_1)}")
# hash - Method calls __hash__ attribute of the object
print(f"Hash for first_user_1 {hash(first_user_1)}")
print(f"ID for first_user_1 {id(first_user_2)}")
print(f"Hash for first_user_1 {hash(first_user_2)}")# Output
ID for first_user_1 30385168
Hash for first_user_1 1899073ID for first_user_1 30385136
Hash for first_user_1 1899071
Defining custom hash
Custom hash method can be added by implementing __hash__()
# Updating the class with __hash__
class Friend:
def __init__(self, name, email):
self.name = name
self.email = email
def __hash__(self):
# Computing the hash value on name and email attribute
print("IN __hash__")
return hash((self.name, self.email))# Now let's try
first_user_2 = Friend(name="Steven", email="steven_adams@gmail.com")
print(sample_dictionary[first_user_2]["movies_watched"])# Output
IN __hash__
Traceback (most recent call last):
File "E:/Articles/Dictionary/code/medium.py", line 69, in <module>
IN __hash__
print(sample_dictionary[first_user_2]["movies_watched"])
KeyError: <__main__.Friend object at 0x0024A3F0>
The reason the above operation failed is because of an equality check performed by Python during dictionary key lookup in addition to hash check.
Why an equality check is performed?
By default, hash() method in Python truncates the return value of a custom __hash__() to 8 bytes (64 bit system) or 4 bytes (32 bit systems)(hash is lossy). This can result in hash collision, which means two different objects can return same hash value. To prevent accidentally matching a key, in addition to performing a hash check, at the time of key lookup an equality check is also done(because if two objects are equal, their hash values are same — Reverse is not always true as two objects may be different yet return same hash value)
Let’s define a custom equality check method
Defining custom equality check
Custom equality check can be added by implementing __eq__
# Updating the class with __eq__
class Friend:
def __init__(self, name, email):
self.name = name
self.email = email
def __hash__(self):
# Computing the hash value on name and email attribute
print("IN __hash__")
return hash((self.name, self.email))
def __eq__(self, other):
print("IN __eq__")
return self.name==other.name and self.email==other.email# Executing again
first_user_2 = Friend(name="Steven", email="steven_adams@gmail.com")
print(sample_dictionary[first_user_2]["movies_watched"])# Output
E:/Articles/Dictionary/code/medium.py"
IN __hash__
IN __hash__
IN __eq__
['Batman: Arkham Knight', 'Avengers: Infinity War']
In Python, if a class is implementing __hash__() it is recommended to implement __eq__() as well.
Also if a class implements a custom __eq__() method, its __hash__ automatically gets set to None(making it unhashable) unless a custom __hash()__ method has also been implemented.
After the hash check has succeeded, Python performs some identity check(actual object used as key while defining the dictionary vs the object used to lookup) before checking for equality. If the identity check succeeds, __eq__ does not get called. This is why the below operation succeeds:
# Update the class with intent to fail equality check
class Friend:
def __init__(self, name, email):
self.name = name
self.email = email
def __hash__(self):
# Computing the hash value on name and email attribute
print("IN __hash__")
return hash((self.name, self.email))
def __eq__(self, other):
print("IN __eq__")
return False
# return self.name==other.name and self.email==other.email
first_user_1 = Friend(name="Steven", email="steven_adams@gmail.com")
sample_dictionary = {
first_user_1: {"movies_watched": ["Batman: Arkham Knight", "Avengers: Infinity War"]}
}
print(f"first_user_1 {sample_dictionary[first_user_1]['movies_watched']}")# Output
IN __hash__
IN __hash__
first_user_1 ['Batman: Arkham Knight', 'Avengers: Infinity War']
In the above operation, __eq()__ did not get called, or else output would have contained ‘IN __eq__’
Now, let’s look at two more snippets just for fun
first_user_1 = Friend(name="Steven", email="steven_adams@gmail.com")
sample_dictionary = {
first_user_1: {"movies_watched": ["Batman: Arkham Knight", "Avengers: Infinity War"]}
}
first_user_1.name = "Susan"
print(f"first_user_1: {sample_dictionary[first_user_1]['movies_watched']}")# Output
Traceback (most recent call last):
File "E:/Articles/Dictionary/code/medium.py", line 63, in <module>
print(f"first_user_1: {sample_dictionary[first_user_1]['movies_watched']}")
IN __hash__
KeyError: <__main__.Friend object at 0x0041A410>
IN __hash__
Hash values and mutable objects are challenging. A hash value should not change for an object throughout its lifetime, and since in the custom definition of __hash__ the hash value has been computed based on name and email attribute, looking up the dictionary using first_user_1 fails since it has been modified since the time of dictionary creation[first_user_1.name was ‘Steven’].
class Friend:
def __init__(self, name, email):
self.name = name
self.email = email
def __hash__(self):
# Computing the hash value on name and email attribute
print("IN __hash__")
return hash((self.name, self.email))
def __eq__(self, other):
print("IN __eq__")
if(isinstance(other, Friend)):
return self.name==other.name and self.email==other.email
elif(isinstance(other, tuple)):
return (self.name, self.email) == other
else:
return False
first_user_1 = Friend(name="Steven", email="steven_adams@gmail.com")
sample_dictionary = {
first_user_1: {"movies_watched": ["Batman: Arkham Knight", "Avengers: Infinity War"]}
}tuple_user = ("Steven", "steven_adams@gmail.com")
print(f"Tuple User Works: f{sample_dictionary[tuple_user]['movies_watched']}")# Output
IN __hash__
IN __eq__
Tuple User Works: f['Batman: Arkham Knight', 'Avengers: Infinity War']
Even though the dictionary has been defined with Friend data type as key and for lookup tuple data type is being used the above operation succeeds due to how __hash__ and __eq__ have been implemented.
Interesting, isn’t it?
Well, that’s all folks!
Hope it has been informative and fun.
Please let me know of any improvement feedback or any mistakes which might be present in the above content 🙂
References
- https://realpython.com/python-dicts/
- https://www.journaldev.com/17357/python-hash-function
- https://docs.python.org/3/reference/datamodel.html#object.__hash__
- https://docs.python.org/3/reference/datamodel.html#object.__eq__
- https://hynek.me/articles/hashes-and-equality/
- https://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python/
- https://rszalski.github.io/magicmethods/#reflection