Some notes on Python dictionaries
A few pointers to better utilise python dictionaries
Introduction
Python dictionaries are flexible containers that map a key to a value. Knowing how to use dictionaries is an essential skill for the repertoire of any pythonist.
This blog covers a few notes on Python dictionaries which have helped me improve my dict
game. Much of it might be already familiar to you if are an (>=) intermediate level python programmer. But I still hope that you will enjoy this article and perhaps take away a thing or two.
I have tried to make this blog beginner-friendly, but if you are not very familiar with dict
, you might find a refresher useful.
Let’s begin.
Note 1: A basic example of using a dictionary
For a basic demonstration of how dict
works, let’s calculate the frequency of items in a list. This is a pretty obvious use-case for a dict
.
We will start by implementing this in the most basic way possible and improve our code as we proceed. Later we will use a specialised container to do this.
Note 2: `setdefault` elegantly handles missing keys
We can avoid the use of the if-else
clause in the above code and handle missing keys using the setdefault
method.
If the item/key is in the dictionary, setdefault
returns its value. Otherwise, it inserts the item with the given value and returns it.
The explaination in the documentation of setdefault
is pretty straight forward and goes as follows:
setdefault(key[,default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to `None`.
This not only decreases the number of lines of code but also makes the code more readable and pythonic.
Let’s uses setdefault
to improve the example in Note 1.
Here is another example of `setdefault` to make it clearer, since this is an important method.
The code below separates all the odd and even numbers from a list and put it into its own list/bucket.
One bucket contains all the odd numbers while the other contains all the odd numbers. We will use num%2
( a hash function ) to create the key, since odd_num%2 == 1
and even_num%2 == 0
.
Of course, you can argue that in the above code, we can simply initialize buckets
as bucket={'odd':[],'even':[]}
. But think of non-trivial use-cases where you won’t know the keys beforehand, for example, reading a .csv
file with counties and their cities, each row is <country_name>,<city_name>
and you need to group all the countries and their cities; a complicated hash function with an arbitrary number of buckets, etc. You can make the code even shorter if you want.
for num in num_list:
buckets.setdefault(num%2,[]).append(num)
For the sake of comparison, here are two (intentionally ugly) alternatives for the same example without using setdefault
.
Note 3: Use `collections.Counter()` for counting objects
Going back to our previous example of calculating frequency, it’s such a recurring task that python has a built-in for it, called Counter
, in the collections
module. Counter
comes with useful methods such as most_common(n)
to quickly find out the most frequent n items and is very similar to a dict
(docs).
Note 4: Dictionary comprehensions
Comprehensions are one of the most useful tools in Python and of course, it is supported by dictionaries as well. The syntax is mostly the same as that
of list comprehensions, with the difference being the use of {..}
instead of (..)
and requires you to define a key: value
pair.
It should be pretty apparent from the code examples below.
Note that dictionary comprehensions should not be used for any side-effects, such as incrementing a variable or printing, etc.
Note 5: Note 5: Insertion order in python dictionaries and `OrderedDict`
As of Python 3.7+, dictionaries maintain insertion order. However, it is not recommended to rely upon it. Many popular libraries (and programmers) assume that the ordering in dict
doesn’t matter, as it most often doesn’t.
If you want to perserve insertion order, you should instead use OrderedDict
(docs) which remembers the order of items inserted by default. In addition to clearlying conveying your intensions, it also has the added benifit of not having to worry too much about backward compatibility.
If you wish to learn more about this, I highly recommend (this blog) post by Greg Gandenberger.
Note 6: Dict keys need to hashable and things that are as hashable.
For an object to work as a key in a dictionary, it needs to be hashable. Examples of hashable objects are int
, str
, float
, etc. Specifically, it needs to meet the three following requirements.
- It should support the
hash()
function via a__hash__()
dunder method whose value never changes over the lifetime of the object. - It supports equality comparison via
__eq__()
method. - If
a == b
isTrue
thenhash(a) == hash(b)
must also beTrue
.
It is for these same reasons, a tuple
can be a key in dict
while a list
can’t. (ref)
On this note, User-defined types are hashable by default. This is because their hash value is their id()
and they all compare not equal.
One note of caution is that, custom implementations of __hash__()
,__eq__()
should only take into account those object attributes that never change during the lifetime of the object. (further information)
Note 7: Dictionaries are fast but trade space for time
Internally, dict
uses hash tables. And by design these hash tables are sparse, meaning these are not very space-efficient. For a large number of records, it might be more space-efficient to store them as in compact objects such as tuples
.
Even though dict
has a significant memory overhead, as long as it fits in memory, allows fast access. (ref)
Note 8: Use `sets` instead if uniqueness is all you need
It is often to the case to find all unique items from a collection. It might be tempting to use a dict
with a dummy value since all keys in a dict are unique by default.
In such a scenario, it is better to use a set
instead. Python set
guarantees uniqueness. It also has nice properties similar to sets in mathematics like union and intersection. Similar to dict
, elements in a set
it must be hashable.
But if the case requires a hashable set
, you will have to use frozenset
instead since a set
is not hashable. frozenset
is a hashable version of set
, so it can be put inside a set
.
Since this work is on dictionaries, I will leave you with the link to the docs to learn more about sets.
Summary
- Basic demonstration of how
dict
works by calculating the frequency of items in a list. - Missing keys can be elegantly handled by
setdefault
. collections.Counter
is a specialized container for counting hashable objects- Comprehensions are supported for creating dictionaries.
- Even though
dict
preserves order, but it’s not something that should be relied upon. Better useOrderedDict
for the same. - Dictionary keys need to be hashable. User-defined objects are hashable by default.
- Dictionaries are fast but are space inefficient since it uses sparse hash tables.
- Use sets instead of dictionaries when you only need to find unique values.
Conclusion
This blog covered a few notes on the use of Python dictionaries. I hope this has been a knowledgeable and enjoyable read for you. If you spot any mistakes please leave a comment so that I can correct it. Any feedback and suggestion are also welcome.
You can find a Jupyter notebook version of this blog in my Github.
Thank you for reading.
Further reading
I highly recommend the book “Fluent Python” by Luciano Ramalho, which covers all the topics in this blog in great depth.
This blog is also largely inspired by his wonderful book.