Some notes on Python dictionaries

A few pointers to better utilise python dictionaries

Kaushik Amar Das

Published in

Analytics Vidhya

6 min readOct 31, 2020

Introduction

Python dictionaries are flexible containers that map a key to a value. Knowing how to use dictionaries is an essential skill for the repertoire of any pythonist.

This blog covers a few notes on Python dictionaries which have helped me improve my dict game. Much of it might be already familiar to you if are an (>=) intermediate level python programmer. But I still hope that you will enjoy this article and perhaps take away a thing or two.

I have tried to make this blog beginner-friendly, but if you are not very familiar with dict, you might find a refresher useful.

Let’s begin.

Note 1: A basic example of using a dictionary

For a basic demonstration of how dict works, let’s calculate the frequency of items in a list. This is a pretty obvious use-case for a dict.
We will start by implementing this in the most basic way possible and improve our code as we proceed. Later we will use a specialised container to do this.

Note 2: `setdefault` elegantly handles missing keys

We can avoid the use of the if-else clause in the above code and handle missing keys using the setdefault method.
If the item/key is in the dictionary, setdefault returns its value. Otherwise, it inserts the item with the given value and returns it.
The explaination in the documentation of setdefault is pretty straight forward and goes as follows:

setdefault(key[,default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to `None`.

This not only decreases the number of lines of code but also makes the code more readable and pythonic.
Let’s uses setdefault to improve the example in Note 1.

Here is another example of `setdefault` to make it clearer, since this is an important method.

The code below separates all the odd and even numbers from a list and put it into its own list/bucket.
One bucket contains all the odd numbers while the other contains all the odd numbers. We will use num%2 ( a hash function ) to create the key, since odd_num%2 == 1 and even_num%2 == 0 .

Of course, you can argue that in the above code, we can simply initialize buckets as bucket={'odd':[],'even':[]}. But think of non-trivial use-cases where you won’t know the keys beforehand, for example, reading a .csv file with counties and their cities, each row is <country_name>,<city_name> and you need to group all the countries and their cities; a complicated hash function with an arbitrary number of buckets, etc. You can make the code even shorter if you want.

for num in num_list:
    buckets.setdefault(num%2,[]).append(num)

For the sake of comparison, here are two (intentionally ugly) alternatives for the same example without using setdefault.

Note 3: Use `collections.Counter()` for counting objects

Going back to our previous example of calculating frequency, it’s such a recurring task that python has a built-in for it, called Counter , in the collections module. Counter comes with useful methods such as most_common(n) to quickly find out the most frequent n items and is very similar to a dict (docs).

Note 4: Dictionary comprehensions

Comprehensions are one of the most useful tools in Python and of course, it is supported by dictionaries as well. The syntax is mostly the same as that
of list comprehensions, with the difference being the use of {..} instead of (..) and requires you to define a key: value pair.
It should be pretty apparent from the code examples below.

Note that dictionary comprehensions should not be used for any side-effects, such as incrementing a variable or printing, etc.

Note 5: Note 5: Insertion order in python dictionaries and `OrderedDict`

As of Python 3.7+, dictionaries maintain insertion order. However, it is not recommended to rely upon it. Many popular libraries (and programmers) assume that the ordering in dict doesn’t matter, as it most often doesn’t.

If you want to perserve insertion order, you should instead use OrderedDict (docs) which remembers the order of items inserted by default. In addition to clearlying conveying your intensions, it also has the added benifit of not having to worry too much about backward compatibility.

If you wish to learn more about this, I highly recommend (this blog) post by Greg Gandenberger.

Note 6: Dict keys need to hashable and things that are as hashable.

For an object to work as a key in a dictionary, it needs to be hashable. Examples of hashable objects are int, str , float, etc. Specifically, it needs to meet the three following requirements.

It should support the hash() function via a __hash__() dunder method whose value never changes over the lifetime of the object.
It supports equality comparison via __eq__() method.
If a == b is True then hash(a) == hash(b) must also be True .

It is for these same reasons, a tuple can be a key in dict
while a list can’t. (ref)

On this note, User-defined types are hashable by default. This is because their hash value is their id() and they all compare not equal.
One note of caution is that, custom implementations of __hash__() ,
__eq__() should only take into account those object attributes that never change during the lifetime of the object. (further information)

Note 7: Dictionaries are fast but trade space for time

Internally, dict uses hash tables. And by design these hash tables are sparse, meaning these are not very space-efficient. For a large number of records, it might be more space-efficient to store them as in compact objects such as tuples.

Even though dict has a significant memory overhead, as long as it fits in memory, allows fast access. (ref)

Note 8: Use `sets` instead if uniqueness is all you need

It is often to the case to find all unique items from a collection. It might be tempting to use a dict with a dummy value since all keys in a dict are unique by default.

In such a scenario, it is better to use a set instead. Python set guarantees uniqueness. It also has nice properties similar to sets in mathematics like union and intersection. Similar to dict, elements in a set it must be hashable.

But if the case requires a hashable set , you will have to use frozenset instead since a set is not hashable. frozenset is a hashable version of set, so it can be put inside a set.

Since this work is on dictionaries, I will leave you with the link to the docs to learn more about sets.

Summary

Basic demonstration of how dict works by calculating the frequency of items in a list.
Missing keys can be elegantly handled by setdefault.
collections.Counter is a specialized container for counting hashable objects
Comprehensions are supported for creating dictionaries.
Even though dict preserves order, but it’s not something that should be relied upon. Better use OrderedDict for the same.
Dictionary keys need to be hashable. User-defined objects are hashable by default.
Dictionaries are fast but are space inefficient since it uses sparse hash tables.
Use sets instead of dictionaries when you only need to find unique values.

Conclusion

This blog covered a few notes on the use of Python dictionaries. I hope this has been a knowledgeable and enjoyable read for you. If you spot any mistakes please leave a comment so that I can correct it. Any feedback and suggestion are also welcome.

You can find a Jupyter notebook version of this blog in my Github.

Thank you for reading.