All We Need to Know About Various Dictionaries in Python

Tour of various Dictionaries in Python

--

In Python, dictionaries (dicts) are a central data structure. Dicts store an arbitrary number of objects, each identified by a unique dictionary key.

Dictionaries are also often called maps, hashmaps, lookup tables, or associative arrays. They allow for the efficient lookup, insertions and deletion of any object associated with a given key.

Dictionaries are one of the most frequently used and most important data structures in computer science. So, how does Python handle dictionaries? Let’s take a tour of the dictionary implementations in core Python and the Python standard library.

dict - Our Go-To Dictionary

Because of their importance, Python features a robust dictionary implementation that’s built directly into the core language: the dict data type.

>>> ages = {
... 'Sarah': 26,
... 'Roark': 25,
... 'Wilson': 30
... }
>>> ages
{'Sarah': 26, 'Roark': 25, 'Wilson': 30}
>>> squares = {x: x * x for x in range(5)}
>>> squares
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

There are some constraints on which objects can be used as valid keys. Python’s dictionaries are indexed by keys that can be of any hashable type:

From Python’s Glossary, An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id(). (Refer here for working example)

Immutable types likes strings and numbers are hashtable and work well as dictionary keys. We can also use tuple objects as dictionary keys, as long as they contain only hashable type themselves.

Dictionaries are highly optimized and are based on a well tested and finely tuned hash table implementation that provides the performance characteristics we’d expect: O(1) time complexity for lookup, insert, update and delete operations in the average case.

Python’s built-in dictionary implementation will do everything we need and there’s a little reason not to use it.

Besides plain dict objects, Python’s standard library includes a number of specialised dictionary implementations which add some convenience features on top of built in dictionary class. Let’s take a look at them.

OrderedDict — Remember the Insertion Order of Keys

Python includes a specialized dict subclass that remembers the insertion order of keys added to it: collections.OrderedDict

Note: standard dict preserve the insertion order of keys in CPython 3.6 and above. However if key order is important for our algorithm to work, it’s best to communicate this clearly by explictly using OrderedDict class.

OrderedDict must be imported from collections module in the standard library.

>>> import collections
>>> d = collections.OrderedDict(one=1, two=2, three=3)
>>> d
OrderedDict([('one', 1), ('two', 2), ('three', 3)])
>>> d['four'] = . 4
>>> d
odict_keys(['one', 'two', 'three', 'four'])

defaultdict — Returns Default Values for Missing Keys

The defaultdict class is another dictionary subclass that accepts a callable in its constructor whose return value will be used if a requested key cannot be found.

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d['names'].append('Sarah')
>>> d['names'].append('Roark')
>>> d['names'].append('Wilson')
>>> d['names']
['Sarah', 'Roark', 'Wilson']

ChainMap — Search Multiple Dictionaries as a Single Mapping

The collections.ChainMap data structure groups multiple dictionaries into a single mapping. Lookups search the underlying mapping one by one until a key is found. Insertions, updates and deletions only affect the first mapping added to the chain.

>>> from collections import ChainMap
>>> d1 = {'one': 1, 'two': 2}
>>> d2 = {'three': 3, 'four': 4}
>>> chain = ChainMap(d1, d2)
>>> chain
ChainMap({'one': 1, 'two': 2}, {'three': 3, 'four': 4})

ChainMap searches each collection in the chain from left to right until if finds the key (or fails):

>>> chain['three']
3
>>> chain['one']
1
>>> chain['five']
KeyError: 'five'

MappingProxyType — Read Only Dictionaries

MappingProxyType is a wrapper around a standard dictionary that provides a read-only view into the wrapped dictionary data.

For example, this can be helpful if we’d like to return a dictionary carrying internal state from a class or a module, while discouraging write access to this object. Using MappingProxyType allows us to put these restrictions in place without first having to create a full copy of dictionary.

>>> from types import MappingProxyType
>>> writable = {'Sarah': 26, 'Roark': 25}
>>> read_only = MappingProxyType(writable)
>>> read_only['Sarah']
26
>>> read_only['Wilson'] = 30
TypeError: 'mappingproxy' object does not support item assignment
>>> writable['Wilson'] = 30
>>> read_only
mappingproxy({'Sarah': 26, 'Roark': 25})

Conclusions:

  • Dictionaries are the central data structure in Python
  • The built-in dict type will be good enough most of the time
  • Specialised implementations, like read-only or ordered dicts, are available in the Python standard library.

--

--

Rachit Tayal
Python Features

Sports Enthusiast | Senior Deep Learning Engineer. Python Blogger @ medium. Background in Machine Learning & Python. Linux and Vim Fan