ADVANCED PYTHON PROGRAMMING

The Ones That Got Away

This time, we’ll talk about object-oriented miscellanea that didn’t fit anywhere else: hashing, subclassing native types, and exceptions.

Dan Gittik
13 min read · Apr 24, 2020


We’re nearing the end of our journey: from stuff as basic as scopes, conditions and loops, through objects, classes and metaclasses — we’re ready to talk about our final topic: modules and packages. Before we do, I’d like to take a moment to address all the object-oriented stuff that I skipped, whether because it didn’t fit the narrative, or I just didn’t want to get bogged down by (even more) details.

Hash, Little Baby

There’s a very important special method I skipped: __hash__. It’s pretty basic on one hand, and a bit complicated on the other, so it didn’t fit in any of the sections. To understand its background, consider this:
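
A minimal sketch of that setup, assuming a bare-bones class A with a single attribute x (both names are just for illustration): instances are hashable out of the box, so they can serve as dictionary keys.

```python
class A:
    def __init__(self, x):
        self.x = x


a = A(1)
d = {a: "value"}  # instances are hashable by default, so they work as keys
print(d[a])       # value
```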

Everything’s dandy, until one day you decide to add an __eq__ method to A:
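
Roughly what that looks like, reusing the same illustrative class A. The exact wording of the error differs between Python versions, so this sketch only prints it:

```python
class A:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, A) and self.x == other.x


a = A(1)
try:
    d = {a: "value"}
except TypeError as error:
    # Defining __eq__ without __hash__ sets A.__hash__ to None.
    message = str(error)
    print(message)
```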

That’s odd. It stems from the way Python implements dictionaries, in a data structure known as a hash table. How would you implement a dictionary in a low-level language like C? You could store all the key-value pairs in an array, but then looking for a particular key will require iterating over all of them, and will take a long time for large dictionaries. A hash table optimizes this lookup by allocating a large array and, whenever the dictionary is indexed by some key, reducing that key to an index into it. Every object is mapped to some number i, and the corresponding value can be stored in, and retrieved from, slot i % len(array) almost instantly.

But wait: what if there’s a collision, when two different keys get reduced to the same index? I guess we’re gonna have lists of key-value pairs after all, and iterate over them linearly; except, we’d only do that for collisions, which don’t happen that often, so we’ll still reap most of the benefits of our optimization. Incidentally, that’s pretty much what Python does, although CPython resolves collisions by probing for another free slot in the same array (open addressing) rather than keeping a list per slot. When you define an object, its default __hash__ implementation, which can be revealed using the built-in hash function, reduces the object by shuffling some bits of its ID; and since the IDs of live objects are unique, and the shuffling does some clever math, a collision is pretty improbable.
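
To make the idea concrete, here’s a toy table with a list of key-value pairs per slot. It’s only a sketch of the scheme described above; the HashTable class and its method names are made up for illustration, and it is not how CPython lays out its dictionaries.

```python
class HashTable:
    def __init__(self, size=8):
        # One list ("bucket") of (key, value) pairs per slot.
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Reduce the key to an index into the array.
        return self.buckets[hash(key) % len(self.buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # equal keys overwrite each other
                return
        bucket.append((key, value))

    def __getitem__(self, key):
        # Linear scan, but only over the keys that collided in this slot.
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        raise KeyError(key)


table = HashTable()
table[1] = "one"
table[9] = "nine"  # 9 % 8 == 1, so this lands in the same slot as key 1
print(table[1], table[9])
```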

But then we up and override __eq__. That opaque error message was Python’s gentle way of telling us that if we choose to override default behavior, we need to be thorough, and implement __hash__, too. After all, if two objects are equal, they should be reduced to the same index and overwrite each other, much like numbers and strings do; and if we have custom comparison logic based on the object’s state, how can Python derive an appropriate hash on its own?

Implementing a hashing algorithm might sound intimidating: are we supposed to do maths? Java programmers will probably be familiar with the 7/31 formula, which is a popular “good enough” implementation for Java’s hashCode. The idea is to start from 7, and for every field that matters, multiply the result so far by 31 and add the field to it. This yields a unique enough value on one hand, and ensures that objects with the same fields have the same hash on the other:
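
A sketch of that formula in Python, assuming a hypothetical Point class whose fields that matter are x and y:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        result = 7
        for field in (self.x, self.y):
            # Multiply the running result by 31, then mix in the next field.
            result = 31 * result + hash(field)
        return result


print(hash(Point(1, 2)) == hash(Point(1, 2)))  # True
```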

But that’s a lot of hassle; why don’t we use Python’s built-in hash function instead? We can just pack “all the fields that matter” in a tuple and hand it over, like so:
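
Something along these lines, again with an illustrative Point class. The printed hash depends on the interpreter, so no particular value is shown:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Let the tuple of relevant fields do the math for us.
        return hash((self.x, self.y))


points = {Point(1, 2): "a"}
print(points[Point(1, 2)])  # a: an equal point finds the same entry
print(hash(Point(1, 2)))
```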

Looks pretty random to me. And pretty short.

Some More Funny Methods

Another method I haven’t mentioned is __format__. We talked about __str__, which returns a human-readable representation, and __repr__, which provides more context for developers. However, an object can support a more “parameterized” formatting; you know, like what we use to pad floats and strings, or format datetimes:
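
A few examples of the built-in kind of formatting meant here (the values are arbitrary):

```python
import datetime as dt

print(f"{3.14159:.2f}")                          # 3.14
print(f"{'hi':>5}")                              # '   hi', right-aligned in 5 columns
print(f"{dt.datetime(2020, 4, 24):%Y-%m-%d}")    # 2020-04-24
```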

Whatever’s after the : gets passed to __format__’s argument, fmt. So:
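
For instance, here’s a sketch with a made-up Temperature class, where the format code f is an invented convention meaning “show me Fahrenheit”:

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, fmt):
        # fmt is whatever followed the colon; it is empty if nothing did.
        if fmt == "f":
            return f"{self.celsius * 9 / 5 + 32:.1f}F"
        return f"{self.celsius:.1f}C"


t = Temperature(100)
print(f"{t}")    # 100.0C
print(f"{t:f}")  # 212.0F
```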

There’s also a nuance about iteration I haven’t told you about. Remember how __iter__ should return an iterator with a __next__ method? Well, if you’d like to be able to iterate over your object in reverse, you’d have to implement it yourself in __reversed__, since there’s no __prev__ method, and no default way for Python to play an iterator in reverse; primarily because an iterator can be generative, and not really know when it ends until you play it all out. This is not enough:
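
A sketch of the insufficient version, using a hypothetical Countdown class that only knows how to iterate forwards:

```python
class Countdown:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n, 0, -1))


print(list(Countdown(3)))  # [3, 2, 1]
try:
    reversed(Countdown(3))
    reversible = True
except TypeError as error:
    # Without __reversed__ (or the sequence protocol), reversed() gives up.
    reversible = False
    print(error)
```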

But this is:
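
Here’s the same illustrative Countdown class, this time spelling out what “backwards” means:

```python
class Countdown:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n, 0, -1))

    def __reversed__(self):
        # We know where the countdown ends, so we can start from there.
        return iter(range(1, self.n + 1))


print(list(Countdown(3)))            # [3, 2, 1]
print(list(reversed(Countdown(3))))  # [1, 2, 3]
```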

Another funny method we haven’t discussed is __missing__. You can add it to dictionaries, and if a key is missing, your method will be invoked to provide some default value for it:
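
A small sketch, with a made-up DefaultingDict whose default value is just a placeholder string:

```python
class DefaultingDict(dict):
    def __missing__(self, key):
        # Called by dict.__getitem__ when the key isn't there.
        return f"<no {key}>"


d = DefaultingDict(a=1)
print(d["a"])    # 1
print(d["b"])    # <no b>
print("b" in d)  # False: the default is returned, not stored
```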

It’s a pretty silly method, since it’s easy enough to override __getitem__:
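
One way that alternative might look, using the same illustrative DefaultingDict and placeholder default:

```python
class DefaultingDict(dict):
    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            return f"<no {key}>"


d = DefaultingDict(a=1)
print(d["a"])  # 1
print(d["b"])  # <no b>
```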

But for some reason they decided to add it; maybe it’s an optimization. In any case, the interesting part about this example is not the method — but the fact that I’ve subclassed a native type. I’ve actually done it before, when we were playing with metaclasses and passing our own dictionaries to __prepare__; but it’s alright if you were too overwhelmed to notice. Which brings us to our next topic:

Imperialism

Python actually lets us subclass almost all native types, and to great acclaim. We can make dictionaries work with dot notation, like so:
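
A minimal sketch of such a dictionary; the name AttrDict is my own, and the implementation is deliberately naive:

```python
class AttrDict(dict):
    def __getattr__(self, key):
        # Only called when regular attribute lookup fails.
        return self[key]

    def __setattr__(self, key, value):
        self[key] = value


d = AttrDict(x=1)
d.y = 2
print(d.x, d["y"])  # 1 2
```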

Or, let’s say we’re writing a Log class, which filters messages based on their levels (DEBUG, INFO, WARN, ERROR, etc.). These levels are integers; but it’d be nice to have their name attached. If we do this:
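
A sketch of that first attempt, with a plain Level class wrapping a value and a name (the numeric values are arbitrary):

```python
class Level:
    def __init__(self, value, name):
        self.value = value
        self.name = name

    def __repr__(self):
        return self.name


INFO = Level(20, "INFO")
WARN = Level(30, "WARN")
try:
    WARN > INFO
    comparable = True
except TypeError as error:
    # No __gt__ and friends, so the levels can't be compared.
    comparable = False
    print(error)
```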

It’s alright, except now we have to implement all the comparison operators, because our code is littered with if level > INFOs and such. If only we could do this instead:
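
Here’s a sketch of a Level that subclasses int, so comparisons and arithmetic come for free (the numeric values are arbitrary):

```python
class Level(int):
    def __new__(cls, value, name):
        # The integer's value has to be set at creation time, in __new__.
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __repr__(self):
        return self.name


INFO = Level(20, "INFO")
WARN = Level(30, "WARN")
print(WARN > INFO)  # True
print(WARN + 1)     # 31
print(repr(WARN))   # WARN
```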

So, an integer for all intents and purposes, but one with a name. It’s a bit unfortunate we have to implement it in __new__, but integers are immutable, so their value is already set by the time __init__ runs; and if we only changed __init__, Python would default to int’s __new__, which wouldn’t know what to do with the extra name argument.

Similarly, let’s say we’re implementing a Path class with a bunch of methods, like read and write—but we’d like it to work with os.path.dirname and the like, all of which only accept strings. We could do this:
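
A rough sketch of such a class; the read and write methods are simplistic, and the path below is made up and never actually opened:

```python
import os


class Path(str):
    def read(self):
        with open(self) as reader:
            return reader.read()

    def write(self, data):
        with open(self, "w") as writer:
            writer.write(data)


path = Path("/tmp/example/file.txt")
# It's still a string, so string-only functions accept it as is.
print(os.path.dirname(path))   # /tmp/example
print(os.path.basename(path))  # file.txt
```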

(As an aside, we shouldn’t; while os.path is pretty annoying to work with, pathlib is an excellent module, and most of the standard library works well with it, so there’s no need to reinvent the wheel.)

The Exception to the Rule

One last thing I’d like to talk about before we proceed is exceptions—the object-oriented error code. It’s a pretty interesting control flow mechanism that, generally speaking, has this structure:
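
In its simplest form, it might look like this (the failing int conversion is just a stand-in for anything that can go wrong):

```python
try:
    number = int("forty-two")  # something that might go wrong
except ValueError as error:
    # ...and what to do about it if it does.
    print("handling:", error)
    number = None
```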

Except exceptions propagate, so an object can declare them as part of its interface: I return so and so, but if something goes wrong, I raise this and this error — catch it and handle it, if you want. It adds another degree of freedom to the language, even though some people would argue it also adds a lot of unpredictability (not to speak of overhead) to the code.

To those people, I’d like to tell a story. Once, I really wanted a bike. Every day, I’d pray for God to give me a bike, but nothing happened. Eventually, I realized it doesn’t work like that— so I stole a bike, and prayed for God to forgive me. This story isn’t actually true, but it exemplifies an idiom that’s pretty popular with the Python community: it’s better to ask forgiveness than to ask permission — and I have to say, as an Israeli, I can appreciate the merits of chutzpah, so I tend to agree. It means that instead of writing this:
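
That is, asking for permission first; a sketch with a throwaway dictionary d and key:

```python
d = {"a": 1}
key = "b"

if key in d:  # ask permission...
    value = d[key]
else:
    value = None
print(value)  # None
```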

We should write this:
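
The forgiveness-asking version, sketched with a throwaway dictionary d and key:

```python
d = {"a": 1}
key = "b"

try:
    value = d[key]  # just do it...
except KeyError:
    value = None    # ...and apologize if it didn't work out
print(value)  # None
```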

It’s more cumbersome, true; but technically, it’s also more correct. In the first case, we check that the key is present in the dictionary before we access it. That’s lovely — as long as our code is single-threaded. Otherwise, some thread might snatch it right under our nose, a moment after the if key in d but a moment before the d[key]; so effectively, we’d end up with a KeyError anyway, and we might as well drop that “ask for permission” altogether.

Clauses Except Except

Exceptions in Python are actually much more powerful than just try and except. You can add a finally clause, which runs regardless of whether an exception was raised or not:
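
A small sketch that records the order in which the clauses run (the events list is only there for illustration):

```python
events = []

try:
    events.append("try")
    1 / 0
except ZeroDivisionError:
    events.append("except")
finally:
    events.append("finally")  # runs whether or not an exception was raised

print(events)  # ['try', 'except', 'finally']
```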

And you can even drop the except:
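
For example, in this sketch the hypothetical risky function cleans up after itself, but leaves the error for its caller to deal with:

```python
events = []


def risky():
    try:
        events.append("try")
        raise ValueError("oops")
    finally:
        events.append("finally")  # runs before the error leaves the function


try:
    risky()
except ValueError:
    events.append("propagated")

print(events)  # ['try', 'finally', 'propagated']
```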

In which case the exception will be propagated — but not before the finally clause is executed. Great way to handle stuff like closing files, releasing locks, or implementing context managers that do it for you ;)

Similarly, exceptions support an else clause, which happens only if no exception was raised. Pretty poor naming, if you ask me—noerror would’ve been better, just like nobreak would’ve been better for loops. Anyway:
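
A sketch with a made-up divide function, where the else clause holds the code that should only run on success:

```python
def divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError:
        return "error"
    else:
        # Only reached if the try block raised nothing.
        return f"ok: {result}"


print(divide(1, 2))  # ok: 0.5
print(divide(1, 0))  # error
```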

But let’s talk a bit more about the except clause, since at the end of the day, it’s what it’s all about. We can catch one exception class, or several:

And we can bind it (or them) to names that will be available in the body of the except clause, like so:

For multiple classes, we’d have to add parentheses:
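
Putting these together, here’s a sketch that catches two exception classes and binds whichever was raised to a name (lookup is a made-up helper):

```python
def lookup(container, key):
    try:
        return container[key]
    except (KeyError, IndexError) as error:
        # error is bound to the exception instance inside this clause only.
        return f"{type(error).__name__}: {error}"


print(lookup({"a": 1}, "a"))  # 1
print(lookup({}, "a"))        # KeyError: 'a'
print(lookup([], 3))          # IndexError: list index out of range
```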

Alternatively, we could drop the exception classes altogether, and just do:

Which catches any and all exceptions. Generally, this is frowned upon, because stuff like KeyboardInterrupt (raised by a SIGINT signal, which is sent by pressing CTRL+C) and SystemExit (the exception raised by exit() to terminate the program) are exceptions, too, and we probably didn’t mean to swallow program abortion. (SyntaxError and ImportError, on the other hand, are regular Exception subclasses, so they get caught either way.) What we actually meant was:
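
A sketch of the narrower clause, wrapped in a made-up safe_call helper, along with a quick check of which exceptions sit outside the Exception family:

```python
def safe_call(function):
    try:
        return function()
    except Exception as error:
        # Regular errors end up here; KeyboardInterrupt and SystemExit don't.
        return error


print(repr(safe_call(lambda: 1 / 0)))
print(issubclass(KeyboardInterrupt, Exception))  # False
print(issubclass(SystemExit, Exception))         # False
print(issubclass(SyntaxError, Exception))        # True
```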

Which catches any Exception; that is, regular errors (like NameError, TypeError, ValueError or RuntimeError), all of which subclass it. In any case, once inside the except clause, we can either handle that error, raise a different one, or re-raise the same. To re-raise, simply:
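
That is, a bare raise. In this sketch, a plain list stands in for a real logger, and parse is a made-up function:

```python
log = []


def parse(text):
    try:
        return int(text)
    except ValueError as error:
        log.append(f"failed to parse {text!r}: {error}")
        raise  # re-raise the exception currently being handled


try:
    parse("forty-two")
except ValueError:
    print(log)
```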

And Python will know what you mean. This lets you inject some code, like logging the exception, without really interfering with its propagation.

An Exceptional Family

Like most classes, exceptions can be subclassed, and some projects define their own exception hierarchy, because they want their users to be able to catch particular errors, while letting generic stuff (like a ZeroDivisionError) propagate farther, as it probably indicates a more fundamental flaw in the system. An example of this is SQLAlchemy, which provides all sorts of custom exceptions for different errors, like NoSuchTableError for when a certain table doesn’t exist in the database, or IntegrityError for when you’re trying to add a value that should be unique, but already exists.

Yet other frameworks provide custom exceptions for you to throw, rather than catch: in Werkzeug, the HTTP engine behind Flask, you can raise a NotFound exception, which will result in the server returning an HTTP response with 404 NOT FOUND; or a BadRequest exception, which returns 400 BAD REQUEST.

In either case, you should think twice before implementing your own exception hierarchy: it adds cognitive load, and for most cases, a standard error is enough. Just stick to ValueError for bad arguments, TypeError for improper usage, KeyError for missing keys, AttributeError for missing attributes, and when in doubt—a RuntimeError with an informative error message.

These distinctions might not sound very important: why not always raise an Exception with a description? As it turns out, the type of the exception you raise actually matters—a lot. Take our über-dictionary from before, for example:

This is all well and good, until for some reason, someone tries to access your dictionary with getattr: having implemented __getattr__, it should support it, right? Well, here’s how getattr works:
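
A quick demonstration on an empty, made-up class:

```python
class Empty:
    pass


obj = Empty()
try:
    getattr(obj, "x")  # same as obj.x
except AttributeError as error:
    print(error)

value = getattr(obj, "x", None)  # the third argument is a default
print(value)  # None
```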

As you can see, we can provide a default value instead of that nasty AttributeError. As for our dictionary:

Seems to work. But wait…
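
Here’s a sketch of the problem, assuming the dictionary’s __getattr__ simply indexes into itself (AttrDict is my own name for it):

```python
class AttrDict(dict):
    def __getattr__(self, key):
        return self[key]


d = AttrDict(x=1)
print(getattr(d, "x", None))  # 1

try:
    getattr(d, "y", None)
    outcome = "default returned"
except KeyError:
    # getattr only swallows AttributeError, and this isn't one.
    outcome = "KeyError escaped"
print(outcome)  # KeyError escaped
```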

Which is surprising, since we’ve specifically provided a default value. The key to that mystery is the error raised: it’s not an AttributeError, which is what getattr expected and replaces, but a KeyError, raised because the key wasn’t in the dictionary, and __getattr__ simply delegated the work to __getitem__. Let’s fix that:
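
One possible fix, sketched on the same illustrative AttrDict, is to translate the KeyError into the AttributeError that getattr expects:

```python
class AttrDict(dict):
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


d = AttrDict(x=1)
print(getattr(d, "x", None))  # 1
print(getattr(d, "y", None))  # None
```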

And now:

It works! Except when an error is raised, it’s much uglier:

Whoa. That happens because Python chains exceptions: whenever an exception originates in an except clause, it keeps a pointer to the previous exception, which is currently being handled, and Python lays out this entire history. You can chain exceptions yourself, by using raise with a from clause; and in this case, we’ll do just that to unchain them:
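
A sketch of that, on an illustrative AttrDict whose __getattr__ turns a KeyError into an AttributeError:

```python
class AttrDict(dict):
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            # "from None" hides the KeyError from the traceback.
            raise AttributeError(key) from None


d = AttrDict(x=1)
try:
    d.y
except AttributeError as error:
    caught = error
    print(caught.__cause__, caught.__suppress_context__)  # None True
```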

The exception is explicitly raised from nothing, so we get a clean slate.

Another tricky situation to look out for is AttributeErrors in properties. Check out this code:

This class has a buggy property, and a silly __getattr__, which returns 1 for any dynamic attribute starting with _, and raises the customary AttributeError otherwise. But then, this happens:
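
A sketch of such a class and of the surprise, with the error message wording being my own:

```python
class A:
    @property
    def p(self):
        return self.x  # bug: x was never assigned

    def __getattr__(self, name):
        if name.startswith("_"):
            return 1
        raise AttributeError(f"no attribute {name!r}")


a = A()
try:
    a.p
except AttributeError as error:
    message = str(error)
    print(message)  # no attribute 'p'
```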

This is really weird, because I’d expect the attribute error to be about x, not about p; and when debugging real code that’s not so obviously broken, trying to figure it out can drive a person mad. What actually happens is that Python tries to resolve p, which tries to resolve x, which raises an AttributeError, as you might expect; but that’s the default behavior for any non-existing attribute—so Python decides that since p is clearly missing, it should call __getattr__('p'), and it’s actually that scoundrel that raises the error. You can validate it with a quick print:
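
Something like this sketch, where the class is an illustrative stand-in and the calls list is only there to record the order:

```python
calls = []


class A:
    @property
    def p(self):
        return self.x  # bug: x was never assigned

    def __getattr__(self, name):
        calls.append(name)
        print("__getattr__ called for", name)
        if name.startswith("_"):
            return 1
        raise AttributeError(f"no attribute {name!r}")


try:
    A().p
except AttributeError:
    pass
print(calls)  # ['x', 'p']: first the real culprit, then the scapegoat
```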

So yeah — know thy exceptions.

Conclusion

Now we really covered Python’s data model. Well, not really: some methods, like __aiter__ and __aenter__, we’ll discuss when we learn about asynchronous programming; and some, like __slots__ and __length_hint__, when we learn about performance. My point is, I knight thee a champion of the object-oriented arts—and on we move to our next challenge: modules and packages!

The Advanced Python Programming series includes the following articles:

  1. A Value by Any Other Name
  2. To Be, or Not to Be
  3. Loopin’ Around
  4. Functions at Last
  5. To Functions, and Beyond!
  6. Function Internals 1
  7. Function Internals 2
  8. Next Generation
  9. Objects — Objects Everywhere
  10. Objects Incarnate
  11. Meddling with Primal Forces
  12. Descriptors Aplenty
  13. Death and Taxes
  14. Metaphysics
  15. The Ones that Got Away
  16. International Trade


Dan Gittik

Lecturer at Tel Aviv university. Having worked in Military Intelligence, Google and Magic Leap, I’m passionate about the intersection of theory and practice.