ADVANCED PYTHON PROGRAMMING
Objects Incarnate
This time, we go over many more behaviors that objects have to offer, and see just how powerful Python can get.
Last time, we saw that objects comprise an ID, a type and a value—and that the type is by far the most interesting. We’ve covered display, equality and comparison, so it’s time for some more exciting tricks.
Truth or Dare
Any object can have a boolean value—its state being either “on” or “off”. To make this value easily accessible—for example, to use the object as-is in an if statement—you’d have to implement the __bool__ method (unfortunately named __nonzero__ in Python 2):
>>> class A:
...     def __init__(self, x):
...         self.x = x
...     def __bool__(self):
...         return self.x > 0
>>> a = A(1)
>>> if a:
...     print(f'{a} is positive')
<A object at 0x...> is positive
>>> a = A(0)
>>> if a:
...     print(f'{a} is positive')
# Nothing.
This is important not only because it’s neat, but because of an important design guideline, called the Principle of Least Astonishment. The idea behind it is that users expect their tools, which include your code, to behave in a predictable way; and while meeting those expectations is good design, surprising the user is frowned upon. Consider this first-in, first-out (FIFO) queue:
class Queue:
    def __init__(self):
        self.items = []
    def add(self, item):
        self.items.append(item)
    def get(self):
        if not self.items:
            raise Exception('queue is empty')
        return self.items.pop(0)
The queue actually uses a list (which is a last-in, first-out [LIFO] stack), but pops items off of its other end, making it handy in some situations. But then:
>>> q = Queue()
>>> q.add(1)
>>> if q:
...     print(q.get())
1
>>> if q:
...     print(q.get())
Traceback (most recent call last):
...
Exception: queue is empty
By default, every object evaluates to True—and since we didn’t tell it any different, Python assumed that this is the case for our object, too. But containers such as lists, dictionaries, or, indeed, our Queue, are expected to evaluate to False when they’re empty, which led the user to make the mistake of reading from an empty queue. If only we’d…
class Queue:
    ...  # Same as before
    def __bool__(self):
        return len(self.items) > 0
        # Or: return bool(self.items)
        # (Note that __bool__ must return an actual bool—
        # just `return self.items` would raise a TypeError.)
Then the user would be least astonished:
>>> q = Queue()
>>> q.add(1)
>>> if q:
...     print(q.get())
1
>>> if q:
...     print(q.get())
# Nothing.
Scott Meyers has a brilliant talk on the Most Important Design Guideline, where he phrases it thus: “make interfaces that are easy to use correctly, and hard to use incorrectly”. While the first part is pretty straightforward—of course anyone would try to make their interface nice and easy—the second one is very thought-provoking. Scott elaborates, and makes the revolutionary statement that users are not stupid: if they’re trying to get your code to work, they’re probably (somewhat) smart or capable, (somewhat) motivated, and willing to read (some) documentation. Nobody goes to work thinking, “today, I’m going to do a terrible job”—so if they still mess it up, it’s your fault as much as theirs.
Don’t Call Us; We’ll Call You
Then there are callables—functions, first and foremost, but any object can be invoked if we so desire:
>>> class A:
...     def __call__(self, x, y):
...         return x + y
>>> a = A()
>>> a(1, 2)
3
There’s not much to say about callable objects in terms of their callability—all the tricks we learnt for functions still apply. However, they’re particularly interesting when combined with decorators—and this time, we’ll tackle 2nd order decorators first. Remember that brain spasm, when you first saw this:
>>> def multiply(m):
...     def decorator(f):
...         def wrapper(*args, **kwargs):
...             return m * f(*args, **kwargs)
...         return wrapper
...     return decorator
>>> @multiply(2)
... def inc(x):
...     return x + 1
>>> inc(1)
4  # (1 + 1) * 2
Functions can get pretty difficult to read with all that nesting. Luckily, classes are much better at managing state while remaining flat, and they have two invocation points: their constructor (__init__), and their call (__call__):
>>> class Multiply:
...     def __init__(self, m):
...         self.m = m
...     def __call__(self, f):
...         def wrapper(*args, **kwargs):
...             return self.m * f(*args, **kwargs)
...         return wrapper
>>> @Multiply(2)
... def inc(x):
...     return x + 1
>>> inc(1)
4
This shouldn’t come as a surprise: remember the decoration line is an expression, so we might as well have defined double = Multiply(2) to be a callable object, and then invoked it to decorate inc.
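To make that equivalence concrete, here’s a small sketch (reusing the Multiply class from above) that decorates one function with the @ syntax and another by plain calling—the names dec and double are made up for the example:

```python
class Multiply:
    def __init__(self, m):
        self.m = m
    def __call__(self, f):
        def wrapper(*args, **kwargs):
            return self.m * f(*args, **kwargs)
        return wrapper

double = Multiply(2)  # the decorator is just a callable object

@double               # applies that very instance
def inc(x):
    return x + 1

def dec(x):
    return x - 1
dec = double(dec)     # identical effect, without the @ syntax
```

Both inc and dec end up wrapped by the same Multiply instance; inc(1) and dec(3) both return 4.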
Switching Babies
More interesting are 1st order decorators. In the previous case, our first invocation (Multiply(2)) constructed our decorator, which was then invoked on the function (__call__(f)) to produce a wrapper. In this case, our first invocation is going to be the decoration itself—so __init__ would have to accept a single argument, f. As you remember, the way decorators work:
@double
def inc(x):
    return x + 1
Is actually syntactic sugar for:
def inc(x):
    return x + 1
inc = double(inc)
So if we decorate something with a class, it’d invoke its constructor on it, and replace it with a new instance. That instance better be callable and delegate stuff to the original f—that’s what decorators are for, after all:
>>> class Double:
...     def __init__(self, f):
...         self.f = f
...     def __call__(self, *args, **kwargs):
...         return 2 * self.f(*args, **kwargs)
>>> @Double
... def inc(x):
...     return x + 1
>>> inc(1)
4
That looks like way more work than a standard 1st order decorator—and it is. But in replacing the function with a callable object, whose __call__ is effectively what we previously called wrapper, we’ve gained all the other benefits of objects: namely, attributes and methods. Remember the memoization decorator we developed for speeding up Fibonacci? How about:
class Memoized:
    def __init__(self, f):
        self.f = f
        self.cache = {}
    def __call__(self, *args, **kwargs):
        token = args + tuple(sorted(kwargs.items()))
        if token not in self.cache:
            self.cache[token] = self.f(*args, **kwargs)
        return self.cache[token]
This way, not only do we get memoization:
>>> @Memoized
... def fib(n):
...     return n if n < 2 else fib(n-1) + fib(n-2)
>>> fib(10)
55  # Works instantly!
But we can also inspect the cache, and even clear it:
>>> fib.cache
{0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 5, 6: 8, 7: 13, 8: 21, 9: 34, 10: 55}
>>> fib.cache.clear()
All that is possible because fib is not actually a function—it’s a Memoized object, whose f points to the original function, which does all the work when called. See for yourself:
>>> fib
<Memoized object at 0x...>
As a side note, such decorator classes should still use functools.wraps to imitate the original function (e.g. by preserving its __name__ and __doc__). It looks a bit weird, but it works:
import functools

class Decorator:
    def __init__(self, f):
        self.f = f
        functools.wraps(f)(self)
    def __call__(self, *args, **kwargs):
        return self.f(*args, **kwargs)
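To see the imitation in action, here’s a quick sketch using that same Decorator class on a made-up greet function—the wrapper instance keeps the original’s metadata:

```python
import functools

class Decorator:
    def __init__(self, f):
        self.f = f
        # Copies __name__, __doc__, __module__, etc. from f onto this instance.
        functools.wraps(f)(self)
    def __call__(self, *args, **kwargs):
        return self.f(*args, **kwargs)

@Decorator
def greet(name):
    """Greet someone by name."""
    return f'Hello, {name}!'
```

Although greet is now a Decorator instance, greet.__name__ is still 'greet' and its docstring survives.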
Arithmetics
OK, this part is boring—so I’m going to breeze through it. Let’s jump straight to an example:
>>> class A:
...     def __init__(self, x):
...         self.x = x
...     def __repr__(self):
...         return f'{self.__class__.__name__}({self.x!r})'
...     def __add__(self, other):
...         if not isinstance(other, A):
...             return NotImplemented
...         return self.__class__(self.x + other.x)
>>> a1 = A(1)
>>> a2 = A(2)
>>> a1 + a2
A(3)
Note that we’ve applied all our previous lessons: the __repr__ uses the dynamically resolved __class__, as does the __add__ when creating a new instance for the sum; and if the other argument is not an A, we just admit that we don’t know with NotImplemented. There are quite a few similar operators:
class A:
    def __add__(self, other)          # x + y
    def __sub__(self, other)          # x - y
    def __mul__(self, other)          # x * y
    def __truediv__(self, other)      # x / y
    def __floordiv__(self, other)     # x // y
    def __mod__(self, other)          # x % y
    def __matmul__(self, other)       # x @ y
    def __divmod__(self, other)       # divmod(x, y)
    def __pow__(self, other, m=None)  # x ** y, pow(x, y[, m])
    def __lshift__(self, other)       # x << y
    def __rshift__(self, other)       # x >> y
    def __and__(self, other)          # x & y
    def __or__(self, other)           # x | y
    def __xor__(self, other)          # x ^ y
    def __invert__(self)              # ~x
    def __neg__(self)                 # -x
    def __pos__(self)                 # +x
    def __abs__(self)                 # abs(x)
Some things to note:
- There’s no __div__; there used to be, in Python 2, but it proved too confusing. In Python 3, there’s only __truediv__, which does true division, like 5 / 2 == 2.5, and __floordiv__, which does integer division (rounding any fractions down), like 5 // 2 == 2.
- There’s an operator for matrix multiplication, @. It’s the new kid on the block, and it’s kinda weird, but it’s there for exactly this kind of extra syntax.
- Some operators don’t have “infix notation” with a special symbol; they rather define the behavior of built-in functions like divmod and pow.
- Specifically, pow can take up to three arguments—the third one being a modulo, a mathy thing used to accelerate the computation under certain conditions. There’s no way to convey it with the infix x ** y, but your signature should support it nevertheless.
- The __and__ and __or__ methods don’t actually correspond to the keywords and and or, but to the bitwise operations & and |.
- Some operators are binary, working on both self and other, while others are unary, working only on self. The obvious ones are inversion (~x) and negation (-x), but emphasizing a number is positive (+x) or computing its absolute value with the built-in function abs are also a thing.
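To see the three-argument pow in action, here’s a minimal sketch (the Num class is invented for illustration):

```python
class Num:
    def __init__(self, x):
        self.x = x
    def __pow__(self, other, m=None):
        result = self.x ** other
        if m is not None:  # only provided by the pow(x, y, m) form
            result %= m
        return result

n = Num(2)
```

Here n ** 10 takes the infix route and yields 1024, while pow(n, 10, 100) passes the modulo through and yields 24.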
Then, there are the r-operators. You know how our proper __add__ implementation back there returned NotImplemented for unfamiliar types? In that case, Python will go ahead and ask the other party—but which method of the other party is it supposed to call? Addition may be commutative, meaning x + y and y + x are the same; but subtraction isn’t: x - y and y - x are usually quite different. The answer is: for __add__, Python will look for an __radd__, and for __sub__, it’d look for __rsub__. Again:
class A:
    def __radd__(self, other)       # y + x
    def __rsub__(self, other)       # y - x
    def __rmul__(self, other)       # y * x
    def __rtruediv__(self, other)   # y / x
    def __rfloordiv__(self, other)  # y // x
    def __rmod__(self, other)       # y % x
    def __rpow__(self, other)       # y ** x
    def __rmatmul__(self, other)    # y @ x
    def __rlshift__(self, other)    # y << x
    def __rrshift__(self, other)    # y >> x
    def __rand__(self, other)       # y & x
    def __ror__(self, other)        # y | x
    def __rxor__(self, other)       # y ^ x
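Here’s a sketch of the mechanism, using a made-up Money class: when the left operand’s method returns NotImplemented, Python tries the right operand’s reflected method:

```python
class Money:
    def __init__(self, amount):
        self.amount = amount
    def __mul__(self, other):
        if not isinstance(other, int):
            return NotImplemented
        return Money(self.amount * other)
    # For `3 * m`: int.__mul__ gives up, so Python calls m.__rmul__(3).
    __rmul__ = __mul__

m = Money(5)
```

Both m * 3 and 3 * m now produce a Money of 15; without __rmul__, the latter would raise a TypeError.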
Then, there are the i-operators: +=, -= and the like. For immutable objects, don’t bother—it’s the same:
>>> n = 1
>>> n += 1  # Tries n.__iadd__(1), which ints don't define...
>>> # ...so it falls back to being the same as:
>>> n = n + 1  # n = n.__add__(1)
That’s the reason these operators weren’t mentioned when we were discussing scopes: they have nothing to do with assignment, binding names to values or resolving them; they’re just syntactic sugar for yet another kind of operation. Oh, and the i stands for “in-place”, and that’s how they should take effect—mutating their own instance and then returning it (usually just return self; whatever they return is what gets bound back to the name).
class A:
    def __iadd__(self, other)       # x += y
    def __isub__(self, other)       # x -= y
    def __imul__(self, other)       # x *= y
    def __itruediv__(self, other)   # x /= y
    def __ifloordiv__(self, other)  # x //= y
    def __imod__(self, other)       # x %= y
    def __ipow__(self, other)       # x **= y
    def __imatmul__(self, other)    # x @= y
    def __ilshift__(self, other)    # x <<= y
    def __irshift__(self, other)    # x >>= y
    def __iand__(self, other)       # x &= y
    def __ior__(self, other)        # x |= y
    def __ixor__(self, other)       # x ^= y
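Here’s a sketch of a mutable object honoring the in-place protocol (the Bag class is made up for illustration):

```python
class Bag:
    def __init__(self):
        self.items = []
    def __iadd__(self, other):
        self.items.extend(other)  # mutate in place...
        return self               # ...then return self, which gets rebound

bag = Bag()
original = bag
bag += [1, 2]
bag += [3]
```

Afterwards bag.items is [1, 2, 3], and bag is original still holds—no new object was created along the way.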
There are a few more arithmetic operators I’ll cover for completeness’ sake:
- To determine how your object behaves when it is cast to an integer, float or complex, implement __int__, __float__ and __complex__.
- To determine how your object behaves when it is rounded, implement __round__ (for round(x)), __floor__ (for math.floor(x)), __ceil__ (for math.ceil(x)) and __trunc__ (for math.trunc(x)). If you were wondering, like me, what’s the difference between floor and trunc—both return integers, but floor rounds toward negative infinity while trunc rounds toward zero, so they only disagree on negatives: math.floor(-1.5) is -2, while math.trunc(-1.5) is -1. Hooray for time well spent!
- Last and definitely least, __index__ determines how your object behaves if it’s being used to index a list, like so: items[x]. This is a crazy level of detail—and that’s how much Python enables you and empowers you to build incredible things on top of it.
Containing Oneself
This wasn’t easy—but it’s all part of Python’s data model. On to more exciting things: let’s talk about containers. You can do anything a list or a dictionary can:
>>> class A:
...     def __getitem__(self, key):
...         print(f'getting {key}')
...         return 42
...     def __setitem__(self, key, value):
...         print(f'setting {key} to {value!r} (not really)')
...     def __delitem__(self, key):
...         print(f'deleting {key} (not really)')
>>> a = A()
>>> a['x']
getting x
42
>>> a['x'] = 1
setting x to 1 (not really)
>>> del a['x']
deleting x (not really)
This is especially interesting if you work with slices: you know, that funny notation of start:stop, or even start:stop:step. Let’s play with it:
>>> class A:
...     def __getitem__(self, key):
...         print(key)
>>> a = A()
>>> a[1]
1
>>> a[1:2]
slice(1, 2, None)
>>> a[1:2:3]
slice(1, 2, 3)
These slices are built-in objects, which have start, stop and step attributes—it’s up to you to decide what they mean in your context. Moreover, Python indexing also supports tuples:
>>> a[1, 2]
(1, 2)
And tuples of slices:
>>> a[1:2, 3:4]
(slice(1, 2, None), slice(3, 4, None))
And the weird Ellipsis object, which is actually valid Python syntax:
>>> ...
Ellipsis
>>> a[1:2, ..., 3:4]
(slice(1, 2, None), Ellipsis, slice(3, 4, None))
This is pretty extreme—but it allows for all sorts of smart n-dimensional indexing when dealing with data science, machine learning and the like. In fact, if you’ve ever worked with numpy or something similar, you’re probably painfully familiar with this notation.
No container is complete without you being able to query its length, and whether it contains some item. Luckily, this is pretty easy:
class A:
    ...  # Same as before
    def __contains__(self, key):
        ...  # Return whether key is part of a or not
    def __len__(self):
        ...  # Return a's length
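Filling in that sketch on the Queue from earlier (a minimal version, for illustration):

```python
class Queue:
    def __init__(self):
        self.items = []
    def add(self, item):
        self.items.append(item)
    def __contains__(self, item):  # supports `item in q`
        return item in self.items
    def __len__(self):             # supports `len(q)`
        return len(self.items)

q = Queue()
q.add(1)
q.add(2)
```

Now len(q) is 2 and 1 in q is True. As a bonus, when __bool__ is missing, Python falls back to __len__ for truthiness—so this Queue also evaluates to False when empty, just like the version above.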
And then there’s iteration—but before we get there, let’s pause for a moment to see how our newfound powers can be used for something more than a pedagogical anecdote.
Making More Pandas
Pandas is an insanely popular package for data analysis, and it works primarily with Data Frames—very versatile objects that let you express complex filtering and batch operations easily. Here’s an example:
>>> import pandas as pd
>>> data = [{'x': i, 'y': i**2} for i in range(5)]
>>> df = pd.DataFrame(data)
df now represents a table with two columns, x and y, and 5 rows, where the i-th row’s x is set to i, and y to i squared. Now check that out:
>>> df[df['y'] > 5]
   x   y
3  3   9
4  4  16
In one fell swoop, we kept only the rows whose y column is greater than 5—that is, the last two rows. Like Richard Feynman said, “what I cannot create, I do not understand”—so let’s implement it ourselves. First, let’s have a data frame class:
class DataFrame:
    def __init__(self, data):
        self.data = data
Now, obviously, this object should support indexing—but for two different scenarios. In the first scenario, like in df['y'], it needs to produce an object representing a filter, which can be narrowed down by e.g. filter > 5; in the second, like in df[filter], it needs to apply it, and return only the rows that match. Let’s begin:
class DataFrame:
    ...
    def __getitem__(self, key):
        if isinstance(key, str):
            return Filter(key, self.data)

class Filter:
    def __init__(self, key, data):
        self.key = key
        self.data = data
What this filter is going to do is, upon comparison, return a list of booleans, indicating for each row whether its value for that key was indeed greater or not.
class Filter:
    ...
    def __gt__(self, other):
        return [row[self.key] > other for row in self.data]
So far, so good:
>>> df = DataFrame(data)
>>> df['y']
<Filter object at 0x...>
>>> df['y'] > 5
[False, False, False, True, True]
Now to the second scenario—when a data frame object gets a list of booleans, it should apply this filter by returning only those rows:
class DataFrame:
    ...
    def __getitem__(self, key):
        ...
        if isinstance(key, list):
            return [row for row, include in zip(self.data, key)
                    if include]
(Or, you can use the standard itertools.compress(self.data, key), wrapped in a list. I don’t know why I know that. Anyway—)
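For reference, compress does exactly this kind of mask-based filtering:

```python
import itertools

data = [{'x': i, 'y': i ** 2} for i in range(5)]
mask = [row['y'] > 5 for row in data]  # [False, False, False, True, True]

# compress yields only the items whose corresponding mask entry is truthy:
picked = list(itertools.compress(data, mask))
```

picked now holds the last two rows, exactly like the list comprehension above.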
>>> df[df['y'] > 5]
[{'x': 3, 'y': 9}, {'x': 4, 'y': 16}]
We did it! Of course, this is not exactly how Pandas works, but you get the point: Python really does support, with all its heart, building whatever you want on top of it.
The Last Iteration
Just one more magic method for now: __iter__. This is the method invoked by the built-in iter function, which in turn is invoked by the for loop, and it should return an object that conforms to the iterator protocol.
This is a bit confusing, so follow closely—the object we’re iterating over is called an iterable; and what its __iter__ should return is called an iterator, which has a __next__ method (ha! I tricked you), which is called repeatedly to produce results, until it raises a StopIteration. Historically, that’s how iteration was done: two separate classes, with the iterator one nested inside more often than not:
>>> class A:
...     def __init__(self, x):
...         self.x = x
...     def __iter__(self):
...         return A.Iterator(self)
...     class Iterator:
...         def __init__(self, a):
...             self.a = a
...             self.i = 0
...         def __next__(self):
...             self.i += 1
...             if self.i > self.a.x:
...                 raise StopIteration()
...             return self.i
...         # It's good practice for iterators to ignore iter().
...         def __iter__(self):
...             return self
>>> a = A(3)
>>> for i in a:
...     print(i)
1
2
3
In this case, iterating over an instance of A produces all the integers from 1 to that instance’s x. It does so by returning an Iterator whose i is 0 at first; but every time its __next__ is called, that i is incremented and returned, until it exceeds that x and starts raising StopIteration instead.
That’s pretty simple—and incredibly tedious. You know what else responds to next and is impervious to iter?
>>> class A:
...     def __init__(self, x):
...         self.x = x
...     def __iter__(self):
...         for i in range(1, self.x + 1):
...             yield i
...         # Or even just: yield from range(1, self.x + 1)
>>> a = A(3)
>>> for i in a:
...     print(i)
1
2
3
That’s right—generators! If __iter__ contains a yield statement, then calling it—i.e. iter(a), i.e. a.__iter__()—automatically returns a generator, which resumes whenever next is called on it, until it can go no further and gasps out a StopIteration. Cleaner, no?
Conclusion
This was a long and arduous journey, but we’re now approaching the gates of Mordor: attribute access, method resolution, descriptors, properties and all that jazz. The next topic is one that, at least in my eyes, truly sets novice and advanced Python programmers apart—so buckle up!
The Advanced Python Programming series includes the following articles:
- A Value by Any Other Name
- To Be, or Not to Be
- Loopin’ Around
- Functions at Last
- To Functions, and Beyond!
- Function Internals 1
- Function Internals 2
- Next Generation
- Objects — Objects Everywhere
- Objects Incarnate
- Meddling with Primal Forces
- Descriptors Aplenty
- Death and Taxes
- Metaphysics
- The Ones that Got Away
- International Trade