ADVANCED PYTHON PROGRAMMING
Death and Taxes
This time, we’ll see how objects can be used to manage contexts—and how exactly an object is born, lives, and dies.
Last time, we talked about the awesome power of descriptors; now, getting back to objects, we still have a few behaviors to cover: namely, context management and creation. After that, I’ll add another article for the ones that got away—but in the meantime:
Bossy, but Sensitive
Let’s start with a background story for motivation, again. You probably did this in the past:
>>> fp = open(path)
>>> data = fp.read()
>>> fp.close()
And that’s OK; the question is whether you did it like so:
>>> fp = open(path)
>>> try:
...     data = fp.read()
... finally:
...     fp.close()
The reason being, fp
occupies certain resources—namely, a file descriptor, which keeps an open channel to that file—and should the read
method fail and raise an exception, fp
might not be properly closed and reclaimed, resulting in a resource leak. It’s even worse with multi-threading:
>>> lock = threading.Lock()
>>> lock.acquire()
>>> # Critical section
>>> lock.release()
The critical section is the part of multi-threaded code that has to be mutually exclusive, so threads don’t work on the same resource all at once; and if it happens to raise an exception, we’ll leave the system locked—all the other threads that want to enter the same critical section will wait forever for a lock that we’ve acquired, but never released. It should’ve been:
>>> lock = threading.Lock()
>>> lock.acquire()
>>> try:
...     pass  # Critical section
... finally:
...     lock.release()
So that no matter what happens, we’d have at least released the lock. This try...finally
shtick works, but it’s pretty verbose—and it’d be nicer if this setup and cleanup logic could be encapsulated in the object itself.
Enter context managers. Python has quite a few by default—activating them is only a question of syntax. In case of reading a file, you could do:
>>> with open(path) as fp:
...     data = fp.read()
Which would open a file, execute the indented block, sometimes called the managed context—and close the file at the end, whether we left this context naturally or because an exception was raised. Similarly:
>>> lock = threading.Lock()
>>> with lock:
...     pass  # Critical section
This would acquire the lock, then enter the critical section, and finally release the lock, no matter how the critical section ended.
Do It Yourselves
It should come as no surprise that Python lets you define your own context managers, using the __enter__
and __exit__
methods; as you’d expect, the first happens when you enter the context, and the second when you exit it. Let’s take them for a spin:
class Context­Manager:
    def __enter__(self):
        print('before')
    def __exit__(self, *args):
        print('after')
Now we can do this:
>>> cm = ContextManager()
>>> with cm:
...     print('inside')
before
inside
after
Or even…
>>> cm = ContextManager()
>>> with cm:
...     raise ValueError()
before
after
Traceback (most recent call last):
...
ValueError
Note the ValueError
in fact originated between the before
and after
—it was just propagated, leaving the context, and getting caught and printed only by the Python interpreter, after the context’s __exit__
.
Now, there are a few things you need to know about context managers: first, whatever __enter__
returns is assigned to the as <name>
part of the with
statement, if it’s present:
>>> class A:
...     def __enter__(self):
...         return 1
...     def __exit__(self, *args):
...         pass
>>> a = A()
>>> with a as x:
...     print(x)
1
It’s customary to return self
, so that your context manager can be initialized in the with
statement. I mean, if you do this:
>>> with A() as a:
...     print(a)
1
You’re still going to get 1, not an A
object—because the A
object is created, its __enter__
is invoked, its return value is bound to a
, and the actual object doesn’t have any name you could use. If, on the other hand, __enter__
returned self
, that’d be OK:
>>> with A() as a:
...     print(a)
<A object at 0x...>
Let’s talk about __exit__
: so far I’ve just written it with the *args
signature, but in fact it always receives exactly three arguments: the exception class, the exception object, and the traceback. If the context was left naturally, all three are None
—but if an exception has occurred, this triplet describes it (albeit, a bit redundantly; the exception and traceback objects would’ve been enough). In some cases, you don’t care about these arguments—a file should be closed and a lock released whether an exception happened or not. But sometimes, like when you’re encapsulating a database transaction, you might want to commit it only if it’s been carried out fully—and roll back otherwise. Here’s some code, assuming a db
object with a standard begin/commit/rollback functionality:
class Transaction:
    def __init__(self, db):
        self.db = db
    def __enter__(self):
        self.db.begin()
        return self
    def __exit__(self, exception, error, traceback):
        if not exception:
            self.db.commit()
        else:
            self.db.rollback()
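To see it in action, here’s a sketch with a minimal stand-in for a real database connection—the FakeDB class below is hypothetical, purely to make the example self-contained:

```python
class FakeDB:
    # Hypothetical stand-in for a real database connection,
    # recording which methods were called
    def __init__(self):
        self.log = []
    def begin(self):
        self.log.append('begin')
    def commit(self):
        self.log.append('commit')
    def rollback(self):
        self.log.append('rollback')

class Transaction:
    def __init__(self, db):
        self.db = db
    def __enter__(self):
        self.db.begin()
        return self
    def __exit__(self, exception, error, traceback):
        if not exception:
            self.db.commit()
        else:
            self.db.rollback()

db = FakeDB()
with Transaction(db):
    pass                    # carried out fully -> commit
try:
    with Transaction(db):
        raise ValueError()  # failed -> rollback, then propagate
except ValueError:
    pass
print(db.log)  # ['begin', 'commit', 'begin', 'rollback']
```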
Another fun fact about __exit__
: if its return value evaluates to True
, any exception raised within the context is suppressed. Most __exit__
s don’t return anything, which means they implicitly return None
, or False
, so exceptions are propagated; but watch this:
>>> class A:
...     def __enter__(self):
...         return self
...     def __exit__(self, exception, error, traceback):
...         return True
>>> with A():
...     raise ValueError()
# Nothing!
This comes in handy sometimes—for example, you know how you sometimes have an infinite loop wrapped up in a try...except
to exit when the user presses Ctrl+C? It’d look like this:
try:
    ...  # Do stuff
except KeyboardInterrupt:
    pass
Which is clunky. How about:
with Suppress(KeyboardInterrupt):
    ...  # Do stuff
The Suppress
class is pretty straightforward; we can even support multiple exception classes with some star notation:
class Suppress:
    def __init__(self, *exceptions):
        self.exceptions = exceptions
    def __enter__(self):
        return self
    def __exit__(self, exception, error, traceback):
        return isinstance(error, self.exceptions)
If the raised error is an instance of any of those exception classes, it’s suppressed—otherwise, it’s propagated:
>>> with Suppress(NameError, TypeError):
...     raise NameError()
>>> with Suppress(NameError, TypeError):
...     raise TypeError()
>>> with Suppress(NameError, TypeError):
...     raise ValueError()
Traceback (most recent call last):
...
ValueError
Oh, and by the way—Python provides exactly that as contextlib.suppress
.
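The standard-library version works just like ours—since the exception never escapes the context, execution simply continues after the with block:

```python
import contextlib

with contextlib.suppress(NameError, TypeError):
    raise TypeError()  # suppressed

print('still alive')  # reached, since the exception never escaped
```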
Generator Magic
Let’s write a context manager of our own! How about something for benchmarking, to measure how long a block of code takes?
import time

class Timer:
    def __enter__(self):
        self.started = time.time()
        return self
    def __exit__(self, exception, error, traceback):
        elapsed = time.time() - self.started
        print(f'elapsed: {elapsed:0.2f} s')
It’s actually pretty handy:
>>> with Timer():
...     time.sleep(1)
elapsed: 1.01 s
But it’s so cumbersome! Wish that we could just write a function:
def timer():
    started = time.time()
    # Execute the managed context
    elapsed = time.time() - started
    print(f'elapsed: {elapsed:0.2f} s')
Except, how do we split a function in the middle, so that it runs a while, yields control, and then resumes? Generators, of course—
def timer():
    started = time.time()
    yield
    elapsed = time.time() - started
    print(f'elapsed: {elapsed:0.2f} s')
We’d then wrap it up in a class, which would execute its first part on __enter__
and its second part on __exit__
:
class ContextManager:
    def __init__(self, g):
        self.g = g
    def __enter__(self):
        self.flow = self.g()  # Create an execution
        next(self.flow)       # and start it!
        return self
    def __exit__(self, exception, error, traceback):
        # Continue the execution, and suppress StopIteration
        try:
            next(self.flow)
        except StopIteration:
            pass
Let’s see that it works:
>>> with ContextManager(timer):
...     time.sleep(1)
elapsed: 1.01 s
However, let’s make it even better—first, let’s apply it as a decorator to our generators, promoting them to context managers instantly; let’s make it so whatever’s yielded from the generator is bound to the as
name; and let’s make it so if an exception was raised, the generator is resumed by rethrowing it inside it! This way, we can truly and wholly encapsulate the context manager experience in a simple function (well, a decorated generator):
class ContextManager:
    def __init__(self, g):
        self.g = g
    def __enter__(self):
        self.flow = self.g()
        return next(self.flow)
    def __exit__(self, exception, error, traceback):
        try:
            if not exception:
                next(self.flow)
            else:
                self.flow.throw(exception, error, traceback)
        except StopIteration:
            return True
Now we can do this:
>>> @ContextManager
... def context_manager():
...     print('before')
...     try:
...         yield 'context'
...         print('after')
...     except Exception:
...         print('error')
...         raise  # Re-raise the exception
>>> with context_manager as cm:
...     print(f'inside {cm}')
before
inside context
after
>>> with context_manager as cm:
...     raise ValueError()
before
error
Traceback (most recent call last):
...
ValueError
And as it happens, Python provides this, too, by default—just import contextlib
and use its contextmanager
decorator on your generators.
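With the real thing, the whole Timer class from before collapses into a few lines—a sketch, with one addition of my own: a try...finally, so the elapsed time prints even if the managed block raises:

```python
import contextlib
import time

@contextlib.contextmanager
def timer():
    started = time.time()
    try:
        yield  # Execute the managed context
    finally:
        elapsed = time.time() - started
        print(f'elapsed: {elapsed:0.2f} s')

with timer():
    time.sleep(0.1)
```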
The Birds and the Bees
One last thing we have to discuss about objects is how they come into this world. Some people immediately think of __init__
, and I’ve occasionally called it “the constructor” myself; but in actuality, as its name indicates, it’s an initializer—by the time it’s invoked, the object already exists, seeing as it’s passed in as self
. The real constructor is a far less famous function: __new__
.
The reason you might never have heard of it, or used it, is that allocation doesn’t mean that much in Python, which manages memory on its own. So if you do override __new__, it’d be just like your __init__
—except you’ll have to call into Python to actually create the object, and then return that object at the end:
>>> class Point:
...     def __new__(cls, x, y):
...         instance = super().__new__(cls)
...         instance.x = x
...         instance.y = y
...         return instance
>>> p = Point(1, 2)
>>> p.x
1
>>> p.y
2
It works, but it’s pretty weird: we have to call super
, which delegates the actual allocation to object
—which kinda is the only one that can actually do it; then we have to do __init__
’s usual assignment; and then we have to return it. Add to this the fact that __new__
is an implicit class method (so you don’t have to decorate it, but do have to pass cls
to super
’s __new__
), and there’s really no reason to do this instead of just:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
Unless, of course, you deliberately want to interfere with the object allocation. I mean, you could do this:
>>> class Point:
...     def __new__(cls, x, y):
...         return 42
>>> p = Point(1, 2)
>>> p
42
That would be great for trolling; but seriously now, sometimes it is useful. Let’s say we want to minimize our memory footprint, by reusing similar objects—especially if they’re immutable. Take a look:
>>> class A:
...     def __init__(self, x):
...         self.x = x
...     def __eq__(self, other):
...         return isinstance(other, A) and self.x == other.x
>>> a1 = A(1)
>>> a2 = A(1)
>>> a1 == a2
True
>>> a1 is a2
False
This is the simple A
class, from when we just started talking about objects. Its whole existence, its entire state, are defined by a single x
; and even though we’ve implemented the equality operator, and our two instances are practically identical—they’re still two separate objects, taking twice as much memory as necessary. What if we cache these objects by their x
, and if such an object is already allocated—reuse it instead?
>>> class A:
...     _cache = {}
...     def __new__(cls, x):
...         if x not in cls._cache:
...             instance = super().__new__(cls)
...             instance.x = x
...             cls._cache[x] = instance
...         return cls._cache[x]
>>> a1 = A(1)
>>> a2 = A(1)
>>> a1 is a2
True
Pretty cool, no? Incidentally, we can drop the __eq__
, since every object is equal to itself. This practice is sometimes called an object pool, because we effectively create a pool of objects that we keep reusing again and again. It does come with a few gotchas, so let’s have another Q&A session.
1
Q: wouldn’t _cache
grow indefinitely, preventing objects from being garbage collected and causing a memory leak?
A: Why yes, yes it would. The right way to do it would be to use weakref
’s WeakValueDictionary
, which is like a dictionary, only its value references don’t increase the reference count used by the garbage collector; it’s what’s called a weak reference, and when these values do get garbage collected, they simply disappear from this dictionary:
import weakref

class A:
    _cache = weakref.WeakValueDictionary()
    ...  # Same as before
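A quick sketch to verify the behavior—in CPython, whose reference counting reclaims objects promptly, the cache entry vanishes as soon as the last real reference does:

```python
import gc
import weakref

class A:
    _cache = weakref.WeakValueDictionary()
    def __new__(cls, x):
        if x not in cls._cache:
            instance = super().__new__(cls)
            instance.x = x
            cls._cache[x] = instance
        return cls._cache[x]

a1 = A(1)
a2 = A(1)
assert a1 is a2            # still pooled, as before
del a1, a2                 # drop the last strong references...
gc.collect()               # (for implementations without refcounting)
assert len(A._cache) == 0  # ...and the weak entry disappears
```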
2
Q: wouldn’t it be better to use __new__
for pooling, but separate the initialization logic into the __init__
?
A: In theory, that would be nice. However, the way Python works is that when you invoke a class—that is, upon instantiation—it first calls __new__
, and then—as long as its return value is an instance of that class—it’s passed to __init__
. So if we pool objects, but still do something in __init__
, this logic will be re-executed on the object every time it’s fetched from that pool (while the __new__
logic that put him in that pool will only happen once). See for yourself—imagine we’d try implementing a singleton, which is a class that only has one instance:
class Singleton:
    _instance = None
    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
    def __init__(self):
        self.x = 1
As far as its singularity goes, this singleton works:
>>> s1 = Singleton()
>>> s2 = Singleton()
>>> s1 is s2
True
However, __init__
gets rerun on this poor object time and again, potentially resetting its state:
>>> s1.x = 2
>>> s1.x
2
>>> s3 = Singleton()
>>> s1.x
1
So my advice: stick to either __init__
(in most cases), or __new__
(when you really need it); but don’t mix both.
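For instance, a singleton that keeps all of its setup in __new__—a sketch—no longer resets its own state:

```python
class Singleton:
    _instance = None
    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.x = 1  # initialize here, exactly once
        return cls._instance

s1 = Singleton()
s1.x = 2
s2 = Singleton()  # fetches the pooled instance; nothing re-runs
assert s1 is s2
assert s1.x == 2  # the state survives re-instantiation
```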
Death
Every object has its day; although in Python, destructors are not nearly as popular as in other languages. I mean, you have the __del__
method:
>>> class A:
...     def __del__(self):
...         print('😵')
>>> a = A()
>>> del a
😵
But it’s not very reliable—primarily because it’s Python’s job to manage memory, and it makes no guarantees as to when exactly this object will be destroyed, and its destructor called. The fact that it works in our previous demo means very little; in other cases, it might’ve happened a minute or an hour later, which is not ideal, especially if you’re one of those people who like their code to be deterministic. Instead, I suggest using context managers to deal with setup and teardown logic; and if you really, really have to, do this:
>>> import weakref
>>> class A:
...     def __init__(self):
...         weakref.finalize(self, print, '😵')
>>> a = A()
>>> del a
😵
What this does is keep track of when exactly the object is garbage collected, and then invoke the print
function on the argument ‘😵’
. More generally, its signature is (obj, func, *args, **kwargs)
, and it’s pretty similar to the __del__
method; except it’s more explicit about this event happening asynchronously, some time in the future. Passing a callback feels less of a commitment than defining some actual behavior; but in any case, use context managers instead.
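For completeness, a sketch of that (obj, func, *args, **kwargs) signature with an argument passed through—here appending to a plain list instead of printing, so we can inspect the result:

```python
import weakref

class A:
    pass

events = []
a = A()
# When `a` is garbage collected, call events.append('gone')
weakref.finalize(a, events.append, 'gone')
del a
# Under CPython's reference counting, the finalizer runs immediately
print(events)
```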
Conclusion
Objects are infinitely versatile—we can even have them manage contexts of nested code, and customize the way they’re created. They still have a few behaviors we haven’t covered, either because they’re pretty minor, or because I couldn’t figure out how to fit them organically into our narrative; but in any case, we’ll have a chapter for miscellanea later—next time, I want to keep this positive energy going and escalate to classes and metaclasses.
The Advanced Python Programming series includes the following articles:
- A Value by Any Other Name
- To Be, or Not to Be
- Loopin’ Around
- Functions at Last
- To Functions, and Beyond!
- Function Internals 1
- Function Internals 2
- Next Generation
- Objects — Objects Everywhere
- Objects Incarnate
- Meddling with Primal Forces
- Descriptors Aplenty
- Death and Taxes
- Metaphysics
- The Ones that Got Away
- International Trade