Eventlet vs. Asyncio
Explicit and implicit concurrency in Python.
Excuse the click-bait-y title, please. I’ve been playing with eventlet (and, by extension, gevent-based concurrency frameworks) and asyncio recently, and wanted a place to get my thoughts down.
Eventlet
At onefinestay we use nameko to run our service-based platform. Nameko uses eventlet for concurrency, and it broadly works like this:
- An RPC method call comes in to a service runner
- The method call is handled in a separate coroutine
- If the method call performs any blocking IO — hitting the database, for instance — the coroutine is switched out in favour of some other running coroutine until the IO is done.
This is very much simplified, but the main point here is that switching out the currently running coroutine is performed implicitly — when I perform that database query, I don’t have to ask or mention that I’m about to do some IO.
Implicit concurrency has an attractive simplicity to it: there’s no special syntax and, from a caller’s perspective, eventlet code looks just like regular Python.
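To make that concrete, here is a minimal sketch of a handler written in the implicit style. Eventlet isn't in the standard library, so I've used plain threads as a stand-in for its green threads (threads are preemptive rather than cooperative, so this is only an approximation). The names fetch_user, handle_rpc and serve are made up, and time.sleep stands in for a database query. The thing to notice is that nothing in handle_rpc marks where a switch can happen.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_user(user_id):
    # Hypothetical stand-in for a blocking database query.
    time.sleep(0.05)
    return {"id": user_id, "name": f"user-{user_id}"}


def handle_rpc(user_id):
    # Looks like ordinary sequential code: no marker shows that the
    # call below may hand control to another handler while it waits.
    user = fetch_user(user_id)
    return user["name"].upper()


def serve(user_ids):
    # One worker per incoming call, standing in for one green thread each.
    with ThreadPoolExecutor(max_workers=len(user_ids)) as pool:
        return list(pool.map(handle_rpc, user_ids))


if __name__ == "__main__":
    print(serve([1, 2, 3]))
```

With eventlet the handler body would stay the same; eventlet.monkey_patch() replaces blocking standard-library calls such as time.sleep with cooperative versions.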
Asyncio
The asyncio library was recently added to Python 3 (in version 3.4) and is an attempt to provide a low-level, built-in base for most, if not all, of the various Python concurrency frameworks out there.
With asyncio, an event loop runs and is in charge of switching between various coroutines. The main difference between asyncio and eventlet is that switching is performed explicitly. I have to use yield from in my coroutine (spelled await inside an async def function from Python 3.5 onwards) if I wish to indicate that I’m ready to be switched out.
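Here is a minimal sketch of the same kind of handler in the explicit style. It uses async def and await, the later spelling of yield from-based coroutines (Python 3.5 onwards). The names fetch_user, handle_rpc and serve are made up for illustration, and asyncio.sleep stands in for real IO.

```python
import asyncio


async def fetch_user(user_id):
    # Hypothetical stand-in for a non-blocking database query.
    await asyncio.sleep(0.05)  # explicit switch point
    return {"id": user_id, "name": f"user-{user_id}"}


async def handle_rpc(user_id):
    # The await is the only place this coroutine can be switched out.
    user = await fetch_user(user_id)
    return user["name"].upper()


async def serve(user_ids):
    # Run one coroutine per incoming call; results come back in order.
    return await asyncio.gather(*(handle_rpc(u) for u in user_ids))


if __name__ == "__main__":
    print(asyncio.run(serve([1, 2, 3])))
```

Every function between the event loop and the IO has to be a coroutine, and every call along that chain has to be awaited.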
As well as asyncio being part of the standard library (and working with Python 3!), its use of explicit yielding is designed to simplify working with concurrent code: there’s no guesswork, and you always know when your code might switch to something else.
Explicit vs. Implicit concurrency
I think I’m now getting to what I consider to be the main difference between these two approaches — the choice between explicit and implicit concurrency.
If you wanted to move from an implicit to an explicit world, your first approach might (very basically) be to just convert anything that does IO into a coroutine and pepper your code base with a bunch of yield from statements.
For a lot of use-cases, this is going to be fine.
Where explicit concurrency falls down
The problem with explicit concurrency is that you have to be explicit about it. Obvious, I know, but this offers up some interesting drawbacks.
Take an ORM, like SQLAlchemy (if you hate ORMs, you can make up your own example). What would this library look like in an explicit world?
Let’s say we want to query for a particular model instance — and therefore hit the database — how could the library be adapted to support that?
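One way it might look, as a rough sketch: the query method becomes a coroutine and the caller awaits it. The Session class and its get method here are invented for illustration and are not SQLAlchemy's API; the "database" is just a dictionary.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Book:
    id: int
    title: str


# Hypothetical in-memory "database": one table per model class.
_TABLES = {Book: {1: Book(id=1, title="Example Book")}}


class Session:
    """Made-up explicit-style session; this is not SQLAlchemy's API."""

    async def get(self, model, primary_key):
        # Stand-in for real database IO: yield to the event loop once.
        await asyncio.sleep(0)
        return _TABLES[model].get(primary_key)


async def main():
    session = Session()
    # The await marks the one place where this coroutine may be switched out.
    book = await session.get(Book, 1)
    return book


if __name__ == "__main__":
    print(asyncio.run(main()))
```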
Not too bad so far. Our problems start though when we consider more advanced features, such as laziness.
Taking our example above, we’re going to say that Book has a one-to-many relationship with Chapter — a book can have multiple chapters associated with it. This will be exposed through a chapters attribute on the book instance. One more thing — in our world it’s expensive to fetch the chapters for a book, and we don’t always need them, so we want this attribute to be filled lazily.
Take a snippet whose first line reads the lazy attribute, something like chapters = book.chapters.
What happens on that first line? Do we hit the database? Can we switch?
In an implicit world, the answer to both questions is yes. We access the lazy attribute chapters, our ORM knows that this hasn’t been ‘filled’ yet and that a database query is needed. The ORM puts together the query, executes it — we implicitly yield here — and once the result is back, fills up chapters accordingly and exposes the result.
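Here is a minimal sketch of how such a lazy attribute might work under the hood in the implicit style. Everything in it is made up for illustration (the Book class, _query_chapters, the QUERIES log), and time.sleep stands in for a blocking database call, which is where a framework like eventlet would switch us out.

```python
import time

# Log of book ids we have "queried" for, so the laziness is observable.
QUERIES = []


def _query_chapters(book_id):
    # Hypothetical blocking database call; under implicit concurrency
    # this is the point where another coroutine could run.
    QUERIES.append(book_id)
    time.sleep(0.01)
    return ["Chapter 1", "Chapter 2"]


class Book:
    def __init__(self, book_id, title):
        self.id = book_id
        self.title = title
        self._chapters = None  # not 'filled' yet

    @property
    def chapters(self):
        # Plain attribute access from the caller's point of view,
        # but the first access triggers IO.
        if self._chapters is None:
            self._chapters = _query_chapters(self.id)
        return self._chapters


if __name__ == "__main__":
    book = Book(1, "Example Book")
    print(book.chapters)  # first access runs the query
    print(book.chapters)  # second access reuses the stored result
```

A property can't be awaited into existence like this in the explicit style: the getter is a plain function, so it has no way to yield to the event loop while the query runs.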
In an explicit world, it’s not clear how we’d model something like this. The flexibility of something like an ORM means there are so many potential interactions that could end up triggering some IO.
Some might take this as an argument against ORMs — “if it’s so complicated to use, and the behaviour of seemingly simple calls can vary so wildly, don’t use one”. It’s a valid viewpoint, though not something everyone will agree with.
Alternatively, perhaps an API could be provided that makes the laziness a little less magical. Check out this (ugly) code.
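One hypothetical shape for that kind of API is sketched below: the attribute returns a handle, and the caller has to await a fetch method on it to get the data. LazyRelation and fetch are names I've invented for this illustration, not any real ORM's interface.

```python
import asyncio

# Hypothetical data keyed by book id.
_CHAPTERS = {1: ["Chapter 1", "Chapter 2"]}


class LazyRelation:
    """Made-up handle for a relationship that has not been loaded yet."""

    def __init__(self, loader):
        self._loader = loader
        self._value = None
        self._loaded = False

    async def fetch(self):
        # The IO happens here, and only on the first call.
        if not self._loaded:
            self._value = await self._loader()
            self._loaded = True
        return self._value


class Book:
    def __init__(self, book_id):
        self.id = book_id
        self.chapters = LazyRelation(self._load_chapters)

    async def _load_chapters(self):
        await asyncio.sleep(0)  # stand-in for the database query
        return list(_CHAPTERS.get(self.id, []))


async def main():
    book = Book(1)
    # Accessing book.chapters is free; the await is where we may switch.
    return await book.chapters.fetch()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The laziness survives, but every caller now has to know that chapters is a relationship rather than a plain list.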
Lastly, maybe the contract you come to with your ORM could be such that it promises to give you a good explicit API, but you lose niceties such as dynamic lazy attributes.
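A rough sketch of what that contract could look like: relationships are requested up front when the object is fetched, and touching one that wasn't requested is an error rather than a hidden query. The get_book function, its with_chapters flag and the NotLoaded exception are all invented for this example.

```python
import asyncio

# Hypothetical data keyed by book id.
_BOOKS = {1: "Example Book"}
_CHAPTERS = {1: ["Chapter 1", "Chapter 2"]}


class NotLoaded(Exception):
    """Raised when a relationship was not requested at fetch time."""


class Book:
    def __init__(self, book_id, title, chapters=None):
        self.id = book_id
        self.title = title
        self._chapters = chapters

    @property
    def chapters(self):
        # No IO here, ever: either the data is present or we fail loudly.
        if self._chapters is None:
            raise NotLoaded("chapters were not requested when fetching")
        return self._chapters


async def get_book(book_id, with_chapters=False):
    await asyncio.sleep(0)  # stand-in for the book query
    chapters = None
    if with_chapters:
        await asyncio.sleep(0)  # stand-in for the chapters query
        chapters = list(_CHAPTERS.get(book_id, []))
    return Book(book_id, _BOOKS[book_id], chapters)


if __name__ == "__main__":
    book = asyncio.run(get_book(1, with_chapters=True))
    print(book.chapters)
```

All the IO is confined to the awaited fetch, at the cost of deciding what you need before you need it.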
The answer is in the middle?
Asyncio is still relatively new, and there’s a whole world of issues, problems and space for growth ahead of it, but at some point, the issues touched on in this post will need to be addressed — even if the conclusion is that it’s not going to try solving these problems.
Personally I’d like to see a more hybrid world. I’ve never been particularly bothered with implicit yielding switching out my coroutine at ‘unexpected’ times, but I do see the benefits of making things a little more explicit — and of course having asyncio as a blessed part of the standard library makes it really attractive to use.
It’s a tough problem to solve and I look forward to seeing what happens next.