Is asyncio too hard to use? Try Trio!
I’m working on a project that uses asyncio and aiohttp, and sometimes it is confusing and difficult to test. This year I went to EuroPython 2018 in Edinburgh and attended some talks about Trio. Afterwards, I also watched a talk on YouTube that Trio's author, Nathaniel, gave at PyCon US 2018.
Introduction
Trio is a library that uses the async/await syntax introduced in Python 3.5, so it requires at least that version. In this article I’ll use Python 3.6. It’s written entirely in Python and it’s open source; you can find the source here.
Trio is a competitor of asyncio and Twisted.
Trio, like asyncio, uses only one thread; it is neither multi-threaded nor multi-process. The Global Interpreter Lock (GIL) is not a problem here, but this is not real parallelism. These libraries take advantage of the time a program spends waiting for I/O to run other tasks concurrently, and that is what asynchronous means here.
For example, when we make an HTTP request, we send the request and then have to wait until the server returns a response. These libraries give us the tools to use that waiting time to run other tasks.
The most typical example is the sleep function. With a regular blocking sleep, execution stops until the sleep is finished. In asynchronous code, instead of waiting, execution moves on to another job, and when that job finishes or reaches an await, execution returns to the first job (or to other jobs). The component that decides which job runs next is called the scheduler.
Scheduler and checkpoints
As mentioned before, there is only one thread, so how is it possible that this thread can run multiple tasks?
Well, the first step is to tell the thread that we want to use Trio:
trio.run(async_function_name)
The run function is always at the beginning. It takes a function that has to be asynchronous, defined with async def function_name; this will be the parent (or main) function from which the children are spawned.
Then multiple children are spawned, for example three (more about nurseries later on):
nursery.start_soon(child1)
nursery.start_soon(child2)
nursery.start_soon(child3)
To follow this example, it is easier to have a look at the following diagram:
With this code we “put” these three tasks in the scheduler and start the following sequence:
- The scheduler chooses one of them, in this example child2;
- child2 starts to run until the await;
- The await executes a checkpoint, which runs the scheduler;
- The execution returns to the nursery (or event loop, to simplify) and the scheduler is executed;
- In this case the scheduler chooses child3;
- child3 runs until it finishes, and the execution returns to the nursery;
- Meanwhile child2 finished the IO operation;
- The scheduler chooses child1;
- child1 runs until the await;
- The execution returns to the nursery, and the scheduler gives the execution to child2;
- Finally child2 finishes;
- Execution returns to the nursery; at this point the nursery has to wait until child1 is ready to continue its task;
- Finally child1 receives the execution and finishes;
- Execution returns to the nursery and the program finishes completely.
In Trio, the scheduler runs at every checkpoint, but we don’t have to take care of it. Almost all of Trio's async functions act as checkpoints. This means that every time we await a Trio function, the scheduler runs and checks whether there are tasks waiting to be executed. Note that await is a keyword rather than a function: an await is only a checkpoint when the awaited call ends up in a Trio function.
Example
In this example we demonstrate that Trio uses only one thread, so it only pays off when you use it in the right scenario and in the right way.
Trio example:
Result:
4999999950000000
Total time: 4.150018930435181
Synchronous example:
Result:
4999999950000000
Total time: 3.596266984939575
In this example we can see that the synchronous program is faster than the asynchronous one. The most important observation, however, is that in the asynchronous program two children are spawned at the same time, but one runs first and the second only starts when the first has finished. Because the child code contains no IO instruction, there is no Trio function or await in it, so no checkpoint is ever reached and the scheduler never gets the chance to switch tasks.
Exceptions and cancel
One problem with asyncio shows up when one of the tasks fails and raises an exception: what happens to the other tasks? They continue until they finish.
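A minimal sketch of this asyncio behaviour, assuming the tasks are started with asyncio.gather and its default settings. The function names and delays are made up, and asyncio.run needs Python 3.7 or newer.

```python
import asyncio

finished = []

async def failing():
    await asyncio.sleep(0.1)
    raise ValueError("boom")

async def slow():
    await asyncio.sleep(0.3)
    finished.append("slow")  # still reached, although failing() raised earlier

async def main():
    try:
        await asyncio.gather(failing(), slow())
    except ValueError as exc:
        print("gather raised:", exc)
    # gather() did not cancel slow(); give it time to finish in the background.
    await asyncio.sleep(0.4)
    print("finished after the error:", finished)

asyncio.run(main())
```

The exception reaches the caller after 0.1 seconds, but slow() keeps running and completes on its own.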
In comparison, Trio behaves differently: if one task fails and raises an exception, all the other tasks running in the same nursery are cancelled and the exception is propagated to the parent. If there is more than one exception, a MultiError exception will be raised with all the exceptions inside (recent Trio releases use the standard ExceptionGroup for this instead).
How can Trio have this control over all the tasks that are executed? With a nursery!
The nursery is the only way to spawn a child (task); if a child raises an exception, the nursery will take care of it. Below you can see an example with HTTP requests.
Another interesting feature is timeouts: you can set a timeout for a whole nursery or for part of it. If the timeout is exceeded, all the children inside are cancelled and execution continues after the block with the timeout.
with trio.move_on_after(10) as cancel:
    await trio.sleep(20)
In this piece of code there is one sleep of 20 seconds, but it will never finish. It will be cancelled after 10 seconds, because it’s wrapped in a block with a timeout of ten seconds. You can wrap a nursery with multiple children inside and all of them will be cancelled.
By the way, the name nursery is currently being actively discussed; you can follow the discussion on GitHub.
HTTP requests
Maybe you are wondering if you can run multiple HTTP requests asynchronously with Trio. Indeed you can, but you need asks. This library is designed to work with curio and Trio. It’s similar to requests, but asynchronous and smaller.
For now it’s the best option; maybe in the future urllib3 will be prepared to work with trio.
Here is a comparison that fetches some URLs synchronously with requests and asynchronously with trio and asks.
Synchronous example
In this example the requests library is used to fetch the list of URLs and the total time is measured. It’s synchronous, so the URLs are fetched sequentially: when the first finishes, the second starts, and so on.
$ python fetch_urls_sync.py
Start: http://www.facebook.com
Finished: http://www.facebook.com 490801
Start: http://www.twitter.com
Finished: http://www.twitter.com 152610
Start: http://www.instagram.com
Finished: http://www.instagram.com 24259
Start: http://www.google.com
Finished: http://www.google.com 11252
Start: http://www.youtube.com
Finished: http://www.youtube.com 398845
Start: http://www.medium.com
Finished: http://www.medium.com 145728
Start: https://git-scm.com
Finished: https://git-scm.com 7565
Start: http://www.github.com
Finished: http://www.github.com 61724
Start: http://www.gitlab.com
Finished: http://www.gitlab.com 59857
Start: http://www.python.org
Finished: http://www.python.org 48822
Start: http://python-requests.org
Finished: http://python-requests.org 33918
Start: http://trio.readthedocs.io
Finished: http://trio.readthedocs.io 22668
Total time: 6.532523155212402
Asynchronous example
The following example does the same but asynchronously. First it makes the request for the first URL, and while it’s waiting for the response (await) it starts the request for the second URL, and so on.
$ python fetch_urls_async.py
Start: http://www.instagram.com
Start: http://www.google.com
Start: https://git-scm.com
Start: http://www.python.org
Start: http://www.facebook.com
Start: http://www.gitlab.com
Start: http://www.twitter.com
Start: http://www.medium.com
Start: http://trio.readthedocs.io
Start: http://www.youtube.com
Start: http://www.github.com
Start: http://python-requests.org/
Finished: http://www.google.com 11272
Finished: https://git-scm.com 7565
Finished: http://www.python.org 48822
Finished: http://trio.readthedocs.io 22668
Finished: http://python-requests.org/ 33918
Finished: http://www.instagram.com 24216
Finished: http://www.medium.com 145767
Finished: http://www.facebook.com 490161
Finished: http://www.twitter.com 152610
Finished: http://www.youtube.com 394447
Finished: http://www.gitlab.com 59857
Finished: http://www.github.com 61730
Total time: 1.5571067333221436
In this example it’s clear that the asynchronous code is much faster. The synchronous example took about 6.5 seconds and the asynchronous one only about 1.6 seconds, roughly 5 seconds less for just 12 URLs.
By the way: it’s not about trio or asks; you can do exactly the same with asyncio, but with trio it’s easier to do.
Testing
One of the most difficult things in asynchronous development is testing. Trio gives us some tools to test properly: there is a module called trio.testing and also a pytest-trio project.
On this page you can find the documentation for testing with trio.testing, or here for pytest-trio. The scope of this blog post, however, is just an introduction to Trio, so I don’t want to go into much detail in this article.
Community
The Trio community is really active, and so is the author. I had a problem with one of the examples in this post, so I contacted him on the Gitter chat and we kept chatting until he found the error.
Every day new issues and pull requests are created on GitHub, so it’s a really active community.
You can find my question and the answer I got from Nathaniel here.