Is asyncio too hard to use? Try Trio!

I’m working on a project with asyncio and aiohttp, and sometimes it is confusing and difficult to test. This year I went to EuroPython 2018 in Edinburgh and attended some talks about Trio. Afterwards, I also watched a talk on YouTube by the author of Trio, Nathaniel J. Smith, at PyCon US 2018.

Train tracks (Photo by Michael Gaida)

Introduction

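Trio is an asynchronous I/O library for Python. It is an alternative to asyncio rather than a wrapper around it (it has its own event loop), and it aims to make our lives easier. It’s 100% developed in Python and it’s open source; you can find the source on GitHub.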

Trio uses the async/await syntax introduced in Python 3.5, so it requires at least that version. In this article I’ll use Python 3.6.

Trio, like asyncio, uses only one thread; it’s neither multi-threaded nor multi-process. The Global Interpreter Lock (GIL) is not a problem here, but this is not real parallelism either. These libraries take advantage of the time spent waiting on I/O to run other tasks, which is concurrency rather than parallelism. This style of programming is called asynchronous.

For example, when we make an HTTP request, we send the request and then have to wait until the server returns a response. These libraries give us the tools to use that waiting time to run other tasks.

The most typical example is the sleep function. With a regular blocking sleep, the whole program stops until the sleep is finished. With an asynchronous sleep, instead of waiting, the execution moves to another task, and when that task finishes or reaches an await, the execution returns to the first task (or to another one). This process is called scheduling.

Scheduler and checkpoints

As mentioned before, there is only one thread, so how is it possible that this thread can run multiple tasks?

Well, the first step is to tell the thread that we want to use Trio:

trio.run(async_function_name)

The call to run is always the entry point. It takes a function, which has to be asynchronous (defined with async def function_name). This is the parent or main function, from which the children are spawned.

Then multiple children are spawned, for example three (more about nurseries later on):

nursery.start_soon(child1)
nursery.start_soon(child2)
nursery.start_soon(child3)

To follow this example, it is easier to have a look at the following diagram:

With this code we “put” these three tasks in the scheduler, which starts the following sequence:

  1. The scheduler chooses one of them, in this example child2.
  2. child2 runs until its await.
  3. The await executes a checkpoint, which hands control to the scheduler.
  4. The execution returns to the nursery (or the event loop, to simplify) and the scheduler runs.
  5. In this case the scheduler chooses child3.
  6. child3 runs until it finishes, and the execution returns to the nursery.
  7. Meanwhile, child2 has finished its I/O operation.
  8. The scheduler chooses child1.
  9. child1 runs until its await.
  10. The execution returns to the nursery, and the scheduler gives the execution to child2.
  11. child2 finishes.
  12. The execution returns to the nursery, which now has to wait until child1 is ready to continue its task.
  13. Finally, child1 receives the execution and finishes.
  14. The execution returns to the nursery and the program finishes completely.

In Trio, the scheduler runs at every checkpoint, but we don’t have to take care of that ourselves. All of Trio’s asynchronous functions contain a checkpoint. This means that every time we await a Trio function, the scheduler runs and checks whether other tasks are waiting to be executed. The await keyword by itself is not a checkpoint; the checkpoint happens inside the Trio function being awaited.

Example

This example demonstrates that Trio uses only one thread, so it only pays off in the right scenario (work that waits on I/O) and when it is used correctly.

Trio example:

Result:

4999999950000000
Total time: 4.150018930435181

Synchronous example:

Result:

4999999950000000
Total time: 3.596266984939575

In this example the synchronous program is faster than the asynchronous one. The most important observation, however, is that although the asynchronous program spawns two children at the same time, one does all its work first and the second starts only when the first has finished. The child code contains no I/O, so there is no Trio function call and no await, and therefore no checkpoint. Without a checkpoint, the scheduler never gets the chance to switch tasks.

Exceptions and cancel

One problem with asyncio appears when one of the tasks fails and raises an exception: what happens to the other tasks? By default, they continue until they finish.
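A minimal sketch of this asyncio behaviour, assuming the tasks are combined with asyncio.gather and its default settings. The function names are illustrative, and asyncio.run needs Python 3.7 or newer, which is later than the 3.6 used elsewhere in this article.

```python
import asyncio


async def failing():
    await asyncio.sleep(0.05)
    raise ValueError("boom")


async def slow(state):
    await asyncio.sleep(0.2)
    state["slow_finished"] = True  # reached even though failing() raised


async def main():
    state = {}
    try:
        await asyncio.gather(failing(), slow(state))
    except ValueError:
        state["caught"] = True
    # Wait a little longer: the slow task was not cancelled and keeps running.
    await asyncio.sleep(0.3)
    return state


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The exception reaches the caller, yet the other task still runs to the end in the background.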

Trio behaves differently: if one task fails and raises an exception, all the other tasks running in the same nursery are cancelled and the exception is propagated to the parent. If there is more than one exception, a MultiError is raised with all the exceptions inside (recent Trio releases raise an ExceptionGroup instead).

How can Trio have this control over all the tasks that are executed? With a nursery!

Photo by li tzuni on Unsplash

The nursery is the only way to spawn a child (task). If a child raises an exception, the nursery takes care of it. Below you can see an example with HTTP requests.

Another interesting feature is timeouts: you can apply a timeout to a whole nursery or to part of it. If the timeout is exceeded, all the children are cancelled and the execution continues after the timeout block.

with trio.move_on_after(10) as cancel:
    await trio.sleep(20)

This piece of code contains a sleep of 20 seconds, but it will never finish: it is cancelled after 10 seconds because it’s wrapped in a block with a ten-second timeout. You can wrap a nursery with multiple children in the same way, and all of them will be cancelled.

By the way, the name nursery is currently being actively discussed; you can follow the discussion on GitHub.

HTTP requests

Maybe you are wondering whether you can run multiple HTTP requests asynchronously with Trio. Indeed you can, but you need asks. This library is designed to work with curio and Trio. It’s similar to requests, but asynchronous and smaller.

For now it’s the best option; maybe in the future urllib3 will support Trio.

Here is a comparison of fetching some URLs synchronously with requests and asynchronously with Trio and asks.

Synchronous example

In this example the requests library is used to fetch a list of URLs while the total time is measured. Because it’s synchronous, the URLs are fetched sequentially: when the first finishes, the second starts, and so on.

$ python fetch_urls_sync.py
Start:  http://www.facebook.com
Finished: http://www.facebook.com 490801
Start: http://www.twitter.com
Finished: http://www.twitter.com 152610
Start: http://www.instagram.com
Finished: http://www.instagram.com 24259
Start: http://www.google.com
Finished: http://www.google.com 11252
Start: http://www.youtube.com
Finished: http://www.youtube.com 398845
Start: http://www.medium.com
Finished: http://www.medium.com 145728
Start: https://git-scm.com
Finished: https://git-scm.com 7565
Start: http://www.github.com
Finished: http://www.github.com 61724
Start: http://www.gitlab.com
Finished: http://www.gitlab.com 59857
Start: http://www.python.org
Finished: http://www.python.org 48822
Start: http://python-requests.org
Finished: http://python-requests.org 33918
Start: http://trio.readthedocs.io
Finished: http://trio.readthedocs.io 22668
Total time: 6.532523155212402

Asynchronous example

The following example does the same, but asynchronously. It sends the request for the first URL and, while it’s waiting for the response (await), it starts the request for the second URL, and so on.

$ python fetch_urls_async.py
Start:  http://www.instagram.com
Start: http://www.google.com
Start: https://git-scm.com
Start: http://www.python.org
Start: http://www.facebook.com
Start: http://www.gitlab.com
Start: http://www.twitter.com
Start: http://www.medium.com
Start: http://trio.readthedocs.io
Start: http://www.youtube.com
Start: http://www.github.com
Start: http://python-requests.org/
Finished: http://www.google.com 11272
Finished: https://git-scm.com 7565
Finished: http://www.python.org 48822
Finished: http://trio.readthedocs.io 22668
Finished: http://python-requests.org/ 33918
Finished: http://www.instagram.com 24216
Finished: http://www.medium.com 145767
Finished: http://www.facebook.com 490161
Finished: http://www.twitter.com 152610
Finished: http://www.youtube.com 394447
Finished: http://www.gitlab.com 59857
Finished: http://www.github.com 61730
Total time: 1.5571067333221436

In this example it’s clear that asynchronous code is much faster. The synchronous example took 6.5 seconds and the asynchronous one only about 1.6 seconds: almost 5 seconds faster (roughly four times as fast) for just 12 URLs.

By the way, this is not specific to Trio or asks: you can do exactly the same with asyncio, but with Trio it’s easier.

Testing

One of the most difficult things in asynchronous development is testing. Trio gives us some tools to test properly: there is a module called trio.testing and also a pytest-trio project.

The Trio documentation explains how to test with trio.testing, and pytest-trio has its own documentation. This blog post, however, is just an introduction to Trio, so I won’t go into the details here.

Community

The Trio community is really active, and so is the author. I had a problem with one of the examples in this article, contacted him on the Gitter chat, and we kept chatting until he found the error.

Every day new issues and pull requests are created on GitHub, so it’s a really active community.

You can find my question and Nathaniel’s answer here.

Sources