Python Tutorial — Speed up your IO operations with Futures in Python
Native futures were introduced in Python 3, through the concurrent.futures module (added in Python 3.2). Most Python programmers who have never done any sort of asynchronous programming will be unfamiliar with programming with futures.
What exactly is a Python Future?
A future is a computational construct introduced in Python 3. It provides an interface to represent an operation that, at the moment the future is created, might not hold any particular value, but is expected to hold one (a result, or an exception if the operation failed) at some point in the future.
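A minimal sketch of that idea using the standard library's concurrent.futures module. The function `slow_square` is a hypothetical stand-in for any slow operation:

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def slow_square(x):
    # Hypothetical stand-in for a slow operation (e.g. waiting on the network).
    time.sleep(0.2)
    return x * x


with ThreadPoolExecutor(max_workers=1) as executor:
    # submit() returns a Future immediately; the value does not exist yet.
    future = executor.submit(slow_square, 7)
    assert isinstance(future, Future)
    print("done right after submit?", future.done())  # almost certainly False

    # result() blocks until the value is available, then returns it.
    print("result:", future.result())  # 49
    print("done now?", future.done())  # True
```

The future is created before its value exists; calling `result()` is the point where the program actually waits for it.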
A working example
Suppose you are given a list of URLs and need to make a GET request against each of them, something like this:
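For illustration, assume a list like the following (the URLs are placeholders) and a small worker function. Both `URLS` and `fetch_url` are hypothetical names, and the worker uses the standard library's `urllib.request.urlopen` rather than any particular third-party HTTP client:

```python
from urllib.request import urlopen

# Hypothetical list of URLs; substitute your own.
URLS = [
    "https://www.python.org/",
    "https://docs.python.org/3/",
    "https://pypi.org/",
    "https://www.example.com/",
    "https://www.wikipedia.org/",
]


def fetch_url(url, timeout=10):
    """Make a GET request and return (url, number of bytes in the body)."""
    with urlopen(url, timeout=timeout) as response:
        return url, len(response.read())
```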
Now, if you do not know anything about futures, you have three options for how to go about this:
- Write a simple for loop that makes a request to each of these URLs sequentially. Although this is the simplest option, the loop will block on every HTTP call.
- Write your own custom code with the threading module and invoke it. While you can achieve concurrency this way, you will have to write a lot of code to do it.
- Use multiprocessing. This option suffers from the same problem as the second one: you will have to write a lot of code, and additionally there are strict limitations on what you can and cannot pass between processes (arguments and results have to be picklable). So while it might work for the case above, you are bound to run into problems later when you are dealing with more complex data.
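To give a sense of what the second option involves, here is a rough sketch of hand-rolled threading. `fake_fetch` is a hypothetical stand-in that sleeps instead of making a real HTTP request:

```python
import threading
import time


def fake_fetch(url):
    # Hypothetical stand-in for an HTTP GET: sleeps to simulate network latency.
    time.sleep(0.1)
    return f"<body of {url}>"


def fetch_all_with_threads(urls):
    results = {}
    lock = threading.Lock()

    def worker(url):
        body = fake_fetch(url)
        # Results have to be collected by hand into shared state.
        with lock:
            results[url] = body

    # One thread per URL: nothing here limits how many threads are created.
    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return results
```

Even this version is incomplete: there is no cap on the number of threads, and if `fake_fetch` raised, the exception would only be reported by `threading.excepthook` and never reach the caller; `join()` does not re-raise it.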
Sample Code (Without Futures)
You could end up writing something like this:
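A minimal sketch of the sequential approach, again using a hypothetical `fake_fetch` that sleeps to simulate network latency instead of making a real request:

```python
import time


def fake_fetch(url):
    # Hypothetical stand-in for an HTTP GET: sleeps to simulate network latency.
    time.sleep(0.1)
    return f"<body of {url}>"


def fetch_all_sequentially(urls):
    results = {}
    for url in urls:
        # Each call blocks until the previous one has returned.
        results[url] = fake_fetch(url)
    return results
```

With three URLs this takes at least three times the per-request latency, because nothing overlaps.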
Every call after the first waits until the one before it finishes. This wastes time and resources, since the program sits idle while it waits on the network. Not to mention that if one URL fetch fails, everything after it fails as well (you could handle this with a suitable try-except block, but ideally I'd want to write something that doesn't require that either).
Sample Code (With Futures)
We can do the same thing with futures as follows:
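A minimal sketch of the futures version, assuming a pool of 5 workers and the same kind of hypothetical `fake_fetch` stand-in (swap in a real HTTP call for actual use):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    # Hypothetical stand-in for an HTTP GET: sleeps to simulate network latency.
    time.sleep(0.1)
    return f"<body of {url}>"


def fetch_all_with_futures(urls, max_workers=5):
    executor = ThreadPoolExecutor(max_workers=max_workers)
    # Submit every URL up front; each submit() returns a Future immediately.
    futures = {url: executor.submit(fake_fetch, url) for url in urls}
    # Block until every submitted task has finished, then release the threads.
    executor.shutdown(wait=True)
    # All futures are done now, so result() returns without waiting.
    return {url: future.result() for url, future in futures.items()}
```

Ten URLs at 0.1 seconds each take about 1 second sequentially; with 5 workers they run in roughly two rounds.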
The following line
makes sure that the ThreadPoolExecutor does not shut down until all the threads/futures have been evaluated.
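Assuming the line in question is a call to `shutdown(wait=True)`, this sketch shows that it blocks until all pending futures finish, and that a `with` block does the same thing implicitly when it exits. `slow_task` is a hypothetical task:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(n):
    # Hypothetical task that takes a little while to finish.
    time.sleep(0.1)
    return n


executor = ThreadPoolExecutor(max_workers=2)
explicit = [executor.submit(slow_task, n) for n in range(4)]
executor.shutdown(wait=True)  # returns only after all four tasks have run
print(all(f.done() for f in explicit))  # True

# The with statement calls shutdown(wait=True) automatically on exit.
with ThreadPoolExecutor(max_workers=2) as pool:
    implicit = [pool.submit(slow_task, n) for n in range(4)]
print(all(f.done() for f in implicit))  # True
```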
Dissecting the code (futures only) line by line
This initializes a pool that can contain at most 5 threads at any point. Whenever a task is submitted to a thread pool executor, it spins up a new thread if no other thread is idle and the number of threads is still below the max_workers limit.
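A small sketch that checks the cap in practice. The counters and `tracked_task` are hypothetical instrumentation that record how many tasks are running at the same moment:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

active = 0  # tasks currently running
peak = 0    # highest value `active` ever reached
lock = threading.Lock()


def tracked_task(n):
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # simulate blocking IO
    with lock:
        active -= 1
    return n


with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(tracked_task, range(20)))

print("peak concurrency:", peak)  # never more than 5
```

Twenty tasks are submitted, but because the pool never holds more than 5 threads, at most 5 of them run at once.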
Here we iterate over the list of URLs and submit each one to be processed by the worker function we wrote earlier. Notice how we pass the function reference and its parameters separately.
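Why the function and its parameters are passed separately: `submit(fn, *args)` hands the callable to the pool, which calls it on a worker thread. In this sketch, `whoami` is a hypothetical function that reports which thread ran it:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def whoami(url):
    # Report which thread actually ran the call.
    return url, threading.current_thread().name


with ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker") as executor:
    # Correct: pass the callable and its arguments separately;
    # the pool calls whoami("https://example.com") on one of its threads.
    future = executor.submit(whoami, "https://example.com")
    print(future.result())  # e.g. ('https://example.com', 'worker_0')

    # Pitfall: whoami(...) here would run immediately in the *calling* thread,
    # and submit() would receive its return value (a tuple), not a function.
    # executor.submit(whoami("https://example.com"))  # don't do this
```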
Each call to submit returns a future object, which can later be checked for its result.
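A sketch of checking futures for results, including one that fails. `maybe_fail` is a hypothetical worker that pretends one URL is unreachable; `as_completed`, `exception()` and `result()` are part of concurrent.futures:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def maybe_fail(url):
    # Hypothetical worker: pretend one particular URL is unreachable.
    if "bad" in url:
        raise ConnectionError(f"could not fetch {url}")
    return f"<body of {url}>"


urls = ["https://a.example", "https://bad.example", "https://c.example"]

with ThreadPoolExecutor(max_workers=3) as executor:
    future_to_url = {executor.submit(maybe_fail, url): url for url in urls}

    succeeded, failed = {}, {}
    # as_completed yields each future as soon as it finishes, in completion order.
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        error = future.exception()  # None if the call returned normally
        if error is None:
            succeeded[url] = future.result()
        else:
            failed[url] = error

print(sorted(succeeded))  # ['https://a.example', 'https://c.example']
print(failed)
```

The exception raised for one URL is stored in that URL's future; it does not stop the other fetches. `result()` would re-raise it if called on the failed future.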
This line tells the thread pool to shut down. After that, you will not be able to submit any more tasks to this thread pool.
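A quick sketch of what happens if you try anyway: `submit()` on a shut-down executor raises `RuntimeError`.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
print(executor.submit(pow, 2, 10).result())  # 1024
executor.shutdown(wait=True)

rejected = None
try:
    executor.submit(pow, 2, 11)
except RuntimeError as exc:
    # The pool refuses new work once it has been shut down.
    rejected = exc
    print("rejected:", exc)
```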
The code above gives a rough idea of how to use futures effectively. However, I still didn't know how much of a performance boost futures offer when doing a lot of IO.
So I wrote a small script testing the performance of both implementations:
Running that script gives a performance difference of about 6x:
Time taken by normal implementation: 6 seconds
Time taken by futures implementation: 1 second
However, over the course of 1000 iterations the difference goes down to about 3x (which is still a great performance boost).
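To run a comparison of this kind without network access, here is a minimal benchmark sketch. It simulates IO with `time.sleep`; `fake_fetch`, `N_REQUESTS` and `LATENCY` are assumptions for illustration, so the timings it prints will differ from the numbers above:

```python
import time
from concurrent.futures import ThreadPoolExecutor

N_REQUESTS = 20
LATENCY = 0.05  # assumed seconds per simulated request


def fake_fetch(url):
    # Hypothetical stand-in for an HTTP GET: sleeps to simulate network latency.
    time.sleep(LATENCY)
    return len(url)


def run_sequential(urls):
    return [fake_fetch(url) for url in urls]


def run_with_futures(urls, max_workers=5):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() submits every call and yields results in input order.
        return list(executor.map(fake_fetch, urls))


def timed(fn, urls):
    start = time.perf_counter()
    result = fn(urls)
    return result, time.perf_counter() - start


urls = [f"https://example.com/page/{i}" for i in range(N_REQUESTS)]
seq_result, seq_time = timed(run_sequential, urls)
fut_result, fut_time = timed(run_with_futures, urls)

print(f"Time taken by normal implementation: {seq_time:.2f} seconds")
print(f"Time taken by futures implementation: {fut_time:.2f} seconds")
print(f"Speedup: {seq_time / fut_time:.1f}x")
```

With 20 requests of 0.05 seconds and 5 workers, the ideal is 4 rounds (about 0.2 seconds) versus 1 second sequentially, so the speedup tops out near 5x here. Real network latency varies per request, which is one reason measured gains differ from run to run.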
When not to use futures (or threading in general, in Python)
The most important caveat when using futures in Python is to understand that thread-based futures (like threading in general) can only give you a performance boost when the program spends time blocked on IO.
Using thread-based futures or threading for computation or CPU-heavy tasks is not recommended. It will not result in any noticeable performance gain because of the infamous GIL (Global Interpreter Lock) in CPython, which lets only one thread execute Python bytecode at a time. Read more about the GIL here.
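A sketch of the CPU-bound case. `count_down` is a hypothetical pure-Python busy loop, so it holds the GIL while it runs:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n):
    # Pure-Python CPU work: holds the GIL the whole time it runs.
    while n > 0:
        n -= 1
    return n


WORK = [1_000_000] * 4

start = time.perf_counter()
sequential = [count_down(n) for n in WORK]
seq_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    threaded = list(executor.map(count_down, WORK))
thr_time = time.perf_counter() - start

# On a standard (GIL-enabled) CPython build, expect thr_time to be roughly
# equal to, or even a bit worse than, seq_time.
print(f"sequential: {seq_time:.2f}s, threads: {thr_time:.2f}s")
```

For CPU-bound work, `concurrent.futures.ProcessPoolExecutor` exposes the same interface but runs each task in a separate process with its own interpreter and GIL. The function and its arguments must be picklable, and on platforms that start processes with spawn the submitting code should sit under an `if __name__ == "__main__":` guard.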
Conclusions and Further
I will be doing an in-depth review of the futures implementation in Python 3.
This should allow us to use futures more effectively than we do now. Additionally, in-depth knowledge of an implementation helps avoid hidden performance bottlenecks.
In the meantime, tell me if you use Python futures in production, and what your experience with them has been!
Originally published at https://technokeeda.com on February 25, 2019.