Parallel batch processing in Python
Process in batches using joblib and show progress with tqdm
Joblib is a great tool for parallelization, but sometimes it is better to process the workload in batches rather than item by item, which is the default. In this article I'll show:
- The standard way to parallelize using joblib and tqdm
- Why and when it does not work
- How to parallelize using batches
- How to make the progress bar work again
All code is available on GitHub. Feel free to contact me if you have any questions.
The code is also available as a PyPI package: pip install tqdm_batch
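The following minimal sketch makes the batching idea concrete by contrasting joblib's usual one-task-per-item pattern with an explicitly batched version. It is not the author's code and not the `tqdm_batch` package. The workload `square` and the helpers `process_batch`, `make_batches`, `run_per_item` and `run_batched` are hypothetical names chosen for illustration, and `n_jobs=2` is an arbitrary assumption. Progress reporting is left out here.

```python
from joblib import Parallel, delayed


def square(x):
    """Hypothetical per-item workload used for illustration."""
    return x * x


def process_batch(batch):
    """Hypothetical helper: apply the per-item function to a whole batch."""
    return [square(x) for x in batch]


def make_batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_per_item(items, n_jobs=2):
    # Default style: one delayed call (one task) per item.
    return Parallel(n_jobs=n_jobs)(delayed(square)(x) for x in items)


def run_batched(items, batch_size, n_jobs=2):
    # Batched style: one delayed call per chunk, so far fewer tasks are
    # created and the per-task overhead is paid once per batch.
    batches = make_batches(items, batch_size)
    results = Parallel(n_jobs=n_jobs)(delayed(process_batch)(b) for b in batches)
    # Parallel returns results in submission order, so flattening keeps order.
    return [y for batch_result in results for y in batch_result]


if __name__ == "__main__":
    data = list(range(10))
    print(run_per_item(data))
    print(run_batched(data, batch_size=3))
```

Because `Parallel` returns results in the order the tasks were submitted, flattening the per-batch lists reproduces the per-item result. `Parallel` also accepts its own `batch_size` argument (default `'auto'`), which groups task dispatch internally. The explicit chunking above instead hands each worker a whole list, so any per-batch work (setup, progress updates) happens once per chunk.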
1) Straightforward method to parallelize using joblib
In 2021 almost every CPU we buy has multiple cores. My current laptop (Dell XPS) has an Intel i7 with 6 physical cores and hyper-threading, which gives a total of 12 logical cores at my disposal. Even mobile phones nowadays have multi-core CPUs and come with tremendous compute power. The cores in these CPU architectures can be identical, i.e. each core has identical processing power, or have tailored…
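The number of logical cores Python sees can be checked from the standard library. This is a minimal sketch, and the helper names `logical_cores` and `usable_cores` are hypothetical. Because `os.cpu_count()` counts logical CPUs, a 6-core machine with hyper-threading would be expected to report 12. In joblib, passing `n_jobs=-1` to `Parallel` asks for all of them.

```python
import os


def logical_cores():
    """Number of logical CPUs the OS reports (hyper-threads count separately)."""
    # os.cpu_count() may return None if the count cannot be determined.
    return os.cpu_count() or 1


def usable_cores():
    """Logical CPUs this process is actually allowed to run on."""
    # On Linux the process may be pinned to a subset of CPUs (e.g. by taskset
    # or a container); os.sched_getaffinity reflects that restriction.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Other platforms: fall back to the total logical count.
    return logical_cores()


if __name__ == "__main__":
    print(f"Logical cores reported: {logical_cores()}")
    print(f"Cores usable by this process: {usable_cores()}")
```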

