TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

My personal workspace (photo taken by me with a Canon RP)

Parallel batch processing in Python

Process in batches using joblib and show progress with tqdm

8 min read · Dec 27, 2021

Joblib is a great tool for parallelization, but sometimes it is better to process the workload in batches rather than item by item, which is the default. In this article I'll show:

  1. Standard way to parallelize using joblib and tqdm
  2. Why and when it does not work
  3. Parallelize using batches
  4. Make progress work again
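As a rough illustration of the first and third points, here is a minimal sketch that runs the same toy workload once item by item and once in batches. The names `square`, `process_batch`, and `make_batches` are hypothetical helpers made up for this example; they are not taken from the article's repository or from the `tqdm_batch` package. It assumes `joblib` and `tqdm` are installed.

```python
from joblib import Parallel, delayed
from tqdm import tqdm


def square(x):
    """Hypothetical stand-in for a real per-item workload."""
    return x * x


def process_batch(batch):
    """Process a whole batch of items inside a single worker call."""
    return [square(x) for x in batch]


def make_batches(items, n_batches):
    """Split items into at most n_batches contiguous, near-equal chunks."""
    size, rest = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `rest` batches get one extra item.
        stop = start + size + (1 if i < rest else 0)
        if stop > start:  # skip empty batches when there are few items
            batches.append(items[start:stop])
        start = stop
    return batches


if __name__ == "__main__":
    items = list(range(1000))

    # Item-wise: one joblib task per item. tqdm wraps the input generator,
    # so the bar advances as tasks are dispatched, not as they finish.
    per_item = Parallel(n_jobs=2)(
        delayed(square)(x) for x in tqdm(items)
    )

    # Batch-wise: one joblib task per batch, so far fewer tasks are created.
    batches = make_batches(items, n_batches=4)
    per_batch = Parallel(n_jobs=2)(
        delayed(process_batch)(b) for b in tqdm(batches)
    )

    # Flatten the per-batch results back into a single list.
    flat = [y for batch in per_batch for y in batch]
    assert flat == per_item
```

Each task carries some dispatch overhead, so when the work per item is small, grouping items into a handful of batches can reduce that overhead. The trade-off visible in this sketch is that the progress bar over batches only has four steps, which is much coarser than a bar over individual items.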

All code is available on GitHub. Feel free to contact me if you have any questions.

The code is also available as a PyPI package: pip install tqdm_batch

Using Python, joblib, and tqdm to batch process workloads.

1) Straightforward method to parallelize using joblib

In 2021, almost every CPU we buy has multiple cores. My current laptop (Dell XPS) has an Intel i7 with 6 cores and Hyper-Threading, which gives a total of 12 logical cores at your disposal. Even mobile phones nowadays have multi-core CPUs and come with tremendous compute power. The cores in these CPU architectures can be identical, i.e. each core has identical processing power, or have tailored…



Published in TDS Archive



Written by Dennis Bakhuis

Data Scientist with a passion for natural language processing and deep learning. Python and open source enthusiast. Background in fluid dynamics.
