10 Python Frameworks for Parallel and Distributed Machine Learning Tasks

Python Libraries That Let You Distribute and Parallelize ML Tasks

Sivasai Yadav Mudugandla

Modern neural network models are deep and complex, with an enormous number of weights to learn, and training them is challenging. Data scientists need to set up distributed training, checkpointing, and so on, and even then may not achieve the desired performance or convergence rate. Large models are especially difficult to train because they easily run out of memory on a single device.

In this article, we will look at Python frameworks that allow us to distribute and parallelize deep learning models.

1. Elephas

Elephas is an extension of Keras that allows you to run distributed deep learning models at scale with Spark. Elephas aims to keep the simplicity and high usability of Keras, allowing for fast prototyping of distributed models that can be run on massive data sets.

Elephas currently supports a number of applications, including:

  • Data-parallel training of deep learning models
  • Distributed hyper-parameter optimization
  • Distributed training of ensemble models

maxpumperla/elephas: Distributed Deep learning with Keras & Spark (github.com)
pip install elephas
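
Below is a minimal sketch of data-parallel training with Elephas, following the patterns in its README; it assumes an existing SparkContext sc, NumPy arrays x_train/y_train, and an already-compiled Keras model:

from elephas.spark_model import SparkModel
from elephas.utils.rdd_utils import to_simple_rdd

# Turn NumPy training data into a Spark RDD of (feature, label) pairs
rdd = to_simple_rdd(sc, x_train, y_train)
# Wrap the compiled Keras model; workers train asynchronously and
# push parameter updates back once per epoch
spark_model = SparkModel(model, frequency='epoch', mode='asynchronous')
spark_model.fit(rdd, epochs=10, batch_size=32, verbose=0, validation_split=0.1)

The script is then submitted with spark-submit, which lets Spark spread the workers across the cluster.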

2. FairScale

FairScale is a PyTorch extension library for high-performance, large-scale training on one or more machines/nodes. It extends basic PyTorch capabilities while adding new experimental features.

FairScale supports:

  • Parallelism
  • Sharded training
  • Optimization at scale
  • GPU memory optimization
  • GPU speed optimization
pip install fairscale
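
To illustrate sharded training, here is a minimal sketch using FairScale's optimizer state sharding (OSS) together with ShardedDataParallel; it assumes a torch.distributed process group is already initialized and that MyModel stands in for your own module:

import torch
from fairscale.optim.oss import OSS
from fairscale.nn.data_parallel import ShardedDataParallel as ShardedDDP

model = MyModel().cuda()  # MyModel is a placeholder for your torch.nn.Module
# Shard the optimizer state across ranks instead of replicating it on each GPU
optimizer = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=1e-3)
# Wrap the model so each gradient is reduced to the rank that owns its shard
model = ShardedDDP(model, optimizer)

Sharding the optimizer state is where the memory savings come from: each rank stores only its own slice of the optimizer buffers.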

3. TensorFlowOnSpark

By combining salient features from the TensorFlow deep learning framework with Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed deep learning on a cluster of GPU and CPU servers.

It enables both distributed TensorFlow training and inference on Spark clusters, with the goal of minimizing the amount of code change required to run existing TensorFlow programs on a shared grid.

TensorFlowOnSpark was developed by Yahoo for large-scale distributed deep learning on the Hadoop clusters in Yahoo’s private cloud.

pip install tensorflowonspark
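
A minimal sketch of launching a TensorFlow cluster on Spark executors with the TFCluster API; the body of main_fun is elided and would contain an ordinary TensorFlow program:

from pyspark import SparkConf, SparkContext
from tensorflowonspark import TFCluster

def main_fun(args, ctx):
    # Runs on every executor; ctx describes this node's role in the TF cluster
    import tensorflow as tf
    ...  # ordinary TensorFlow training code goes here

sc = SparkContext(conf=SparkConf().setAppName("tfos_example"))
# Arguments: tf_args, cluster size, number of parameter servers,
# TensorBoard flag, and input mode
cluster = TFCluster.run(sc, main_fun, None, 4, 0, False,
                        TFCluster.InputMode.TENSORFLOW)
cluster.shutdown()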

4. DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed delivers extreme-scale model training for everyone, from data scientists training on massive supercomputers to those training on low-end clusters or even on a single GPU:

  • Extreme-scale: Using the current generation of GPU clusters with hundreds of devices, 3D parallelism of DeepSpeed can efficiently train deep learning models with trillions of parameters.
  • Extremely memory efficient: With just a single GPU, ZeRO-Offload of DeepSpeed can train models with over 10B parameters, 10x bigger than the previous state of the art, democratizing multi-billion-parameter model training so that many more deep learning scientists can explore bigger and better models.
pip install deepspeed
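
Here is a minimal sketch of the DeepSpeed training loop: deepspeed.initialize wraps an existing PyTorch model, and the returned engine takes over the backward pass and optimizer step. The ds_config.json file is an assumed config that would hold batch size, ZeRO stage, precision settings, and so on:

import deepspeed

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,                          # an existing torch.nn.Module
    model_parameters=model.parameters(),
    config="ds_config.json")              # hypothetical DeepSpeed config file

for batch in data_loader:
    loss = model_engine(batch)            # forward pass through the engine
    model_engine.backward(loss)           # engine handles loss scaling, ZeRO, etc.
    model_engine.step()                   # engine performs the optimizer step

Scripts written this way are typically launched with the deepspeed command-line launcher rather than plain python.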

5. Horovod

Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use. Once a training script has been written for scale with Horovod, it can run on a single GPU, multiple GPUs, or even multiple hosts without any further code changes.

In addition to being easy to use, Horovod is fast. The repository documents a benchmark run on 128 servers with 4 Pascal GPUs each, connected by a RoCE-capable 25 Gbit/s network:

https://github.com/horovod/horovod
#To run on CPUs
pip install horovod
#To run on GPUs with NCCL
HOROVOD_GPU_OPERATIONS=NCCL pip install horovod
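
To show how little a script changes, here is a minimal sketch of adapting a PyTorch training script to Horovod, following its documentation; model and the training loop itself are assumed to exist already:

import torch
import horovod.torch as hvd

hvd.init()                                # one process per GPU
torch.cuda.set_device(hvd.local_rank())   # pin this process to its GPU
# Scale the learning rate by the number of workers, as Horovod recommends
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
# Average gradients across all workers with ring-allreduce on every step
optimizer = hvd.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters())
# Make sure every worker starts from identical weights
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

The same script then runs on, say, 4 GPUs with: horovodrun -np 4 python train.py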

6. Mesh TensorFlow

Mesh TensorFlow (mtf) is a language for distributed deep learning, capable of specifying a broad class of distributed tensor computations. The purpose of Mesh TensorFlow is to formalize and implement distribution strategies for your computation graph over your hardware/processors. For example: "Split the batch over rows of processors and split the units in the hidden layer across columns of processors." Mesh TensorFlow is implemented as a layer over TensorFlow.

pip install mesh-tensorflow
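
A minimal sketch of the core idea, adapted from the Mesh TensorFlow README: tensor dimensions are named, and a layout maps those names onto mesh axes (the "rows"/"cols" axis names here are illustrative assumptions):

import mesh_tensorflow as mtf
import tensorflow.compat.v1 as tf

graph = mtf.Graph()
mesh = mtf.Mesh(graph, "my_mesh")
# Every tensor dimension gets a name, which is what the layout refers to
batch_dim = mtf.Dimension("batch", 128)
io_dim = mtf.Dimension("io", 784)
hidden_dim = mtf.Dimension("hidden", 1024)
images = mtf.import_tf_tensor(mesh, tf.zeros([128, 784]),
                              shape=[batch_dim, io_dim])
hidden = mtf.layers.dense(images, hidden_dim, activation=mtf.relu,
                          name="hidden")
# "Split the batch over rows of processors and the hidden units over columns"
layout_rules = [("batch", "rows"), ("hidden", "cols")]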

7. BigDL

BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark programs, which can run directly on top of existing Spark or Hadoop clusters. To make it easy to build Spark and BigDL applications, the high-level Analytics Zoo is provided for end-to-end analytics + AI pipelines.

  • Rich deep learning support. Modelled after Torch, BigDL provides comprehensive support for deep learning, including numeric computing (via Tensor) and high-level neural networks; in addition, users can load pre-trained Caffe or Torch models into Spark programs using BigDL.
  • Extremely high performance. To achieve high performance, BigDL uses Intel MKL / Intel MKL-DNN and multi-threaded programming in each Spark task. Consequently, it is orders of magnitude faster than out-of-the-box open-source Caffe, Torch or TensorFlow on a single-node Xeon (i.e., comparable with a mainstream GPU). With the adoption of Intel DL Boost, BigDL improves inference latency and throughput significantly.
  • Efficient scale-out. BigDL can efficiently scale out to perform data analytics at “Big Data scale” by leveraging Apache Spark (a lightning-fast distributed data processing framework), as well as efficient implementations of synchronous SGD and all-reduce communication on Spark.
pip install BigDL
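
To give a feel for the “standard Spark program” claim, here is a rough sketch in the classic BigDL Python API (module paths follow the BigDL 0.x docs; train_rdd is an assumed RDD of BigDL Sample objects):

from bigdl.util.common import init_engine
from bigdl.nn.layer import Sequential, Linear, ReLU, LogSoftMax
from bigdl.nn.criterion import ClassNLLCriterion
from bigdl.optim.optimizer import Optimizer, Adam, MaxEpoch

init_engine()  # initialize BigDL on top of the existing Spark context
model = (Sequential().add(Linear(784, 128)).add(ReLU())
                     .add(Linear(128, 10)).add(LogSoftMax()))
optimizer = Optimizer(model=model,
                      training_rdd=train_rdd,   # RDD of BigDL Sample objects
                      criterion=ClassNLLCriterion(),
                      optim_method=Adam(),
                      end_trigger=MaxEpoch(5),
                      batch_size=256)
trained_model = optimizer.optimize()            # distributed synchronous SGD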

8. Analytics Zoo

Analytics Zoo seamlessly scales TensorFlow, Keras and PyTorch to distributed big data (using Spark, Flink & Ray).

  • End-to-end pipeline for applying AI models (TensorFlow, PyTorch, OpenVINO, etc.) to distributed big data.
  • Write TensorFlow or PyTorch inline with Spark code for distributed training and inference.
  • Native deep learning (TensorFlow/Keras/PyTorch/BigDL) support in Spark ML Pipelines.
  • Directly run Ray programs on a big data cluster through RayOnSpark.
  • Plain Java/Python APIs for (TensorFlow/PyTorch/BigDL/OpenVINO) model inference.
  • High-level ML workflow for automating machine learning tasks.
  • Cluster Serving for automatically distributed (TensorFlow/PyTorch/Caffe/OpenVINO) model inference.
  • Scalable AutoML for time series prediction.
  • Built-in models for Recommendation, Time Series, Computer Vision and NLP applications.
https://github.com/intel-analytics/analytics-zoo
pip install analytics-zoo
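
A minimal sketch of bootstrapping an Analytics Zoo program through its Orca context, which is also the entry point for RayOnSpark; the parameter values are illustrative assumptions:

from zoo.orca import init_orca_context, stop_orca_context

# Local mode for development; other cluster_mode values target YARN etc.
sc = init_orca_context(cluster_mode="local", cores=4, memory="4g")
# ... build and train TensorFlow/PyTorch/BigDL models on Spark here ...
stop_orca_context()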

9. Petastorm

Petastorm is an open-source data access library developed at Uber ATG. It enables single-machine or distributed training and evaluation of deep learning models directly from datasets in Apache Parquet format. Petastorm supports popular Python-based machine learning (ML) frameworks such as TensorFlow, PyTorch, and PySpark. It can also be used from pure Python code.

pip install petastorm
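
A minimal sketch of reading a Petastorm dataset, assuming one has already been materialized at the (hypothetical) URL below:

from petastorm import make_reader
from petastorm.pytorch import DataLoader

# Stream individual samples straight from the Parquet-backed dataset
with make_reader('file:///tmp/my_dataset') as reader:
    for sample in reader:
        print(sample)

# Or wrap the reader as a PyTorch-style DataLoader for training
with DataLoader(make_reader('file:///tmp/my_dataset'), batch_size=64) as loader:
    for batch in loader:
        pass  # feed each batch to the training step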

10. Apache SINGA

Apache SINGA is an Apache Top-Level Project focusing on distributed training of deep learning and machine learning models.

#CPU only
conda install -c nusdbsystem -c conda-forge singa-cpu=3.1.0
#GPU with CUDA and cuDNN (CUDA driver >=384.81 is required)
conda install -c nusdbsystem -c conda-forge singa-gpu=3.1.0
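
A very small sketch of SINGA's tensor and device API, based on its documentation, just to show the flavor (device placement is explicit):

from singa import device, tensor

dev = device.get_default_device()  # host CPU; device.create_cuda_gpu() for GPU
x = tensor.Tensor((2, 3), dev)     # a 2x3 tensor placed on that device
x.gaussian(0.0, 1.0)               # fill with samples from N(0, 1)
y = x + x                          # element-wise ops execute on the device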

Thank you for reading!

Any feedback and comments are greatly appreciated!
