MPI Tutorial for Machine Learning (Part 2/3)

Thiwanka Chameera Jayasiri
3 min read · Jul 16, 2023


(Intermediate and Advanced concepts)

In this tutorial, we'll cover:

  1. Custom Communication Patterns
  2. Collective Communication Operations
  3. MPI with TensorFlow

We will build upon the basic knowledge shared in the previous tutorial.

1. Custom Communication Patterns

One of the most common ways of communicating between processes in MPI is by sending and receiving messages. Here's a simple example of the MPI send and recv (receive) methods.

from mpi4py import MPI

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        data = {'a': 1, 'b': 2, 'c': 3}
        comm.send(data, dest=1)
    elif rank == 1:
        data = comm.recv(source=0)
        print(f"Received data {data} from process 0")

if __name__ == "__main__":
    main()

In this example, process 0 sends a dictionary to process 1, which prints it. You can run this program with two processes:

mpiexec -n 2 python send_recv.py
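
Since machine-learning workloads usually exchange NumPy arrays rather than small Python objects, it's worth knowing that mpi4py also provides buffer-based Send and Recv methods (note the capital letters), which skip pickling and are considerably faster for large arrays. Here's a minimal sketch along the same lines as the example above; the array contents are just placeholders:

from mpi4py import MPI
import numpy as np

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        # For example, a block of model weights or activations
        data = np.arange(1000, dtype=np.float64)
        comm.Send(data, dest=1, tag=0)
    elif rank == 1:
        # The receive buffer must be pre-allocated with a matching shape and dtype
        data = np.empty(1000, dtype=np.float64)
        comm.Recv(data, source=0, tag=0)
        print(f"Process 1 received an array with sum {data.sum()}")

if __name__ == "__main__":
    main()

As before, you can run this with two processes, e.g. mpiexec -n 2 python send_recv_numpy.py.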

2. Collective Communication Operations

Collective communication operations involve all processes in the MPI communicator. Here are a few examples:

  • bcast: Broadcast a message from the process with rank "root" to all other processes.
  • gather: Gather values from all processes and deliver them to the root process.
  • reduce: Apply a reduction operation (like sum, max, min, etc.) to the values from all processes and deliver the result to the root process (a short gather/reduce sketch follows the broadcast example below).

Here's an example where we broadcast a message from the root process to all other processes:

from mpi4py import MPI

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        data = {'a': 1, 'b': 2, 'c': 3}
    else:
        data = None

    data = comm.bcast(data, root=0)
    print(f"Process {rank} received data {data}")

if __name__ == "__main__":
    main()

You can run this program with four processes:

mpiexec -n 4 python bcast.py
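
For completeness, here's a minimal sketch covering the other two operations from the list above, gather and reduce: each process contributes the square of its rank, gather collects all the values into a list on the root, and reduce sums them on the root.

from mpi4py import MPI

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    value = rank ** 2

    # gather: the root receives a list with one entry per process
    gathered = comm.gather(value, root=0)

    # reduce: the root receives the sum of all contributed values
    total = comm.reduce(value, op=MPI.SUM, root=0)

    if rank == 0:
        print(f"Gathered values: {gathered}")
        print(f"Sum of values: {total}")

if __name__ == "__main__":
    main()

You can run it the same way, e.g. mpiexec -n 4 python gather_reduce.py.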

3. MPI with TensorFlow

Let’s look at how we can integrate MPI with TensorFlow to distribute the training of a neural network. We’ll use tf.distribute.experimental.MultiWorkerMirroredStrategy for multi-worker training.

Here’s a simple example where we train a model on the MNIST dataset:

import json
import os

import tensorflow as tf
from mpi4py import MPI

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Define a simple sequential model
    def create_model():
        return tf.keras.models.Sequential([
            tf.keras.layers.Dense(16, activation='relu', input_shape=(784,)),
            tf.keras.layers.Dense(10)
        ])

    # Specify the 'TF_CONFIG' environment variable for each worker
    # (this must be set before the strategy is created)
    tf_config = {
        'cluster': {
            'worker': ["localhost:12345", "localhost:23456"]
        },
        'task': {'type': 'worker', 'index': rank}
    }
    os.environ['TF_CONFIG'] = json.dumps(tf_config)

    # Choose the right strategy for multi-worker training
    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

    # Load the dataset and flatten the 28x28 images to match the model's input shape
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

    # Batch the dataset with the global batch size (per-replica batch * number of workers)
    BUFFER_SIZE = len(x_train)
    BATCH_SIZE_PER_REPLICA = 64
    GLOBAL_BATCH_SIZE = BATCH_SIZE_PER_REPLICA * size
    train_dataset = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
                     .shuffle(BUFFER_SIZE)
                     .batch(GLOBAL_BATCH_SIZE))

    # Open a strategy scope and define and compile the model inside it
    with strategy.scope():
        model = create_model()
        model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                      optimizer=tf.keras.optimizers.Adam(),
                      metrics=['accuracy'])

    # Train the model
    model.fit(train_dataset, epochs=3)

if __name__ == "__main__":
    main()

This script trains a simple neural network on the MNIST dataset. It first defines a function to create the model and then specifies each worker's ‘TF_CONFIG’ environment variable. It loads the dataset, opens a strategy scope, defines and compiles the model inside it, and finally trains the model.

Remember to replace “localhost:12345” and “localhost:23456” with the actual addresses of your workers.
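
If you prefer not to hard-code the addresses, one option is to build the worker list at runtime with MPI itself: each process reports its hostname, allgather shares the full list with everyone, and every worker derives the same TF_CONFIG. This is only a sketch; the base port below is an arbitrary illustrative choice, and you should pick ports that are actually free on your machines.

import socket
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Every process learns every other process's hostname
hostnames = comm.allgather(socket.gethostname())

# Illustrative base port; offsetting by index keeps ports unique on a shared host
BASE_PORT = 12345
workers = [f"{host}:{BASE_PORT + i}" for i, host in enumerate(hostnames)]

tf_config = {
    'cluster': {'worker': workers},
    'task': {'type': 'worker', 'index': rank},
}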

You can run this script with two processes, one for each worker:

mpiexec -n 2 python tensorflow_example.py

That wraps up our intermediate and advanced tutorial on MPI for Machine Learning. MPI can be a powerful tool for distributing Machine Learning tasks, but it has a steep learning curve. Hopefully, this tutorial has helped you better understand how to use MPI with Python for Machine Learning.

In the next part, we’ll extend this tutorial further to cover handling exceptions, profiling and optimizing MPI programs, and, last but not least, deeper integration with libraries like TensorFlow and PyTorch.
