TensorFlow is one of the most popular deep learning frameworks for building neural networks. Once a network is trained, most of its lifetime is spent performing inference (i.e., at test time). Depending on the number of users your application serves, you may need to scale your system and run inference on multiple GPUs. In this article, I demonstrate how to do multi-GPU inference.
Here are the steps:
1. Create your TensorFlow network/graph. Here, I create a small network with 2 convolutional, 2 pooling, and 2 fully connected layers.
2. Use tf.device to assign the device on which each graph/network should perform its computation. In the multi_gpu function, we pin each replica of the network to its own GPU.
3. To make the best use of all the GPUs, we create batches such that each batch is a tuple containing one input per GPU. For example, if we have 100 batches of shape N * W * H * C and are using 2 GPUs, we end up with a list of 50 tuples: [(batch1, batch2), (batch3, batch4), (batch5, batch6), ..., (batch99, batch100)].
4. Finally, we feed these grouped batches to the network replicas.
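The four steps above can be sketched as follows. This is a minimal illustration assuming TF 2.x with Keras; the layer sizes, the helper names (build_model, make_gpu_batches, multi_gpu_infer), and the one-model-per-device replication scheme are my own choices for the example, not the article's exact code, and the sketch falls back to CPU on machines without GPUs so it stays runnable:

```python
import tensorflow as tf

def build_model():
    # Step 1: a small network with 2 conv, 2 pooling, and 2 fully connected layers
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

def make_gpu_batches(batches, num_gpus):
    # Step 3: group the flat list of batches into tuples, one element per GPU
    usable = len(batches) - len(batches) % num_gpus
    return [tuple(batches[i:i + num_gpus]) for i in range(0, usable, num_gpus)]

def multi_gpu_infer(batches, num_gpus):
    gpus = tf.config.list_logical_devices("GPU")
    # fall back to CPU so the sketch also runs on machines without GPUs
    devices = [d.name for d in gpus[:num_gpus]] or ["/device:CPU:0"] * num_gpus
    models = []
    for device in devices:
        with tf.device(device):            # Step 2: pin each replica to a device
            models.append(build_model())
    outputs = []
    for batch_tuple in make_gpu_batches(batches, num_gpus):
        for model, device, batch in zip(models, devices, batch_tuple):
            with tf.device(device):        # Step 4: feed each replica its own batch
                outputs.append(model(batch, training=False))
    return outputs
```

With 2 devices, each pass through the inner loop processes two batches in one step, which is where the speedup below comes from.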
Let us now look at the GPU utilization with num_gpus = 1:
Total time taken (gpus=1): 39.968940 secs
And the GPU utilization with num_gpus = 2:
Total time taken (gpus=2): 24.002538 secs
Thus, going from about 39.97 secs to 24.00 secs cuts the time nearly in half (a speedup of roughly 1.67x rather than a full 2x). The reduction in time is not always linear; it depends on how deep your network is. The deeper the network, the more computation there is per batch relative to the data-transfer overhead, so the more you can get out of multiple GPUs.
Thanks for reading.