TensorFlow is one of the most popular deep learning frameworks for building neural networks. Once a network is trained, most of its lifetime is spent performing inference (i.e., at test time). Depending on the number of users your application serves, you may need to scale your system and run inference on multiple GPUs. In this article, I demonstrate how to do multi-GPU inference.
Here are the steps:
1. Create your TensorFlow network/graph. Here, I create a small network with 2 convolutional, 2 pooling, and 2 fully connected layers.
2. Use tf.device to assign the device on which each graph/network should perform its computation. In the multi_gpu function, we pin each replica of the network to its own GPU.
3. To make the best use of all the GPUs, we create batches such that each batch is a tuple containing one input per GPU. For example, if we have 100 batches of shape N * W * H * C and are using 2 GPUs, we end up with a list of 50 tuples: [(batch1, batch2), (batch3, batch4), (batch5, batch6), ..., (batch99, batch100)].
4. Finally, we feed these grouped batches to the network replicas.
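The four steps above can be sketched as follows. This is a minimal illustration assuming TF 2.x with Keras; the layer sizes, the helper names (build_model, make_gpu_batches, multi_gpu_infer), and the one-model-per-device replication scheme are my own choices for the example, not the article's exact code, and the sketch falls back to CPU on machines without GPUs so it stays runnable:

```python
import tensorflow as tf

def build_model():
    # Step 1: a small network with 2 conv, 2 pooling, and 2 fully connected layers
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

def make_gpu_batches(batches, num_gpus):
    # Step 3: group the flat list of batches into tuples, one element per GPU
    usable = len(batches) - len(batches) % num_gpus
    return [tuple(batches[i:i + num_gpus]) for i in range(0, usable, num_gpus)]

def multi_gpu_infer(batches, num_gpus):
    gpus = tf.config.list_logical_devices("GPU")
    # fall back to CPU so the sketch also runs on machines without GPUs
    devices = [d.name for d in gpus[:num_gpus]] or ["/device:CPU:0"] * num_gpus
    models = []
    for device in devices:
        with tf.device(device):            # Step 2: pin each replica to a device
            models.append(build_model())
    outputs = []
    for batch_tuple in make_gpu_batches(batches, num_gpus):
        for model, device, batch in zip(models, devices, batch_tuple):
            with tf.device(device):        # Step 4: feed each replica its own batch
                outputs.append(model(batch, training=False))
    return outputs
```

With 2 devices, each pass through the inner loop processes two batches in one step, which is where the speedup below comes from.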
Let us now look at the GPU utilization with num_gpus = 1:
Total time taken (gpus=1): 39.968940 secs
And the GPU utilization with num_gpus = 2:
Total time taken (gpus=2): 24.002538 secs
Thus, going from about 39.97 secs to 24.00 secs cuts the time nearly in half (a speedup of roughly 1.67x rather than a full 2x). The reduction in time is not always linear; it depends on how deep your network is. The deeper the network, the more computation there is per batch relative to the data-transfer overhead, so the more you can get out of multiple GPUs.
Thanks for reading.