Benchmarking GPU and TPU performance for deep learning

Samuel A Donkor
CSS Knust
Aug 12, 2020
Google TPUs At The Datacenter

Alongside the scarcity of training data, limited computing power was one of the main obstacles that held back research on artificial neural networks (ANNs) and deep learning. Today, however, the computing power used to train AI is reportedly rising seven times faster than before.

In this post, we will compare the TPU and GPU runtimes as used in Colab. TPUs used to be available only on Google Cloud, but they can now be used for free in Colab.

This tutorial assumes that you understand basic Python and TensorFlow.

Enabling and testing the GPU

First, you’ll need to enable GPU for the notebook:

  • Navigate to Edit→Notebook Settings
  • Select GPU from the Hardware Accelerator drop-down

Next, we’ll check that we can connect to the GPU:

%tensorflow_version 2.x
import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Expected output:

TensorFlow 2.x selected.
Found GPU at: /device:GPU:0
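
To confirm that the GPU actually speeds things up, a quick timing comparison can help. The following is a minimal sketch (not part of the original post) that times a convolution over a random batch of images on the CPU and on the GPU; the layer size and image shape are arbitrary choices:

import timeit

import tensorflow as tf

def make_op(device_name):
  # Build a random batch of "images" and run a 7x7 convolution on the given device.
  # A fresh layer (with fresh random weights) is created on each call; that is fine
  # for a rough timing comparison.
  def run():
    with tf.device(device_name):
      images = tf.random.normal((100, 100, 100, 3))
      net = tf.keras.layers.Conv2D(32, 7)(images)
      return tf.math.reduce_sum(net)
  return run

cpu_op = make_op('/cpu:0')
gpu_op = make_op('/device:GPU:0')

# Warm up both devices once before timing.
cpu_op(); gpu_op()

cpu_time = timeit.timeit(cpu_op, number=10)
gpu_time = timeit.timeit(gpu_op, number=10)
print('CPU (s):', cpu_time)
print('GPU (s):', gpu_time)
print('Speedup: {:.1f}x'.format(cpu_time / gpu_time))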

Enabling and testing the TPU

First, you’ll need to enable TPUs for the notebook:

  • Navigate to Edit→Notebook Settings
  • Select TPU from the Hardware Accelerator drop-down

Next, we’ll check that we can connect to the TPU:

try:
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
  print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
  raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')

tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)

Expected output:

INFO:tensorflow:Initializing the TPU system: grpc://10.38.216.122:8470
INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Finished initializing TPU system.
WARNING:absl:`tf.distribute.experimental.TPUStrategy` is deprecated, please use the non experimental symbol `tf.distribute.TPUStrategy` instead.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
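
With the strategy object in hand, model creation and compilation should happen inside its scope so the variables are replicated across the eight TPU cores. The following is a minimal sketch; the layer sizes and MNIST-style input shape are illustrative assumptions, not something from the original post:

# Build and compile the model under the TPU distribution strategy so that
# its variables are created and replicated on the TPU cores.
with tpu_strategy.scope():
  model = tf.keras.Sequential([
      tf.keras.layers.Flatten(input_shape=(28, 28)),   # assumed MNIST-style input
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dense(10, activation='softmax'),
  ])
  model.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

# Training then proceeds as usual; the strategy handles the distribution, e.g.:
# model.fit(train_dataset, epochs=5)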

Now we’ll make heavy use of tf.data.experimental.AUTOTUNE to let TensorFlow tune the different stages of input loading at runtime. We define constants for the autotune setting and for the path to our input data.

AUTO = tf.data.experimental.AUTOTUNE

# Assuming our input data is stored on Google Cloud Storage
gcs_pattern = 'gs://...'
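
From here, a typical pattern is to read the TFRecord shards matching gcs_pattern and let AUTOTUNE pick the degree of parallelism for reading, decoding and prefetching. The sketch below is an assumption about how such a pipeline might look; parse_example stands in for your own record decoder and is not defined in the original post:

def make_dataset(batch_size):
  # List all shards matching the GCS pattern defined above.
  filenames = tf.io.gfile.glob(gcs_pattern)
  dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO)
  dataset = dataset.map(parse_example, num_parallel_calls=AUTO)  # parse_example: your own decoder (hypothetical)
  dataset = dataset.shuffle(2048)
  dataset = dataset.batch(batch_size, drop_remainder=True)  # a fixed batch size suits the TPU
  return dataset.prefetch(AUTO)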

Further Reading

https://colab.research.google.com/notebooks/tpu.ipynb#scrollTo=LtAVr-4CP1rp
