Keras for TPUs on Google Colaboratory (Free!)

Playing with the official Fashion MNIST example

Google has started giving users FREE access to TPUs on Google Colaboratory (Colab)! Colab already provides free GPU access (one K80 core) to everyone, and a TPU is 10x more expensive: Google Cloud currently charges $4.50 USD per TPU per hour and $0.45 USD per K80 core per hour. What exciting news!

The good folks from the fast.ai community have already shared some benchmarks on the MNIST dataset. However, MNIST seems to be too simple a problem for the advantages of the TPU to show.

I found an official example notebook, “Fashion MNIST with Keras and TPUs”, in the GitHub repo tensorflow/tpu. I made some changes to the notebook and ran it three times in different environments (TPU, GPU, CPU) as an alternative benchmark. Here is what I changed:

  1. Created a validation set from the training set.
  2. Changed batch_size from 1024 to 512 and epochs from 10 to 20. (No particular reason; I just wanted to try out different hyper-parameters.)
  3. Calculated test and validation scores after training.
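The first two changes can be sketched roughly as follows. This is only an illustration, not the notebook's actual code: the helper name split_indices, the 10% validation fraction, and the seed are assumptions chosen for the example. Only the batch size and epoch count come from the list above.

```python
import random

BATCH_SIZE = 512  # changed from 1024 in the official notebook
EPOCHS = 20       # changed from 10 in the official notebook


def split_indices(n_samples, val_fraction=0.1, seed=42):
    """Return (train_idx, val_idx): two disjoint, shuffled lists of indices.

    val_fraction and seed are hypothetical defaults, not values from the notebook.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


# The Fashion MNIST training set contains 60,000 images.
train_idx, val_idx = split_indices(60000)
print(len(train_idx), len(val_idx))
```

With NumPy arrays, the index lists can be applied directly, for example x_train[val_idx] and y_train[val_idx], so that images and labels stay aligned.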

Source Code / Notebooks

Here are the notebooks (saved as GitHub Gists; use the dropdown menu “File/View on Github” to open them on GitHub):

Technical Details

All it really takes is converting the Keras model to a TPU model with tf.contrib.tpu.keras_to_tpu_model:

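import os
import tensorflow as tf  # TensorFlow 1.x; tf.contrib was removed in 2.x

# Wrap the existing Keras model so that it runs on the Colab TPU.
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(
            tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
    )
)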

According to this notebook, this is just a temporary solution. In the future, you will instead choose the TPU as a distribution strategy in the model.compile call.
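For reference, here is a rough sketch of what the strategy-based approach looks like in later TensorFlow 2.x releases. It is not taken from the notebook: it assumes TensorFlow 2.3 or newer (earlier 2.x versions expose the class as tf.distribute.experimental.TPUStrategy) and assumes the runtime still exposes the COLAB_TPU_ADDR environment variable used above. The helper names tpu_grpc_address and build_tpu_strategy, and build_model in the usage comment, are made up for this example.

```python
import os


def tpu_grpc_address(environ=os.environ):
    """Build the gRPC address of the Colab TPU from COLAB_TPU_ADDR."""
    addr = environ.get("COLAB_TPU_ADDR")
    if not addr:
        raise RuntimeError(
            "COLAB_TPU_ADDR is not set; select a TPU runtime in Colab first.")
    return "grpc://" + addr


def build_tpu_strategy():
    """Connect to the TPU and return a distribution strategy (TF 2.x API)."""
    import tensorflow as tf  # imported lazily; only needed on a TPU runtime

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
        tpu=tpu_grpc_address())
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)


# Usage on a TPU runtime (build_model is a hypothetical function that
# returns an uncompiled Keras model):
#
# strategy = build_tpu_strategy()
# with strategy.scope():
#     model = build_model()
#     model.compile(optimizer="adam",
#                   loss="sparse_categorical_crossentropy",
#                   metrics=["accuracy"])
```

In this form the model is created and compiled inside the strategy scope, rather than being converted after the fact.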

Results

With 3 convolution layers and 2 fully-connected layers, we can see that the TPU already trains almost 2x as fast as the GPU:

Source Code

The train/validation/test accuracies should be very close across different environments, since they share the same hyper-parameters (we did not set the same seed, though):

One interesting thing is that the TPU's post-training validation accuracy differs from what Keras reported during training. I’m not sure why, but it probably has something to do with the fact that the TPU uses mixed-precision computation, whereas after training we move the graph to the CPU, which I presume uses single-precision computation.
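To get a feel for how reduced precision alone can shift results, here is a small sketch. It assumes the guess above is right and that the reduced-precision format involved is bfloat16, which keeps only the top 16 bits of a float32. The helper truncate_to_bfloat16 is made up for this example and simply truncates the low bits, whereas real hardware typically rounds. The weights and inputs are arbitrary numbers, not values from the model.

```python
import struct


def truncate_to_bfloat16(x):
    """Keep only the top 16 bits of x's float32 encoding.

    That leaves the sign, 8 exponent bits, and 7 mantissa bits.
    Truncation is a simplification of what real hardware does.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def dot(xs, ys):
    """Plain dot product, accumulated in Python's double-precision floats."""
    return sum(x * y for x, y in zip(xs, ys))


# Arbitrary illustrative values.
weights = [0.1, -0.2, 0.3, 0.7]
inputs = [1.5, 2.25, -0.4, 0.9]

full = dot(weights, inputs)
reduced = dot([truncate_to_bfloat16(w) for w in weights],
              [truncate_to_bfloat16(v) for v in inputs])
print(full, reduced, abs(full - reduced))
```

The two results differ slightly. For a sample that sits close to a decision boundary, a difference of this kind could be enough to flip the predicted class, which would show up as a small change in accuracy.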

Conclusion

This is a very quick peek at what we can do with TPUs on Google Colab. As a next step, we can try training bigger models on bigger datasets to fully utilize the power of the TPU. By giving free access to TPUs, Google Colab has certainly opened up a whole new world for us to explore. Good luck!

Additional Materials

We briefly introduced some TPU reviews and benchmarks in April: