TensorFlow 2.1 with TPU in Practice

Case Study: Google QUEST Q&A Labeling Competition

Ceshine Lee
Veritable
3 min read · Feb 17, 2020


(This is the first half of this article on my personal blog.)

Executive Summary

  1. TensorFlow has become much easier to use: As an experienced PyTorch developer who only knew a bit of TensorFlow 1.x, I was able to pick up TensorFlow 2.x in my spare time over 60 days and do competitive machine learning with it.
  2. TPU has never been more accessible: The new TPU interface in TensorFlow 2.1 works right out of the box in most cases and greatly reduces the development time required to make a model TPU-compatible. Using TPU drastically increases the iteration speed of experiments (a minimal setup sketch follows this list).
  3. We present a case study of solving a Q&A labeling problem by fine-tuning the RoBERTa-base model from the huggingface/transformers library.
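To make point 2 concrete, here is a minimal sketch of the TF 2.1 TPU setup on Colab (on a Cloud VM you would pass the TPU name or gRPC address to the resolver instead of reading COLAB_TPU_ADDR):

```python
import os
import tensorflow as tf

# Resolve the TPU address, initialize the TPU system, and build a
# distribution strategy that replicates training across the TPU cores.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu="grpc://" + os.environ["COLAB_TPU_ADDR"])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)
print("Number of TPU replicas:", strategy.num_replicas_in_sync)
```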

(TensorFlow 2.1 and TPU are also a very good fit for CV applications. A case study of solving an image classification problem will be published in about a month.)

Acknowledgment

I was granted free access to Cloud TPUs for 60 days via the TensorFlow Research Cloud program, intended for the TensorFlow 2.0 Question Answering competition. I chose to do the simpler Google QUEST Q&A Labeling competition first but unfortunately couldn’t find enough time to go back to the original one (sorry!).

I was also granted $300 in credits for the TensorFlow 2.0 Question Answering competition and used them to develop a PyTorch baseline. The credits also covered the costs of the Compute Engine VM and Cloud Storage used to train models on TPU.

Introduction

Google was handing out free TPU access to competitors in the TensorFlow 2.0 Question Answering competition as an incentive to try out the newly added TPU support in TensorFlow 2.1 (then a release candidate). Because the preemptible GPUs on GCP were barely usable at the time, I decided to give it a shot. It all began with a tweet.

It turns out that the TensorFlow models in the huggingface/transformers library work with TPU without modification! I then proceeded to develop models using TensorFlow (TF) 2.1 for a simpler competition, Google QUEST Q&A Labeling.
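As an illustrative sketch (not the competition code itself), loading one of the library's TF models inside the TPU strategy scope is essentially all it takes; `strategy` here is the TPUStrategy from the setup snippet above:

```python
from transformers import RobertaTokenizer, TFRobertaModel

# Model variables have to be created inside the strategy scope so that
# they are placed and replicated on the TPU cores; the pretrained
# weights load without any TPU-specific changes.
with strategy.scope():
    encoder = TFRobertaModel.from_pretrained("roberta-base")

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
```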

I missed the post-processing trick in the QUEST competition because I spent most of my limited time wrestling with TF and TPU. With the post-processing trick applied, my final model would have been somewhat competitive at around 65th place (silver medal) on the final leaderboard. The total training time of my 5-fold models using TPUv2 on Colab was about an hour. This is a satisfactory result in my opinion, given the time constraint.

TensorFlow 2.x

TensorFlow 2.x has become much more approachable, and the customizable training loops provide a swath of opportunities to do creative things. I think I’ll be able to re-implement the competition’s top solutions in TensorFlow without banging my head against the wall (or at least less frequently).

On the other hand, TF 2.x is still not as intuitive as PyTorch. Documentation and community support still leave much to be desired, and many search results still point to TF 1.x solutions that do not apply to TF 2.x.

As an example, I ran into a problem in which cuDNN failed to initialize.

One of the solutions is to limit GPU memory usage; there is a confusingly long thread on how to do so.
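A minimal sketch of the two common fixes discussed in such threads (using the tf.config.experimental API as of TF 2.1):

```python
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices("GPU")
for gpu in gpus:
    # Allocate GPU memory on demand instead of reserving it all up front.
    tf.config.experimental.set_memory_growth(gpu, True)

# Alternatively, cap the memory TensorFlow may use on the first GPU
# (the 4096 MB figure is arbitrary, for illustration only).
# tf.config.experimental.set_virtual_device_configuration(
#     gpus[0],
#     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])
```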

(I know that PyTorch now has its own TPU support, but it was still quite hard to use the last time I checked, and it was not supported in Google Colab. Maybe I’ll take another look in the next few weeks.)

Case Study and Code Snippets

This section briefly describes my solution to the QUEST Q&A Labeling competition and discusses the parts of the code that I think are most helpful for those who come from PyTorch, as I did. It assumes that you already have a basic understanding of TensorFlow 2.x; if you’re not sure, please refer to the official guide, Effective TensorFlow 2.

Source Code

Roadmap

  1. TF-Helper-Bot: a simple high-level wrapper around TensorFlow that I wrote to improve code reusability.
  2. Input Formulation and TFRecords Preparation.
  3. TPU-compatible Data Loading (a rough sketch follows this list).
  4. The Siamese Encoder Network.
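The full post walks through each of these. As a rough, illustrative sketch of item 3 (the feature names, shapes, and bucket path below are made up for illustration), TPU training reads TFRecords from Cloud Storage through tf.data with fixed batch shapes:

```python
import tensorflow as tf

def parse_example(serialized):
    # Hypothetical feature spec; the real one depends on how the
    # TFRecords were written in the preparation step.
    features = {
        "input_ids": tf.io.FixedLenFeature([512], tf.int64),
        "labels": tf.io.FixedLenFeature([30], tf.float32),
    }
    parsed = tf.io.parse_single_example(serialized, features)
    # TPUs do not handle int64 inputs well, so cast token ids to int32.
    return tf.cast(parsed["input_ids"], tf.int32), parsed["labels"]

filenames = tf.io.gfile.glob("gs://your-bucket/quest/train-*.tfrec")
dataset = (
    tf.data.TFRecordDataset(filenames)
    .map(parse_example, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .shuffle(2048)
    .batch(32, drop_remainder=True)  # TPUs require static batch shapes
    .prefetch(tf.data.experimental.AUTOTUNE)
)
```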

There’s more! Read the full post on my personal blog.
