Report on using a TPU Pod to pre-train a Japanese NLP model

Vo Chi Cong
Published in green-bamboo
Oct 28, 2019


Cloud TPU offering by Google

I’ve pre-trained a Japanese XLNet model with a maximum sequence length of 512 on a preemptible Cloud TPU v3-256 Pod kindly provided by Google's TFRC (for free!), using the Japanese Wikipedia dataset. Although the training had not yet converged when I used up all my TPU credits, I fine-tuned the model to classify Livedoor Japanese news articles. I drew inspiration from the work of Yohei Kikuta, who pre-trained and shared a Japanese BERT model.
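To give a concrete picture of the fine-tuning step, here is a minimal sketch written with the Hugging Face Transformers library, which is not necessarily the exact pipeline I ran: it assumes the pre-trained checkpoint has been converted to the Transformers format and that the Livedoor articles and their nine category labels have already been loaded into Python lists (the paths and variable names below are placeholders).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import XLNetTokenizer, XLNetForSequenceClassification

# Placeholder path: a Japanese XLNet checkpoint converted to the Transformers format.
MODEL_DIR = "path/to/japanese-xlnet"

tokenizer = XLNetTokenizer.from_pretrained(MODEL_DIR)
model = XLNetForSequenceClassification.from_pretrained(MODEL_DIR, num_labels=9)

# `texts` and `labels` are assumed to hold the Livedoor news bodies and
# their category ids (0..8), prepared elsewhere.
texts = ["..."]   # list of article strings
labels = [0]      # list of ints in [0, 9)

enc = tokenizer(texts, truncation=True, padding="max_length",
                max_length=512, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
                        torch.tensor(labels))
loader = DataLoader(dataset, batch_size=8, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for input_ids, attention_mask, y in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()
        optimizer.step()
```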

A single Cloud TPU v3-8 device has 128 GB of memory, while a v3-256 Pod has 4 TB, which is BIG, and I could never have set up such an environment on my own without the help of TFRC.
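Those numbers follow directly from the per-core memory, assuming 16 GB of HBM per TPU v3 core:

```python
# Back-of-the-envelope memory arithmetic (assumption: 16 GB of HBM per TPU v3 core).
HBM_PER_CORE_GB = 16

print(8 * HBM_PER_CORE_GB)     # v3-8 device:  128 GB
print(256 * HBM_PER_CORE_GB)   # v3-256 Pod:  4096 GB = 4 TB
```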

In fact, before the TPU Pod, TFRC had provided me with up to 110 single Cloud TPU devices for one month, free of charge (most of them preemptible).

E-mail from TFRC giving access to 110 TPU devices

At first, I thought of combining these TPU devices so that they could collaborate on training my Japanese language model. Later, I realized that they are not supposed to work that way. As the Cloud TPU documentation puts it:

Single device TPUs are individual TPU devices that are not connected to each other over a dedicated high-speed network. You cannot combine multiple single device TPUs to collaborate on a single workload.
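In other words, each single device is its own endpoint: your code talks to one TPU name at a time, and a Pod slice is simply one much larger TPU behind a single name. Here is a minimal sketch, assuming the TensorFlow 2.x TPU API and hypothetical node names:

```python
import tensorflow as tf

# Each single-device TPU is addressed by its own name; there is no call that
# merges several of them into one bigger accelerator.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-single-tpu-01")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)  # spans exactly this one device

# A Pod slice such as v3-256 is still just one TPU name, so the same code
# scales up without combining anything by hand:
# resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-v3-256-pod-slice")
```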
