Tutorial 4: 99+% accuracy on Dogs vs. Cats in One Epoch

David Yang
Published in Fenwicks
Mar 23, 2019

Prerequisite: Tutorial 0 (setting up Google Colab, TPU runtime, and Cloud Storage)

Transfer learning is a must-have skill for deep learning practitioners. A classic dataset for demonstrating the effectiveness of transfer learning is Kaggle’s Dogs vs. Cats competition, in which each image shows either a cat or a dog:

Sample images from Kaggle’s Dogs vs. Cats competition. © Petfinder.com

Keras’ solution. The official Keras blog includes an old tutorial on Dogs vs. Cats. The way they did it, however, is quite complicated. At a high level, their tutorial has two main steps.

  1. Train a simple 2-layer network on top of a frozen VGG16 network pre-trained on ImageNet. This leads to around 90% validation accuracy.
  2. Unfreeze the last 5 convolution layers of the pre-trained VGG16, and retrain them together with the 2 new layers from Step 1. This gives 94% accuracy.

They also have a rather confusing implementation of Step 1. Normally, when we do transfer learning, we replace the head of a pre-trained model. Here, the “head” of a ConvNet is the final block of fully connected layers. In the original VGG16 model, the head outputs predictions over 1,000 ImageNet classes. For our problem, we have only two classes, dogs and cats, so we change the head to a binary classifier.
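For reference, here is a minimal sketch of that head-swapping idea in plain Keras; the layer sizes are illustrative and not the exact architecture we build later in this tutorial:

import tensorflow as tf

# Sketch of head replacement: reuse the convolutional base, attach a new head.
base = tf.keras.applications.VGG16(include_top=False, weights='imagenet',
                                   input_shape=(224, 224, 3))
base.trainable = False  # optionally freeze the pre-trained base

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),   # new head: hidden layer
    tf.keras.layers.Dense(2, activation='softmax'),  # new head: dogs vs. cats
])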

In the Keras tutorial, Step 1 is much more brutal: they cut off VGG16’s head without giving it a new one. Then, they let the headless VGG16 perform “predictions” on the Dogs vs. Cats dataset. Since the model has no head, it can’t output any label; instead, some stuff comes out of its neck, namely the activations that would normally feed the head block as inputs. In the Keras tutorial, they collect this “neck blood” (which they call bottleneck features), treat it as a new dataset, and train a shallow 2-layer model on it.
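For the curious, that recipe looks roughly like this; x_train, y_train, x_valid and y_valid are hypothetical in-memory arrays, whereas the actual Keras tutorial streams images from directories with generators:

import tensorflow as tf

# Headless VGG16: convolutional base only, no fully connected head.
headless = tf.keras.applications.VGG16(include_top=False, weights='imagenet')

# "Predict" to collect bottleneck features (for 224x224 inputs: n x 7 x 7 x 512).
train_features = headless.predict(x_train)
valid_features = headless.predict(x_valid)

# Train a shallow 2-layer model on the bottleneck features.
top = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=train_features.shape[1:]),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
top.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
top.fit(train_features, y_train, validation_data=(valid_features, y_valid),
        epochs=10, batch_size=32)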

Our solution. In this tutorial, we follow a much simpler approach: we directly train the whole model with a new binary-classifier head (that is, one that predicts the probabilities of “dog” and “cat”), while keeping the base model frozen throughout training. As we’ll see shortly, this reaches 99% accuracy after a single epoch of training.

Also, we won’t use VGG16 as in the Keras blog, which is old (2014) and smallish (16 layers). We are in 2019 and we have a much bigger gun: InceptionResNetV2. Look at this BFN:

Architecture of Inception-Resnet-V2 © Google

Preparing the pre-trained model. First we set up Fenwicks:

import numpy as np
import tensorflow as tf
import os

if tf.io.gfile.exists('./fenwicks'):
    tf.io.gfile.rmtree('./fenwicks')
!git clone -q https://github.com/fenwickslab/fenwicks.git

import fenwicks as fw
fw.colab_utils.setup_gcs()

Then we set the hyperparameters:

ROOT_DIR = 'gs://gs_colab'
PROJECT = 'tutorial4'
MODEL = "InceptionResNetV2" #@param ["InceptionResNetV2", "ResNet50", "ResNet50V2", "InceptionV3", "MobileNetV2", "Xception"]

BATCH_SIZE = 128 #@param ["128", "256", "512"] {type:"raw"}
EPOCHS = 1 #@param {type:"slider", min:0, max:50, step:1}
LEARNING_RATE = 0.001 #@param ["0.001", "0.01"] {type:"raw"}
WARMUP = 0.05 #@param {type:"slider", min:0, max:0.5, step:0.05}

In this tutorial series, we always use variables data_dir, work_dir and model_dir to store data files, temporary files generated during training, and pre-trained model weights, respectively.

fw.colab_tpu.setup_gcs()
data_dir, work_dir = fw.io.get_project_dirs(ROOT_DIR, PROJECT)

Let’s now create the base model: InceptionResNetV2 pre-trained on ImageNet, provided by Keras.

base_model = fw.keras_models.get_model(MODEL, ROOT_DIR)

Internally, Fenwicks first downloads the model weights to the local disk of the Colab machine, and then saves the weights in TensorFlow’s checkpoint file format. After that, it uploads the checkpoint to model_dir on GCS. The return value base_model contains:

  • A function model_func that creates the InceptionResNet structure.
  • Path to the model weights file, weight_dir.
  • Model weight variable names, weight_vars.
  • Default image size of the base model. For InceptionResNetV2, the default image size is 299x299. For many other models such as ResNet50, the default image size is 224x224.
  • An image normalizer function. For InceptionResNetV2, the normalizer simply (i) divides each pixel by 255, (ii) subtracts 0.5 and (iii) multiplies by 2. This is the default for most Google models. Other models such as ResNeXt use a standard scaler, which subtracts the mean and then divides by the standard deviation. Both are sketched below.
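To make the two normalizers concrete, here is roughly what they do; this is a sketch of the behavior, not Fenwicks’ actual implementation:

def inception_normalizer(image):
    # Map pixel values from [0, 255] to [-1, 1]: divide by 255, subtract 0.5, multiply by 2.
    return (image / 255.0 - 0.5) * 2.0

def standard_scaler(image, mean, std):
    # ResNeXt-style normalization: subtract the (per-channel) mean, divide by the std.
    return (image - mean) / std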

Preparing the dataset. First, we download the data files, which come from fast.ai, to the local disk:

data_dir_local = fw.datasets.untar_data(fw.datasets.URLs.DVC,
                                        os.path.join('.', PROJECT))
train_dir_local = os.path.join(data_dir_local, 'dogscats/train')
valid_dir_local = os.path.join(data_dir_local, 'dogscats/valid')

The fast.ai data files contain the training set (25k images) from Kaggle, further split into a training set of 23k images and a validation set of 2k images. Then, we convert them to TFRecords and upload them to GCS with Fenwicks’ one-liners:

train_fn = os.path.join(data_dir, 'train.tfrec')
valid_fn = os.path.join(data_dir, 'valid.tfrec')
paths_train, y_train, labels_train = fw.data.data_dir_tfrecord(train_dir_local, train_fn)
paths_valid, y_valid, labels_valid = fw.data.data_dir_tfrecord(valid_dir_local, valid_fn)
n_train, n_valid = len(y_train), len(y_valid)
n_classes = len(labels_train)

Building our new ConvNet. We build our transfer learning model by adding layers one by one to Fenwicks’ Sequential model. The first layer is the base model, whose weights we freeze by setting its trainable property to False, as in Keras. Then, we add a DenseBN layer (a fully connected layer followed by Batch Normalization), and a classifier:

def build_nn(c=256):
    base = base_model.model_func()
    base.trainable = False
    model = fw.Sequential()
    model.add(base)
    model.add(tf.keras.layers.Flatten())
    model.add(fw.layers.DenseBN(c))
    model.add(fw.layers.Classifier(n_classes))
    return model
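As an aside, a DenseBN layer is conceptually just a fully connected layer followed by Batch Normalization; here is a rough Keras equivalent (a sketch of the idea, not Fenwicks’ implementation, with the ReLU activation as an assumption):

def dense_bn(c):
    # Fully connected layer followed by Batch Normalization (ReLU assumed).
    return tf.keras.Sequential([
        tf.keras.layers.Dense(c, use_bias=False),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
    ])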

We’ll train our model with Adam and a cosine learning rate schedule with warmup:

steps_per_epoch = n_train // BATCH_SIZE
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(total_steps * WARMUP)
cosine_decay = tf.train.cosine_decay_restarts
lr_func = fw.train.one_cycle_lr(LEARNING_RATE, total_steps, warmup_steps, cosine_decay)
fw.plt.plot_lr_func(lr_func, total_steps)
opt_func = fw.train.adam_optimizer(lr_func)
model_func = fw.train.get_clf_model_func(build_nn, opt_func)
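To see what this schedule looks like, here is a rough approximation of a warmup-plus-cosine-decay learning rate function; this is an illustration, not the actual code behind fw.train.one_cycle_lr:

import math

def warmup_cosine_lr(step, max_lr, total_steps, warmup_steps):
    # Linear warmup from 0 to max_lr, then a single cosine decay back towards 0.
    if step < warmup_steps:
        return max_lr * step / max(warmup_steps, 1)
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))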

Input pipeline. Our input pipeline reads from the TFRecords prepared earlier. We use the data transformations from Google’s Inception tutorial:

tfms = fw.transform.get_inception_transforms(
    h, w, training=training, normalizer=base_model.normalizer)

The Inception transforms include random cropping, horizontal flipping, and fast color distortions. We’ll visualize these in Tutorial 7 and compare them with other transforms in Tutorial 8. Based on these transforms, we build the input parser, and then the input functions, one for training data and one for validation data.

def get_input_func(fn, training):
    h = w = base_model.img_size
    tfms = fw.transform.get_inception_transforms(
        h, w, training=training, normalizer=base_model.normalizer)
    parser = fw.data.get_tfexample_image_parser(tfms)
    return lambda params: fw.data.tfrecord_ds(
        fn, parser, params['batch_size'], training=training)

train_input_func = get_input_func(train_fn, True)
valid_input_func = get_input_func(valid_fn, False)
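For intuition, the Inception-style training transforms correspond roughly to the following TensorFlow ops; this is a simplified sketch, not Fenwicks’ exact pipeline:

def inception_train_transform(image, h, w):
    # Random crop covering roughly 8%-100% of the image area.
    begin, size, _ = tf.image.sample_distorted_bounding_box(
        tf.shape(image), bounding_boxes=tf.zeros([0, 0, 4]),
        area_range=(0.08, 1.0), use_image_if_no_bounding_boxes=True)
    image = tf.slice(image, begin, size)
    image = tf.image.resize_images(image, [h, w])
    # Random horizontal flip and fast color distortions.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=32.0 / 255.0)
    image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
    return image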

Training and evaluating the model. At last, we are ready to build a TPUEstimator, and train the model:

est = fw.train.get_tpu_estimator(steps_per_epoch, model_func,
                                 work_dir, base_model.weight_dir, base_model.weight_vars,
                                 trn_bs=BATCH_SIZE, val_bs=n_valid)

est.train(train_input_func, steps=total_steps)

The training itself takes little time, since we only train the model for 1 epoch. However, it takes a very, very long time to compile the giant InceptionResNetV2 network into TPU machine code, and to load the pre-trained weights from GCS onto the TPU. The whole process takes around 4 minutes in total.

Let’s check the validation accuracy:

result = est.evaluate(valid_input_func, steps=1)
print(f'Test results: accuracy={result["accuracy"] * 100:.2f}%, loss={result["loss"]:.2f}.')

Finally, let’s clean up the intermediate files to save space on GCS, since Google starts charging us once we use more than 5GB. Optionally, we can also clear data_dir and model_dir.

fw.io.create_clean_dir(work_dir)

Here’s the complete notebook.

All tutorials:
