Tutorial 8: Comparing Different Data Augmentations with Oxford Pets
Prerequisite: Tutorial 4 (Dogs vs. Cats).
Oxford-IIIT Pets is a dataset similar to Dogs vs. Cats in Tutorial 4, except that the Pets data also tells you what breeds the dogs and cats are.
So naturally, we handle it the same way as in Tutorial 4. There are more than 30 breeds of pets in the dataset, and each breed has only about 170 images. That makes it much harder than Dogs vs. Cats, which has tens of thousands of training images.
There is an annoying issue with the Pets dataset: not all images are standard color photos with three channels (red, green, and blue). Some are grayscale, and some have an additional “alpha” channel for transparency. These will cause problems for TensorFlow, so we need to check the image types and fix these issues:
data_dir_local = fw.datasets.untar_data(fw.datasets.URLs.PETS,
                                        os.path.join('.', PROJECT))
data_dir_local = os.path.join(data_dir_local, 'oxford-iiit-pet/images')
fw.preprocess.check_rgb(data_dir_local)
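In case you are curious what such a check involves, here is a minimal sketch of the same idea using Pillow. It is only an illustration of the problem, not fw’s actual implementation: it re-saves any grayscale or RGBA image as a plain three-channel JPEG.

import os
from PIL import Image

def force_rgb(image_dir):
    """Re-save any grayscale or RGBA image in image_dir as 3-channel RGB."""
    for name in os.listdir(image_dir):
        if not name.lower().endswith('.jpg'):
            continue
        path = os.path.join(image_dir, name)
        with Image.open(path) as img:
            fixed = img.convert('RGB') if img.mode != 'RGB' else None
        if fixed is not None:          # e.g. the mode was 'L' or 'RGBA'
            fixed.save(path, 'JPEG')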
There’s another issue with the inputs: they are not stored in a nice directory structure as in Tutorial 4, where all dog images are in a “dogs” sub-directory and all cat images are in a “cats” folder. Here, all pet images, regardless of breed, are kept in a single “images” folder. The only way to tell the label of a training image is through its file name. For example, “Abyssinian_1.jpg” is a photo of an Abyssinian cat. In order to extract the label from the file names, we use a regular expression:
pat = r'/([^/]+)_\d+.jpg$'
Regular expressions are commonly used as job interview questions, which is one more reason every Python developer should master them. In the pattern above, the letter r before the string makes it a raw string literal, so the backslash reaches the regular-expression engine unchanged. The / means the character right before the label must be a slash, which separates directories and files on Linux, the OS of Google Colab. After that, ([^/]+) captures a sequence of characters that are not /, \d+ matches a sequence of digits, and $ anchors the match at the end of the file name. For example, in Abyssinian_1.jpg, ([^/]+) captures “Abyssinian” (the label), and \d+ matches “1”.
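If you want to verify the pattern, you can try it on a couple of file paths with Python’s re module (the paths below are made up for illustration):

import re

pat = r'/([^/]+)_\d+.jpg$'
for path in ['images/Abyssinian_1.jpg', 'images/american_pit_bull_terrier_100.jpg']:
    print(re.search(pat, path).group(1))
# Abyssinian
# american_pit_bull_terrier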
With this pattern, we convert the dataset into TFRecords with the one-liner:
paths_train, paths_test, y_train, y_test, labels = \
    fw.data.data_dir_re_tfrecord_split(data_dir_local, pat, train_fn, valid_fn)
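The library call hides the details, but conceptually it reads each image file, pairs it with the label extracted by the pattern, and serializes the pairs into the train and validation TFRecord files. The helper below is my own rough sketch of that idea; the feature names and record layout are guesses, not necessarily what fw.data uses.

import tensorflow as tf

def write_tfrecord(paths, labels, out_fn):
    """Serialize (image bytes, integer label) pairs into a TFRecord file."""
    with tf.io.TFRecordWriter(out_fn) as writer:
        for path, label in zip(paths, labels):
            img_bytes = tf.io.read_file(path).numpy()
            example = tf.train.Example(features=tf.train.Features(feature={
                'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_bytes])),
                'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
            }))
            writer.write(example.SerializeToString())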
By default, the train/validation split is 80/20, in the style of Scikit-Learn’s train_test_split. After setting up the input TFRecord files, we do exactly the same as in Tutorial 4, using transfer learning from a ResNet50 model pre-trained on ImageNet. In the end, we get an accuracy of about 88% after 10 epochs, which is rather low. Let’s check the input data:
reverse_normalizer = fw.transform.REVERSE_IMAGENET_NORMALIZE[base_model.normalizer]
fw.anim.show_input_func(train_input_func, n_img=20, converter=reverse_normalizer)
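In case you are curious what this reverse transform does numerically: ImageNet normalization subtracts a per-channel mean and divides by a per-channel standard deviation, so reversing it just multiplies and adds those constants back. Here is a minimal sketch with the standard ImageNet statistics (base_model.normalizer may use slightly different values):

import tensorflow as tf

IMAGENET_MEAN = tf.constant([0.485, 0.456, 0.406])
IMAGENET_STD = tf.constant([0.229, 0.224, 0.225])

def normalize(img):            # img: float32 tensor in [0, 1], shape (H, W, 3)
    return (img - IMAGENET_MEAN) / IMAGENET_STD

def reverse_normalize(img):    # undo normalize() so the image is viewable again
    return img * IMAGENET_STD + IMAGENET_MEAN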
Note that our training input pipeline has a step that normalizes images, as explained in Tutorial 4; to show these images on the screen, we need to cancel out that normalization by applying the reverse transform, as above. The images from the training input pipeline look like this:
As you see, the Inception transforms are really, really aggressive in cropping and zooming the images. This is probably too hard for our neural network. Let’s change to a more moderate data augmentation scheme: one similar to the default in fast.ai.
train_input_func = get_input_func(train_fn, True,
                                  fw.transform.get_fastai_transforms)
valid_input_func = get_input_func(valid_fn, False,
                                  fw.transform.get_fastai_transforms)
These are the images from the training input pipeline:
Now we see whole animals in the photos. Train the same model again, and this time, we get 91% accuracy.
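To get a feel for the difference between the two schemes without digging into the libraries: Inception-style preprocessing typically samples crops covering as little as 5–10% of the original image area, while a fast.ai-style scheme flips, slightly zooms, and keeps most of the picture. Below is a rough tf.image sketch of such a moderate policy; it is only an illustration, not what fw.transform.get_fastai_transforms actually does.

import tensorflow as tf

def moderate_augment(img, out_size=224):
    """A mild policy: random horizontal flip, a slight zoom implemented as a
    random crop keeping 90-100% of each side, a small brightness jitter,
    then resize back to the target size."""
    img = tf.image.random_flip_left_right(img)
    h = tf.cast(tf.shape(img)[0], tf.float32)
    w = tf.cast(tf.shape(img)[1], tf.float32)
    scale = tf.random.uniform([], 0.9, 1.0)     # keep 90-100% of each side
    crop_h = tf.cast(h * scale, tf.int32)
    crop_w = tf.cast(w * scale, tf.int32)
    img = tf.image.random_crop(img, tf.stack([crop_h, crop_w, 3]))
    img = tf.image.random_brightness(img, max_delta=0.1)
    return tf.image.resize(img, [out_size, out_size])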
Here’s the complete Jupyter notebook:
All tutorials: