Hands-on the CIFAR 10 Dataset With Transfer Learning

Thomas Montoya · Published in The Startup · 8 min read · Sep 26, 2020
Storing knowledge gained while solving one problem and applying it to a different but related problem.

Abstract

Since most of the time we don’t need to reinvent the wheel, in this entry we will explore a powerful machine learning technique called transfer learning.

Sometimes we don’t have enough data to train on, we can’t afford the resources because labeled data is expensive, or we simply don’t need to develop an entire model from scratch for every single problem. This technique, well implemented, can solve these problems.

Introduction

Transfer learning is a machine learning method where a model developed for one task is reused as the starting point for a model on a second task, significantly reducing the resources required to achieve good predictions.

The problem with deep neural networks is the vast compute and time resources required to train a model, especially if we are dealing with computer vision tasks.

As humans, we find it easy to transfer the knowledge we have learned from one domain or task to another. When we encounter a new task, we don’t have to start from scratch. Instead, we use our previous experience to learn and adapt to that new task faster and more accurately.

For instance, do you remember when you learned how to multiply? You didn’t “throw away” or forget how to add; you transferred your previous experience (adding numbers) and used it in a new task (multiplying). And since neural networks try to emulate the brain’s behavior, with transfer learning we are emulating the way humans learn, using some previous knowledge as a base.

Fig 1. Source: A Survey on Transfer Learning

Materials and Methods

We will follow a “framework” from the book Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow (we don’t need to learn from scratch how to do transfer learning if we use transfer learning to learn how to do transfer learning):

  1. Look at the big picture.
  2. Get the data.
  3. Discover and visualize the data to gain insights.
  4. Prepare the data for Machine Learning algorithms.
  5. Select a model and train it.
  6. Fine-tune your model.
  7. Present your solution (Results).

Look at the big picture.

The first question to ask yourself is what exactly the objective is.

We are going to solve the CIFAR 10 classification problem: our task this time is to classify some pictures. Our model should learn from these pictures and be able to predict the category to which each one belongs.

The objective: get more than 90% accuracy while maintaining a good balance with the computational cost.

Knowing the objective is important because it will determine how you frame the problem, which algorithms you will select, which performance measure you will use to evaluate your model, and how much effort you will spend tweaking it.

Get the data

We will be working with the CIFAR 10 dataset. We can find this dataset inside the Keras API, so one way to get it is:

from tensorflow import keras as K

(x_train, y_train), (x_test, y_test) = K.datasets.cifar10.load_data()

Discover and visualize the data to gain insights.

The CIFAR-10 dataset consists of 60000 32x32 color (32, 32, 3) images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. “Automobile” includes sedans, SUVs, things of that sort. “Truck” includes only big trucks. Neither includes pickup trucks.

In conclusion, we will get 60000 images and 60000 labels.

CIFAR 10 data set
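To gain those insights yourself, here is a minimal sketch (matplotlib is an extra dependency not used elsewhere in this post) that prints the array shapes and plots a few samples with their class names:

import matplotlib.pyplot as plt
from tensorflow import keras as K

(x_train, y_train), (x_test, y_test) = K.datasets.cifar10.load_data()

# Shapes: (50000, 32, 32, 3), (50000, 1), (10000, 32, 32, 3), (10000, 1)
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Plot the first ten training images with their labels
fig, axes = plt.subplots(1, 10, figsize=(15, 2))
for i, ax in enumerate(axes):
    ax.imshow(x_train[i])
    ax.set_title(class_names[int(y_train[i][0])])
    ax.axis('off')
plt.show()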

Prepare the data for Machine Learning algorithms.

We get one label for each image (6000 images per class). Before we feed the model with the data, we need to encode those labels as one-hot vectors with a one-hot encoding method.

What is one-hot encoding?

One-hot encoding is a process by which categorical variables are converted into a form that can be provided to ML algorithms so they do a better job at prediction.

One hot encoding creates new (binary) columns, indicating the presence of each possible value from the original data. Let’s work through an example.
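Here is a tiny sketch of that idea (the specific labels are just for illustration): each label becomes a vector of ten zeros with a single 1 at the position of its class index.

import numpy as np

# Labels 3, 0 and 9 (cat, airplane and truck in CIFAR-10's ordering)
# become rows with a single 1 at the position of the class index.
labels = np.array([3, 0, 9])
one_hot = np.eye(10)[labels]
print(one_hot)
# [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
#  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]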

How?

There are a lot of methods, but we are using Keras.

Y_train_p = K.utils.to_categorical(Y_train, 10)
Y_test_p = K.utils.to_categorical(Y_test, 10)

Y_train and Y_test are the labels from the dataset.
10 is the number of categories we have.
Since our dataset is split into Y_train and Y_test, we need to encode both.

This is where our work with transfer learning starts.

We haven’t picked our base model yet, but since almost all the base models in Keras were trained on the ImageNet dataset, we need to pre-process our data before feeding the model.

There are a few methods to normalize the data, and since we will use a base model, we need to normalize our data the same way it was normalized in the first place by the people who created the base model.

Let’s create a generic function for this:

def preprocess_data(X, Y):
    """Pre-processes the data for the model

    :param X: numpy.ndarray of shape (m, 32, 32, 3) containing the
        CIFAR 10 data, where m is the number of data points
    :param Y: numpy.ndarray of shape (m,) containing the CIFAR 10 labels for X
    :returns: X_p, Y_p
    """
    X_p = K.applications.densenet.preprocess_input(X)
    # encode to one-hot
    Y_p = K.utils.to_categorical(Y, 10)
    return X_p, Y_p

I’m going to use DenseNet121 as my base model, but each base model has its own preprocess_input method, which you can find in the Keras documentation. For instance, let’s say we choose ResNet as our base model:

X_p = K.applications.resnet.preprocess_input(X)

Or let’s say we choose Inception_V3:

X_p = K.applications.inception_v3.preprocess_input(X)

Another issue is the resolution of the images: we are using 32x32, but the base model was trained with 224x224 images, so we need to upscale our images (you can find the accepted resolution range in the Keras documentation).

We can achieve this by adding a Lambda layer in front of the base model; yes, now we are tuning the model.

What happens when you zoom into a picture? We get distortion, so we will resize the image and pad it to avoid that distortion.
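A minimal sketch of that Lambda layer, assuming the 160x160 resolution mentioned in the results section and using tf.image.resize_with_pad as one way to resize and pad in a single step:

import tensorflow as tf
from tensorflow import keras as K

# Upscale layer: resize the 32x32 inputs with padding so the aspect
# ratio is preserved and no distortion is introduced.
inputs = K.Input(shape=(32, 32, 3))
resized = K.layers.Lambda(
    lambda img: tf.image.resize_with_pad(img, 160, 160)
)(inputs)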

Select a model and train it

Before you pick a model, I recommend searching for a benchmark; in the Keras documentation you will also find some interesting data.

Remember our objective: we want a good balance between computational cost and quality. Sometimes less is more, and after training some combinations I found that I was getting better results with DenseNet121 than with ResNet101, even though the latter has more parameters!

Now load the base model
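The exact snippet isn’t reproduced here, but loading DenseNet121 without its classifier could look roughly like this sketch (the 160x160 input shape is an assumption that matches the upscaling above):

from tensorflow import keras as K

# DenseNet121 pre-trained on ImageNet, without its 1,000-class top.
base_model = K.applications.DenseNet121(
    include_top=False,
    weights='imagenet',
    input_shape=(160, 160, 3),
)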

include_top=False

This model was created with the ImageNet dataset and its 1,000 categories in mind, so the final layer is a Dense layer with 1,000 nodes. For our dataset we just need a Dense layer with 10 nodes, so we can’t use the top layers from the original model.

Our top layers
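The exact head used in the article isn’t shown here; a plausible sketch, continuing from the snippets above (the 256-unit Dense layer and the 0.3 dropout rate are assumptions), is:

# Glue everything together: resize Lambda -> DenseNet121 -> small head.
x = base_model(resized)
x = K.layers.GlobalAveragePooling2D()(x)
x = K.layers.Dense(256, activation='relu')(x)   # assumed size
x = K.layers.Dropout(0.3)(x)                    # assumed rate
outputs = K.layers.Dense(10, activation='softmax')(x)

model = K.Model(inputs, outputs)
model.summary()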

Now we have the upscale layer at the beginning, the base model, and our top layers.

Fine-tune your model.

There are a lot of parameters we can tune to achieve better results. For the upsampling we can go with anything from 32x32 up; keep in mind that more resolution means more computational cost. We can add or remove layers from the top, change how many nodes we want or the dropout probability, or use another activation function. In this case, I’m using Adam as my optimizer, but you can try RMSprop.

We can freeze all the layers so we only train the layers at the top, or we can freeze just a couple of layers, as in the snippet below.

base_model.trainable = False

# or

for layer in base_model.layers[:100]:
    layer.trainable = False
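Continuing the sketch, compiling and training could then look like the following; the batch size and epoch count are placeholders rather than the values used for the results below:

# Prepare the data with the generic preprocess_data function from above.
x_train_p, y_train_p = preprocess_data(x_train, y_train)
x_test_p, y_test_p = preprocess_data(x_test, y_test)

model.compile(
    optimizer=K.optimizers.Adam(),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
history = model.fit(
    x_train_p, y_train_p,
    validation_data=(x_test_p, y_test_p),
    batch_size=128,   # placeholder
    epochs=10,        # placeholder
)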

Present your solution (Results)

I tested Inception_V3, ResNet50, and DenseNet121 with several combinations of hyperparameters. The best quality/cost result was achieved by upsampling the images to 160x160 and training all the layers.

Discussion

The number of possible combinations is practically infinite, and I encourage you to experiment a lot.

DenseNet121 fully trainable vs. DenseNet121 with the last 250 layers unfrozen:

(Training curves: DenseNet121 with 250 layers unfrozen vs. DenseNet121 fully trainable)

As you can see, we get better results training the entire model, but at what cost?

Now let’s load TensorBoard and compare DenseNet121 vs. MobileNetV2 vs. Inception_V3.

I’m going to freeze all the base_model layers to increase the training speed.
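One way to log those runs (the log directory names are just examples) is Keras’s TensorBoard callback; in a notebook you can then display the dashboard with the %tensorboard magic:

# One log directory per run so TensorBoard can overlay the curves.
tb_cb = K.callbacks.TensorBoard(log_dir='logs/densenet121')

model.fit(
    x_train_p, y_train_p,
    validation_data=(x_test_p, y_test_p),
    epochs=10,            # placeholder
    callbacks=[tb_cb],
)

# In a notebook:
# %load_ext tensorboard
# %tensorboard --logdir logs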

(TensorBoard comparison: MobileNetV2, Inception V3, DenseNet121)

Code

Further Reading

A Comprehensive Hands-on Guide to Transfer Learning with Real-World Applications in Deep Learning.

Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow

A Survey on Deep Transfer Learning

A Gentle Introduction to Transfer Learning for Deep Learning

RTFM
