Faster AI: Lesson 2 — TL;DR version of Fast.ai Part 1

Kshitiz Rimal
Deep Learning Journal

--

In my previous post, Lesson 1, I talked briefly about how to set up the software and hardware environment for this course and went through fine tuning in brief. If you haven’t read that, please go through it before reading this post.

In this lesson, we go a little deeper into these concepts and implement a neural network from scratch while explaining its important aspects.

To keep things short and to explain them properly, I have divided this lesson into 4 parts:

  1. Dogs vs Cats Redux and submitting to Kaggle [Time: 17:14]
  2. Details on CNN features and Fine Tuning [Time: 54:09]
  3. Explaining Neural Network in Excel [Time: 1:03:30]
  4. A Linear Model from scratch and how to use that to classify cats and dogs [Time: 1:14:08]

1. Dogs vs Cats Redux and submitting to Kaggle

In my previous post I talked about the Dogs vs Cats competition and how to use its data with a process called fine tuning on the pre-trained VGG16 model to classify dog and cat images.

Dogs vs Cats Redux is essentially the same competition, updated to use Kaggle’s newer ‘Kernel’ features. One major difference here is the structure of the data provided by the competition versus the way Keras expects data for image classification. Basically, you move the cat images into ‘cats’ subfolders of the ‘train’ and ‘valid’ folders, and the dog images into ‘dogs’ subfolders in the same manner.

After that, you do the same thing for the ‘sample’ folder as well, which contains a small subset of the cat and dog images so you can quickly train and test the model to make sure it is working properly. To see how efficiently Jeremy did it inside a Jupyter Notebook, watch it here or follow this notebook; a rough sketch of the rearrangement is shown below.
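In case it helps, here is a minimal sketch of that rearrangement in Python. The paths and the size of the validation split are assumptions; adjust them to wherever you unzipped the competition data.

```python
import glob, os, random, shutil

path = 'data/redux/'  # assumed location of the extracted competition data

# Create the cats/dogs subfolders that Keras expects
for split in ('train', 'valid'):
    for animal in ('cats', 'dogs'):
        os.makedirs(path + split + '/' + animal, exist_ok=True)

# Move every cat.*.jpg into train/cats and every dog.*.jpg into train/dogs
for f in glob.glob(path + 'train/cat.*.jpg'):
    shutil.move(f, path + 'train/cats/')
for f in glob.glob(path + 'train/dog.*.jpg'):
    shutil.move(f, path + 'train/dogs/')

# Move a random sample into valid/ (assumes at least 1000 images per class)
for animal in ('cats', 'dogs'):
    files = glob.glob(path + 'train/' + animal + '/*.jpg')
    for f in random.sample(files, 1000):
        shutil.move(f, path + 'valid/' + animal + '/')
```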

After setting up the folder structure, you can simply re-run the same 7 lines of code from the previous post to train the model and get the classification results.
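For reference, those lines looked roughly like this (the path and batch size are assumptions; Vgg16 is the wrapper class from the course’s vgg16.py):

```python
from vgg16 import Vgg16

path = 'data/redux/'   # assumed location of the rearranged data
batch_size = 64

vgg = Vgg16()
batches = vgg.get_batches(path + 'train', batch_size=batch_size)
val_batches = vgg.get_batches(path + 'valid', batch_size=batch_size * 2)
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)
```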

Compared to that code, the notebook has one extra line.

In simple terms, it saves your trained model’s weights to a file, so you can reuse the same model later without any re-training.
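In code it is just one call on the underlying Keras model (the file name here is an assumption; the notebook uses something similar):

```python
# Save the fine-tuned weights so the model can be reloaded later
vgg.model.save_weights(path + 'results/ft1.h5')

# ...and load them back whenever needed, without re-training
vgg.model.load_weights(path + 'results/ft1.h5')
```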

After successfully training the model, you may want to submit your results to the competition. It’s a very simple and straightforward process; you basically need to:

1. Install the kaggle-cli tool from pip: `pip install kaggle-cli`
2. Configure the tool for the right competition:
`kg config -g -u <username> -p <password> -c <competition>`
3. Once your submission is ready, submit your CSV file:
`kg submit <entry> -u <username> -p <password> -c <competition> -m <message>`

Preparing your submission

You will need to create a CSV file in order to submit to Kaggle. This CSV file contains the file names of your test data, without the extension, as ‘id’, and the predicted probability that each file is a dog as ‘label’. The final structure looks like this (two columns, id and label):

You can get the predictions for each of these test images with the following command.
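The call looks roughly like this (vgg.test is the helper from the course’s vgg16.py; the path is an assumption):

```python
# Run the fine-tuned model over the unlabelled test images
test_batches, preds = vgg.test(path + 'test', batch_size=batch_size * 2)
```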

Because these raw predictions come out as hard 0s and 1s, you will need to convert them into softer fractional values like 0.05 or 0.95. For that you can use NumPy’s clip function with values between 0.02 and 0.98 for better results.
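A minimal sketch of that clipping, assuming preds holds the two-column (cat, dog) probabilities from vgg.test above:

```python
import numpy as np

isdog = preds[:, 1]                  # probability that each image is a dog
isdog = np.clip(isdog, 0.02, 0.98)   # avoid hard 0s and 1s before submitting
```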

It is very important to know how Kaggle evaluates its submissions. While explaining how predictions are made in this VGG model, Jeremy goes on to explain a particular loss function called log loss (or cross-entropy loss) and its importance in calculating loss and generating predictions. It turns out this is the same loss function the competition uses to evaluate results. Our predictions were hard 1s and 0s, and since the evaluation takes the log of those values, a single confidently wrong answer ruins the score. So it is always a good idea to convert a prediction of 0 to around 0.02, and a prediction of 1 to around 0.98, to get a proper evaluation result.
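For reference, the log loss over the N test images, with true label y_i and predicted dog probability \hat{y}_i, is:

$$\text{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\Big]$$

A confidently wrong answer (a prediction at exactly 0 or 1) makes one of the log terms blow up, which is exactly why the clipping above helps.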

After getting the predictions, you will need to prepare the CSV file with the proper structure: one column of file names (ids) and another column of probabilities. For that you can follow the code snippet Jeremy uses in this lesson, sketched below.
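A sketch of what that snippet does (the filename slicing assumes the test images sit in a single ‘unknown/’ subfolder, as in the notebook):

```python
import numpy as np

filenames = test_batches.filenames
# 'unknown/1234.jpg' -> 1234
ids = np.array([int(f[f.find('/') + 1:f.find('.')]) for f in filenames])

# Stack ids and clipped dog probabilities side by side and write the CSV
subm = np.stack([ids, isdog], axis=1)
np.savetxt(path + 'subm98.csv', subm, fmt='%d,%.5f',
           header='id,label', comments='')
```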

Here, he is basically getting the probabilities and file names from the batches, stacking them together, and saving them to a file called subm98.csv. You can watch and follow the whole process here.

To get better results from these trainings, he recommends fitting the model more than once, with slightly different learning rates.

You can also visualize how your model is performing by looking at which examples it got right and which it got wrong. After training the model, you can follow these 5 steps from the notebook to check its performance.
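One simple way to find those examples, assuming the validation probabilities come from vgg.test on the validation folder (variable names are assumptions):

```python
import numpy as np

val_batches, probs = vgg.test(path + 'valid', batch_size=batch_size)
labels = val_batches.classes           # true labels: 0 = cat, 1 = dog
our_labels = np.round(probs[:, 1])     # predicted labels from dog probability

correct = np.where(our_labels == labels)[0]     # indices the model got right
incorrect = np.where(our_labels != labels)[0]   # indices the model got wrong
```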

Some of the results from these tests:

Correct Labels
Incorrect Labels

2. Details on CNN features and Fine Tuning

It is important to know why we should do fine tuning, but before that let’s look at what the layers of a Convolutional Neural Network like VGG actually learn when we feed images to it.

An amazing paper by Matthew D. Zeiler and Rob Fergus, Visualizing and Understanding Convolutional Networks, explains this in much depth. Let’s try to understand what they found.

From their research, it turns out that the very first layer of a CNN tends to learn primitive features of an image, like lines. Let’s look at this image:

On the left-hand side is a set of filters from the first layer of a CNN. By looking at it, we can see it recognizes some primitive forms in images, like diagonal lines and some solid colors.

When the outputs of these layer-1 filters are combined with more filters in layer 2, the network learns a little more about that same image, like this:

We can see that this layer recognizes shapes such as windows or door frames in an image.

If this process is continued further, we get results like this:

At layers 4 and 5 we can see it recognizing more complex forms, like the face of a dog or the eye of a bird.

There are 2 reasons why this visualization is so important.

  1. Now we have an intuitive understanding of the model and what it might learn at each layer
  2. For fine tuning, we need to decide which layers we should remove and which we should keep.

In my previous post, I didn’t cover what fine tuning does under the hood. Let me go through that quickly.

Generally, in fine tuning, we remove the last Dense (fully connected) layer and replace it with a new layer trained on our dataset. This works well as long as the pre-trained model was trained on a similar kind of dataset to ours, and ImageNet does contain images of dogs and cats.

But let’s say you want to classify brain tumors from CT scans. Should we still take all the layers of VGG and fine tune as before?

One way to think about this: CT scan images are totally different from ImageNet images. So you should keep only the earlier layers of the model, where it recognizes primitive shapes and structures, and fine tune on top of those layers only. That way your model will not be unnecessarily huge or unpredictable, and you still get the benefit of a model pre-trained on a dataset far larger than yours.

For such applications and intuitions, visualizations like these are very important.

3. Explaining Neural Network in Excel

To explain things more simply, Jeremy teaches how a neural network works using Microsoft Excel, which is unusual but surprisingly intuitive and effective.

You can download the Excel files from this link and go through them, but let me briefly describe what he does here.

One way to look at a neural network is as tabular layers of numbers stacked on top of one another.

Suppose you have some input numbers; for a neural network, you also need weight values that get multiplied by these inputs. Let’s visualize these tabular layers in Excel like this:

Here, X1, X2 and X3 are input values, Y1 and Y2 are output values, and the randomly generated weights are listed under ‘weights’.

These weight values are randomly generated, which is what a real neural network does as well: it randomly initializes the weights, and when the weights at each layer are multiplied by that layer’s input, we get the layer’s activations.
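The same calculation in a few lines of NumPy, with made-up shapes matching the spreadsheet (three inputs, two outputs):

```python
import numpy as np

x = np.array([2.0, 3.0, 1.0])   # inputs X1, X2, X3
W = np.random.randn(3, 2)        # randomly initialized weights
y = np.dot(x, W)                 # the layer's activations Y1, Y2
```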

But it is very important to choose the right kind of initialization for these weight values, because if they are not at the same scale as our input values, the predicted output will be far from the real output and the loss will be high, which means the network is not learning properly. For that, Xavier initialization is used, which creates the initial weight values at a scale matched to the inputs.
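A minimal sketch of Xavier (Glorot) initialization, in its uniform variant, for a layer with n_in inputs and n_out outputs:

```python
import numpy as np

def xavier_uniform(n_in, n_out):
    # Draw weights uniformly so their variance is 2 / (n_in + n_out)
    limit = np.sqrt(6.0 / (n_in + n_out))
    return np.random.uniform(-limit, limit, size=(n_in, n_out))

W = xavier_uniform(3, 2)
```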

Every now and then Jeremy uses this Excel-based explanation to teach the concepts behind neural networks, which I think is totally awesome, and you should definitely check it out in the video.

4. A Linear Model from scratch and how to use that to classify cats and dogs

You can get the code for this in the Lesson 2 notebook (lesson2.ipynb).

To understand more sophisticated deep neural networks, we first need to understand how a linear model as simple as a straight line (Y = AX + B) works. And once you know that, at an intuitive level deep networks are not so different at all.

In Lesson 0, I briefly talked about such a linear model. In this lesson we are going to construct one in Keras.

Building such a model in Keras is a very straightforward process.

You randomly generate 30 input entries, each with 2 dimensions, in vector form. Then, to create the Y values, you take the dot product with the vector [2, 3], which plays the role of the weights in our case, and add 1 to the result, which acts as another weight (the bias).

In this manner you create 30 entries of x, the input, and 30 entries of y, the output. Now the job of the model is to figure out the right weight values given only these Xs and Ys. That is, it needs to recover the [2, 3] vector and the bias 1 that were used to create the Y values.
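A quick sketch of that setup, following the lesson notebook:

```python
import numpy as np

# 30 random 2-dimensional inputs; targets built from the "true"
# weights [2, 3] and bias 1 that the model will have to rediscover
x = np.random.random((30, 2))
y = np.dot(x, [2., 3.]) + 1.
```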

You can create such a linear layer in Keras using the Dense layer. For a neural network to use such layers, you wrap them in a Sequential model, which basically says: this is a neural model, and Dense is one of its layers. You can pass many types of layers, as a list, to Sequential.

And then you compile that model with an optimizer, which corrects the randomly initialized weight values at each iteration and moves them closer to the real ones. Here the loss is mean squared error, which measures the average squared difference between the real and predicted outputs. Based on this loss, the weight values are updated using Gradient Descent, which I will cover in the following lessons.
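Put together, the model definition and compilation look roughly like this (argument names follow the Keras 1.x API used in the course):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# A single Dense layer taking 2 inputs and producing 1 output is our linear model
lm = Sequential([Dense(1, input_shape=(2,))])
lm.compile(optimizer=SGD(lr=0.1), loss='mse')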

Then you train the model by invoking the fit method.
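For example (nb_epoch is the Keras 1.x name; newer versions call it epochs):

```python
lm.fit(x, y, nb_epoch=5, batch_size=1)
```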

After that, when you look at the learned weight values, they should be close to our [2, 3] and 1.
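You can inspect them with:

```python
lm.get_weights()   # should be close to [[2], [3]] and [1]
```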

We can use this same kind of Dense (fully connected) linear model to classify our existing cats and dogs dataset.

To do that, we get predictions for each image in our train and validation sets from the original pre-trained VGG model, which outputs probabilities over 1,000 ImageNet classes, and then use those 1,000-class predictions as input to a new linear model with 1,000 inputs and 2 outputs, for cats and dogs.

To get the predictions, let’s use model.predict on our pre-trained model.
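Something along these lines, where trn_data and val_data are assumed to be the preprocessed image arrays for the train and validation sets, and model is the pre-trained VGG Keras model:

```python
# 1000 ImageNet-class probabilities for every training and validation image
trn_features = model.predict(trn_data, batch_size=batch_size)
val_features = model.predict(val_data, batch_size=batch_size)
```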

Then let’s define our linear model to take these predictions as input.
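A sketch of that linear model, following the notebook:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop

# 1000 ImageNet probabilities in, 2 classes (cat, dog) out
lm = Sequential([Dense(2, activation='softmax', input_shape=(1000,))])
lm.compile(optimizer=RMSprop(lr=0.1), loss='categorical_crossentropy',
           metrics=['accuracy'])
```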

And let’s fit the model.
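Assuming trn_labels and val_labels are the one-hot cat/dog labels for those same images:

```python
lm.fit(trn_features, trn_labels, nb_epoch=3, batch_size=batch_size,
       validation_data=(val_features, val_labels))
```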

Now, if we look at the model summary, it shows the linear model we are using. This model should predict in much the same way as our previously fine-tuned VGG model.

One thing that separates neural networks from plain linear models is the activation function. It is present in every neural network and is used to activate the neurons in each layer. What it does is pass the linear output of a layer through some non-linear function, such as ReLU or tanh, among many others. These non-linearities are what allow a neural network to approximately solve almost any kind of problem.
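ReLU, for example, is just a thresholding of the linear output:

```python
import numpy as np

def relu(z):
    # Pass positive activations through unchanged, zero out the rest
    return np.maximum(0., z)
```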

This fantastic article called A visual proof that neural nets can compute any function explains it in more detail.

Now let’s do some fine tuning without any abstractions and see what it really means to fine tune a pre-trained model.

Let’s take our original pre-trained VGG model and look at its layers.
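For example (Vgg16 is the course’s wrapper around the Keras model):

```python
from vgg16 import Vgg16

vgg = Vgg16()
model = vgg.model
model.summary()   # lists every layer of the pre-trained VGG16 network
```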

At the end, like our previous model, there is a Dense layer that outputs the probabilities of the 1,000 ImageNet classes. Since we only need to predict between 2 classes, cats and dogs, let’s remove it, and let’s set the other layers of the model to untrainable, because we want to utilize what they have already learned and build on top of it, not overwrite it with our dataset.
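In code, that is:

```python
model.pop()                     # drop the final 1000-way Dense layer
for layer in model.layers:
    layer.trainable = False     # freeze the remaining pre-trained layers
```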

Now, let’s add our new Dense layer, which will output only two classes.
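```python
from keras.layers import Dense

model.add(Dense(2, activation='softmax'))   # new output layer: cat vs dog
```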

Now let’s set up our optimizer and fit the model with our cat and dog dataset.
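A sketch with Keras 1.x argument names, assuming batches and val_batches come from get_batches() on the cats/dogs folder structure set up earlier:

```python
from keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(lr=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, samples_per_epoch=batches.nb_sample, nb_epoch=2,
                    validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)
```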

That’s it; this is what fine tuning is under the hood. In my previous post, I showed a wrapper function created by Jeremy that fine tunes the model quickly without exposing this inner process.

Using this same technique with new datasets, you can train a model to classify any kind of images. If you follow the notebook file, you can also learn how to fine tune more than one layer of a model.

For more details, you can always watch the full video or follow along the class notes. You can also view the timeline of this lesson and jump to any particular topic you want to watch.

In the next lesson, I will talk about Stochastic Gradient Descent, max pooling, fitting the model, and much more.

See you there.

Next Post: Lesson 3

--


AI Developer, Google Developers Expert (GDE) on ML, Intel AI Student Ambassador, Co-founder @ AI for Development: ainepal.org, City AI Ambassador: Kathmandu