fast.ai DL course 2018 Part 1: My Learning

Hello friends,

Here I document my learnings from Part 1 of the fast.ai DL course, along with my understanding gathered from other sources online.

I am new to Python, ML and DL, so I am going to dig deep and question most of the 'common' things and concepts. If you have a background like mine, then I am sure this will be useful to you.

So let us get started.

Concept: Use of magic functions

When you look at the Jupyter notebooks provided in the fast.ai courses, you will almost always see these three lines at the beginning of most of the notebooks.

%reload_ext autoreload
%autoreload 2
%matplotlib inline

These functions are provided by IPython, which is the core technology behind Jupyter notebooks.

Statements that start with a '%' are called line magic functions in IPython. They are IPython-specific commands that aim to improve the notebook programming environment. Here is what they mean:

%reload_ext autoreload # Load (or reload) the autoreload extension, so module changes are picked up automatically. No explicit 'reload' command required.
%autoreload 2 # Reload all modules (except those excluded by %aimport) every time before executing the Python code typed.
%matplotlib inline # Set up the matplotlib library to render plots inline, so output is shown immediately within the Jupyter notebook.

get_cv_idxs

First let us explore get_cv_idxs(n). The name stands for 'get cross-validation indexes'. If you supply a number n to it, it returns an array of 0.2 * n (20% of n, the default validation percentage) random indexes between 0 and n-1.

It is just a helper function. get_cv_idxs does not fetch the validation data at all. Its output is an array of indexes. These indexes are passed along while fetching data from the CSV, so that those rows are kept aside for validation and not used for training.
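For reference, the fast.ai (v0.7) implementation is roughly the sketch below; the exact signature may differ slightly between library versions:

import numpy as np

def get_cv_idxs(n, val_pct=0.2, seed=42):
    # Shuffle all n indexes deterministically, then keep a val_pct slice
    np.random.seed(seed)
    n_val = int(val_pct * n)
    return np.random.permutation(n)[:n_val]

val_idxs = get_cv_idxs(10222)  # 2044 indexes to hold out for validation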

‘weights’, how heavy are they?

As a neural net gets trained, it is refining the weights of the connections between its neurons so as to become more accurate. In this case, we are using an already-trained model. The model name is specified as arch = resnext101_64 at the beginning of the notebook. So we need to download the corresponding weights for this model, so that we can make use of the training already done on it.

When I download the weights, I see that there are additional weight files in that folder as well. The file names correspond to model names. Maybe we will use these weights in future exercises.

I see that the resnext_101_64 weight file is one of the largest (326 MB, very heavy!). So these are definitely not the models to use if you plan to train offline on a mobile device.

So let us experiment with the other models. I tried the model resnext_100, for which we seem to have weights in the above folder. It worked, though I got almost the same accuracy.

ConvLearner.pretrained: setting up the stage

ConvLearner.pretrained takes the model name (arch), the data, and a precompute flag. This function sets up the stage, i.e. it builds the neural network layers with pretrained weights and attaches the data. If precompute=True, it calculates the activations of all the neurons in the network (except the last layer, which is specific to the current data) once up front. So the first call to ConvLearner.pretrained takes some time.

When precompute=True, data augmentations are not applied. This is because we are telling the model to use precomputed activations, and a precomputed activation can only correspond to one fixed version of each image.
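Putting it together, a typical setup from the course notebooks looks roughly like this (PATH, sz, label_csv and val_idxs come from earlier cells of the notebook):

from fastai.conv_learner import *  # fast.ai 0.7 imports used by the course notebooks

arch = resnext101_64
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_csv(PATH, 'train', label_csv, test_name='test',
                                    suffix='.jpg', val_idxs=val_idxs, tfms=tfms)
learn = ConvLearner.pretrained(arch, data, precompute=True)  # slow the first time: activations get precomputed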

What is this 141?

Remember we started with 10,222 images? We then hold out 20%, i.e. 2,044 images, and are left with 8,178 images for training. We set a batch size of 58 at the top of the notebook. Batch size means that the trainer computes the loss and updates the weights once for every 58 images, rather than after every single image. So one epoch consists of 8178 / 58 = 141 iterations, which is the 141 shown in the progress bar.
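The arithmetic, spelled out:

n_total = 10222              # images listed in the training CSV
n_val = int(0.2 * n_total)   # 2044 images held out for validation
n_train = n_total - n_val    # 8178 images left for training
bs = 58                      # batch size set at the top of the notebook
print(n_train // bs)         # 141 weight updates (iterations) per epoch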

learn.summary

learn.summary gives details of the CNN we are working on.

Not every layer here is a convolution layer. Some observations:

  • There are 321 layers in all
  • There is a repeating pattern: Conv layer -> BatchNorm layer -> ReLU layer
  • There is a MaxPool layer only at the beginning
  • There are Adaptive pooling, Dropout, Flatten and Linear layers towards the end
  • SoftMax is the last layer

Difference between a Convolution Layer and a Dense (fully connected) layer

A convolution layer slides a small kernel (for example 3 * 3) across the image, so each output value depends only on a local patch, and the same weights are reused at every position. A dense (fully connected) layer connects every input to every output with its own weight, so it has no notion of locality and needs far more parameters for the same input size.
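To make the difference concrete, here is a hypothetical PyTorch comparison (the 64-channel, 56 * 56 feature-map sizes are purely illustrative):

import torch.nn as nn

conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)  # one small kernel, reused everywhere
dense = nn.Linear(64 * 56 * 56, 64)                 # every input tied to every output

print(sum(p.numel() for p in conv.parameters()))    # 36,928 parameters
print(sum(p.numel() for p in dense.parameters()))   # 12,845,120 parameters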

epoch, cycle_len, cycle_mult

epoch: one complete pass through the entire training set.

cycle_len: the number of epochs in one cycle of SGDR (stochastic gradient descent with restarts); within a cycle the learning rate is annealed downwards and then reset to its starting value.

cycle_mult: the factor by which the cycle length is multiplied at the end of each cycle, so successive cycles become progressively longer.
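These show up as arguments to learn.fit; an illustrative call in the notebooks' style:

# 3 cycles: the first lasts 1 epoch, the second 2, the third 4 (cycle_mult=2),
# 7 epochs in total, with the learning rate restarting at the start of each cycle
learn.fit(1e-2, 3, cycle_len=1, cycle_mult=2)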

Training loss and Validation Loss

Training loss is computed on the batches the model is learning from; validation loss is computed on the held-out 20%. If training loss keeps dropping while validation loss rises, the model is overfitting. If training loss is still higher than validation loss, the model is underfitting and can be trained further.

Kernel size

In modern deep learning architectures, the kernel size is almost always 3 * 3. Why is that? Stacking two 3 * 3 convolutions covers the same 5 * 5 receptive field as one larger kernel, but with fewer parameters and an extra non-linearity in between, so small kernels are usually the better deal.
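The parameter arithmetic, for a layer with C input and C output channels (C = 64 is just an illustrative value):

C = 64                            # channels in and out (illustrative)
two_3x3 = 2 * (3 * 3) * C * C     # two stacked 3*3 convs: 73,728 weights
one_5x5 = (5 * 5) * C * C         # one 5*5 conv, same receptive field: 102,400 weights
print(two_3x3, one_5x5)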

Real world problems

Cats & Dogs and Dog Breeds are not real-world problems. The Planet challenge is the first real-world challenge we look into (in week 3).

Image Pixels

ImageNet models are usually trained on 224*224 or 299*299 pictures, so retraining them on much smaller images would destroy the previously learned weights. Inference: models are sensitive to the image size they were trained on.

Resizing

We saw the data.resize command in lesson2-image_models.ipynb. What are we really doing here? It creates resized copies of the images on disk (in a tmp folder) so that each epoch does not have to read and shrink the full-size JPEGs again. It is purely a speed optimization; the transforms still produce the final size the model sees.
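From lesson2-image_models.ipynb, the call looks roughly like this; the 1.3 factor leaves headroom for the zoom and crop transforms that run later:

data = data.resize(int(sz * 1.3), 'tmp')  # write pre-shrunk copies under 'tmp/' and load those from now on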

Can higher resolutions help?

In the Planet challenge, we start training on 64*64 images and finish on 299*299 ones. Does that mean the higher resolution is useless? Quite the opposite: starting small makes the early epochs fast and acts as a form of regularization, and switching to larger images later lets the model pick up detail it could not see before. This trick is called progressive resizing.
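A sketch of that progressive-resizing loop, assuming a get_data(sz) helper like the notebook's that rebuilds the data object at a new size (the intermediate sizes here are illustrative):

for sz in (64, 128, 299):
    learn.set_data(get_data(sz))                 # same images, larger size
    learn.fit(lr, 3, cycle_len=1, cycle_mult=2)  # keep training at the new resolution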

Metrics

When you submit results to a Kaggle competition, there needs to be a standard way to determine how close your solution is to the actual results. This is not always straightforward: you may care not only about how many predictions you got right, but also about how many you labelled wrong, and by how much.

For example, in a medical diagnosis challenge you should be very careful not to wrongly label even one patient as having a stigmatized disease. But in other cases, like predicting the likelihood of a machine breaking down, a wrong prediction is not as catastrophic.

So each Kaggle competition specifies a 'metric' with which your results are evaluated. 'F2' is one such metric.

Such a metric function takes two vectors, one of predictions and one of targets, and spits out a single number: the metric score. This is the score that appears on the Kaggle leaderboard.
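A minimal sketch of that contract, using plain accuracy just to show the shape of a metric function:

import numpy as np

def accuracy(preds, targs):
    # preds: (n_samples, n_classes) predicted probabilities; targs: (n_samples,) true labels
    return float((np.argmax(preds, axis=1) == targs).mean())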

Metric Vs Loss Function

If our goal is to maximize the metric, why can't we use the metric itself (the distance from the ideal score) as our loss function? The usual reason is that metrics like accuracy or F2 are computed from hard yes/no decisions, so they are flat almost everywhere and give no gradient to learn from. Losses like cross-entropy are smooth, differentiable stand-ins that the optimizer can actually follow.
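A tiny PyTorch illustration of why: cross-entropy yields gradients to follow, while the accuracy computation breaks the gradient graph entirely.

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0]], requires_grad=True)
target = torch.tensor([0])

loss = F.cross_entropy(logits, target)
loss.backward()                        # smooth surrogate: gradients flow
print(logits.grad)                     # a non-zero gradient to learn from

acc = (logits.argmax(dim=1) == target).float().mean()
print(acc.requires_grad)               # False: argmax and == are not differentiable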

Learning rates after unfreezing

In Dogs vs Cats, the earlier layers had learning rates 1,000 times smaller than the dense layers. But in the case of Planet, the earlier layers have learning rates just 9 times smaller. This is because the ImageNet pre-trained weights are not as well suited to the Planet dataset as they are to Dogs vs Cats, so the early layers need to change more. How were the factors 1,000 and 9 decided? The course presents them as empirically chosen rules of thumb.
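Differential learning rates are passed as an array, one rate per layer group; a sketch in the style of the notebooks (the values are illustrative, and learn is the existing learner):

import numpy as np

lr = 0.01
lrs = np.array([lr / 9, lr / 3, lr])  # earliest layer group gets the smallest rate
learn.unfreeze()                      # make all layer groups trainable
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)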

TTA: Which transformations?

When we do TTA (test-time augmentation), what transformations are applied to the test data set? The answer: the same transformations we set while creating the 'learn' object, i.e. tfms.

TTA: Why does it return two arrays?

multi_preds, y = learn.TTA() does TTA on the validation set. The validation set is the 20% of the training data which we set aside initially using get_cv_idxs. So this returns an array of predictions as well as the targets (the array 'y').

When you do learn.TTA(is_test=True), TTA is done on the test set instead. This is the set which Kaggle has challenged us to predict, so we do not know 'y' (the targets) for it. In this case y comes back as a dummy array carrying no real labels.
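Side by side, as used in the notebooks (TTA returns log-probabilities, which we average and exponentiate; numpy is assumed to be imported as np):

multi_preds, y = learn.TTA()               # validation set: predictions plus true targets
preds = np.mean(np.exp(multi_preds), 0)    # average over the augmented copies

test_preds, _ = learn.TTA(is_test=True)    # test set: the returned targets carry no real labels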

Difference between TTA and Data Augmentation (tfms)

  1. TTA is applied to the validation and test sets, whereas data augmentation is applied during training (by passing tfms).
  2. When data augmentation is done, the transformed images act as additional training examples. In the case of TTA, the predictions for the transformed copies are averaged.