Understanding FastAI v2 Training with a Computer Vision Example- Part 1: The Resnet Model

Rakesh Sukumar · Published in Analytics Vidhya · 7 min read · Oct 20, 2020
Image Source: https://wallpapercave.com/matrix-hd-wallpaper

In this series, we will use a computer vision example to study the FastAI training loop. The series is aimed at those who are already familiar with FastAI and want to dig a little deeper to understand what happens behind the scenes. I will assume a basic understanding of deep learning and CNNs, and will not explain every piece of jargon used. If you need more background on any of the topics, I recommend FastAI's free online course, Practical Deep Learning for Coders, and the associated book.

The overall structure of this series is as below:

  1. Study the resnet34 model architecture and build it using plain Python & PyTorch.
  2. Deep dive into FastAI optimizers & implement a NAdam optimizer.
  3. Study FastAI Learner and Callbacks & implement a learning rate finder (lr_find method) with callbacks.

This is the first article in the series; in it, we will study fastai's resnet34 model architecture and build it on our own in PyTorch. We will use the Imagenette dataset (provided by FastAI), a subset of the famous ImageNet dataset containing just 10 classes that are relatively easy to distinguish. You can read more about the dataset here.

First, we will use fastai convenience functions to quickly create and fit an optimized Resnet34 model, and use it to study the model architecture. Armed with this understanding, we will build the model architecture from scratch using plain Python & PyTorch. We will use Google Colab to run our code. You can find the code file for this series here.

Let’s get started!

Run the following code to install fastai v2 in Google Colab. You will need to authorize access to your Google Drive account when you run the first cell below. See the instructions for running fastai v2 in Google Colab here if you need more information. Remember to enable the GPU for your Google Colab session before you run any code.
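As a minimal sketch, the setup cell might look like the following; the `!`-prefixed shell commands and Drive mount path are the standard Colab conventions, shown here as comments:

```python
# Hypothetical Colab setup cell. In a notebook, lines starting with "!" run
# shell commands; here they are shown as comments:
#   !pip install -Uqq fastai
# Mounting Google Drive (prompts for authorization):
#   from google.colab import drive
#   drive.mount('/content/gdrive')
import sys

# fastai v2 requires a reasonably recent Python
assert sys.version_info >= (3, 6), "fastai v2 needs Python 3.6+"
print(sys.version.split()[0])
```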

untar_data() is a fastai convenience function to download the data. We will be using a lot of fastai functions in this series; you can use the doc() function to get help on any of them, as shown below.

Setting up dataloaders using DataBlock API

We will use the fastai DataBlock API to create dataloaders for our model. We will not delve deeper into the DataBlock API here, as FastAI already has some fantastic tutorials on it. You can find the tutorials here.

Let’s create a label dictionary to provide English labels to our dataset and use it in our dataloaders.
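A sketch of what such a dictionary might look like; the WordNet folder IDs below are the standard Imagenette class directories, but double-check them against your downloaded data:

```python
# Map Imagenette's WordNet-ID folder names to human-readable labels.
lbl_dict = {
    "n01440764": "tench",
    "n02102040": "English springer",
    "n02979186": "cassette player",
    "n03000684": "chain saw",
    "n03028079": "church",
    "n03394916": "French horn",
    "n03417042": "garbage truck",
    "n03425413": "gas pump",
    "n03445777": "golf ball",
    "n03888257": "parachute",
}
assert len(lbl_dict) == 10  # Imagenette has exactly 10 classes
```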

We use ImageNet stats to normalize our dataset. The aug_transforms() function is used to apply data augmentation in our dataloaders. Let's take a quick look at what these are:

You can get a trace of all the steps in the creation of the dataloaders using the summary() method. This method is extremely useful if you run into errors during dataloaders creation.

The FastAI transformation pipeline makes it easy to visualize our dataset after applying the augmentation transforms. Let's take one sample from our dataset and generate multiple copies of it with the augmentations applied.

You can also use the show_batch() method to get different images from the training set and visualize them after the augmentation.

Let's get one batch of data from our training set and check its shape. FastAI uses a default batch size of 64. The dataloaders convert each input image to a tensor of shape [3, 224, 224] for our neural network model to use.
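For reference, a batch from these dataloaders has the shape below; this sketch uses random tensors in place of real images and labels:

```python
import torch

# Dummy stand-ins for one batch: 64 RGB images of 224x224, and 64 integer labels
xb = torch.randn(64, 3, 224, 224)
yb = torch.randint(0, 10, (64,))
print(xb.shape)  # torch.Size([64, 3, 224, 224])
print(yb.shape)  # torch.Size([64])
```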

Let’s use fastai’s improved resnet34 architecture to build our image classification model.

Now, let's create a Learner object to train our model. Learner is the basic class for handling the training loop in FastAI. A Learner object binds together the dataloaders, the neural network model, an optimizer, and the loss function. FastAI adds an Adam optimizer by default and can choose an appropriate loss function based on the type of our target variable. For a classification problem, it adds CrossEntropyLoss() as the default loss function.
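To make the default loss concrete, here is a minimal PyTorch sketch of CrossEntropyLoss applied to a batch of raw model outputs (logits):

```python
import torch
import torch.nn as nn

logits = torch.randn(64, 10)           # raw model outputs: one score per class
targets = torch.randint(0, 10, (64,))  # integer class labels in [0, 10)

# CrossEntropyLoss = log-softmax over the class scores + negative log-likelihood
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss.item())  # a positive scalar; around ln(10) ~ 2.3 for random logits
```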

Let's use the Learner method learn.lr_find() to find an appropriate learning rate and learn.fit_one_cycle() to fit the model using one-cycle training for 10 epochs. We are training for a relatively high number of epochs here because we are not using transfer learning.

Wow!!! We got an accuracy of 81.2% in 10 epochs.

We can use the learn.summary() method of the Learner object to understand more about our model. The summary() method also displays the shape of the activations from each layer of the model. Note that we show only the top and bottom parts of the summary output below.

xResnet34 Model Architecture

Let's understand the model architecture used. The xResnet34 architecture consists of 5 stages of convolution layers: an input stem + 4 stages of resnet blocks (see the image below). Each resnet block consists of 2 conv layers with a shortcut connection (identity path) from the input of the first conv layer to the output of the second conv layer. The original resnet34 architecture described in the paper “Deep Residual Learning for Image Recognition” had a single 7x7 convolution layer in its input stem and hence had 34 neural network layers in its main path (1 conv layer in the input stem + (2 conv layers/resnet block) * 16 resnet blocks + 1 linear layer). FastAI's xresnet34 model instead has 3 conv layers in its input stem. Deeper resnet architectures are built by adding more conv layers to each resnet block. I strongly recommend reading “Deep Residual Learning for Image Recognition” and “Bag of Tricks for Image Classification with Convolutional Neural Networks” for a better understanding of the resnet architecture and its variations.
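The layer count in the main path can be checked with a quick calculation; the blocks-per-stage numbers [3, 4, 6, 3] come from the resnet34 paper:

```python
# resnet34 main path: 1 stem conv + 2 convs per resnet block + 1 linear layer
blocks_per_stage = [3, 4, 6, 3]              # resnet blocks in each of the 4 stages
conv_layers = 1 + 2 * sum(blocks_per_stage)  # 1 + 2 * 16 = 33
total_layers = conv_layers + 1               # + the final linear layer
print(total_layers)  # 34
```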

Resnet Architecture

The complete architecture for the xresnet34 model is as below:

xResnet34 Architecture. *BN: BatchNorm layer

“nn.Sequential([])” in the id path above indicates that no neural network layers are present in that path. The xresnet34 architecture has 21.3 million trainable parameters in 116 parameter tensors (nn.Parameter tensors representing weights and biases), as shown below.

No. of parameter tensors in xresnet34 architecture

The excel file containing the architecture with formulas to compute the number of parameters can be found here (File name: Understanding xResNet.xlsx).
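The parameter and tensor counts can be reproduced with plain Python. This sketch assumes bias-free convs (each followed by a BatchNorm contributing a weight and a bias tensor), the 3-conv xresnet stem, a 1x1 conv + BatchNorm in the identity path wherever the channel count changes, and a final 512→10 linear layer:

```python
def conv_p(c_in, c_out, k):
    return c_out * c_in * k * k     # bias-free conv: one weight tensor

def bn_p(c):
    return 2 * c                    # BatchNorm: weight + bias tensors

params, tensors = 0, 0

# Input stem: three 3x3 convs (3->32->32->64), each followed by a BatchNorm
for c_in, c_out in [(3, 32), (32, 32), (32, 64)]:
    params += conv_p(c_in, c_out, 3) + bn_p(c_out)
    tensors += 3                    # 1 conv weight + 2 BatchNorm tensors

c_in = 64
for c_out, n_blocks in [(64, 3), (128, 4), (256, 6), (512, 3)]:
    for i in range(n_blocks):
        ci = c_in if i == 0 else c_out
        # two 3x3 convs per block, each followed by a BatchNorm
        params += conv_p(ci, c_out, 3) + conv_p(c_out, c_out, 3) + 2 * bn_p(c_out)
        tensors += 6
        if ci != c_out:             # 1x1 conv + BatchNorm in the identity path
            params += conv_p(ci, c_out, 1) + bn_p(c_out)
            tensors += 3
    c_in = c_out

params += 512 * 10 + 10             # final linear layer: weight + bias
tensors += 2

print(params, tensors)  # 21309034 116
```

The total comes out to roughly 21.3 million parameters in 116 tensors, matching the counts above.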

Let’s create our own xresnet34 model now. First, let’s create the input stem.
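A sketch of the 3-conv input stem in plain PyTorch; the layer sizes (3→32→32→64, first conv with stride 2, followed by a 3x3 max pool with stride 2) follow the architecture table above:

```python
import torch
import torch.nn as nn

def conv_bn(c_in, c_out, stride=1):
    # 3x3 conv (bias-free, since a BatchNorm follows) + BN + ReLU
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

stem = nn.Sequential(
    conv_bn(3, 32, stride=2),   # 224x224 -> 112x112
    conv_bn(32, 32),
    conv_bn(32, 64),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # 112x112 -> 56x56
)

x = torch.randn(1, 3, 224, 224)
print(stem(x).shape)  # torch.Size([1, 64, 56, 56])
```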

Next, we will create a resnet block and use it to create the resnet stages. The structure of the resnet block is shown below. An AveragePooling layer with stride 2, a Conv2d layer with kernel size 1x1, and a BatchNorm layer are added to the identity path if the input and output shapes of the resnet block differ.

Resnet block in xresnet34
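A sketch of the resnet block in plain PyTorch, following the diagram: two 3x3 convs in the main path (no ReLU after the second BatchNorm until the add), and AvgPool + 1x1 conv + BatchNorm in the identity path when the shapes change:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_out),  # no ReLU here; it comes after the add
        )
        if c_in == c_out and stride == 1:
            self.idpath = nn.Identity()  # shapes match: plain identity
        else:
            layers = []
            if stride != 1:
                layers.append(nn.AvgPool2d(2, ceil_mode=True))  # halve spatial size
            layers += [nn.Conv2d(c_in, c_out, 1, bias=False),   # match channel count
                       nn.BatchNorm2d(c_out)]
            self.idpath = nn.Sequential(*layers)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.convs(x) + self.idpath(x))

x = torch.randn(1, 64, 56, 56)
print(ResBlock(64, 128, stride=2)(x).shape)  # torch.Size([1, 128, 28, 28])
```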

Let’s build the resnet stages and the complete xresnet34 model next.

resnet stages
xresnet34 model
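Putting the pieces together, here is a self-contained sketch of the full model: the 3-conv stem, four stages with [3, 4, 6, 3] blocks, and a head of average pooling + linear. Note that fastai's actual xresnet34 includes extra tricks not shown here (such as zero-initializing the last BatchNorm weight of each block, per the Bag of Tricks paper), but the parameter count should land at roughly 21.3 million for 10 classes:

```python
import torch
import torch.nn as nn

def conv_bn(c_in, c_out, stride=1, k=3, act=True):
    # conv (bias-free) + BatchNorm, with an optional ReLU
    layers = [nn.Conv2d(c_in, c_out, k, stride=stride, padding=k // 2, bias=False),
              nn.BatchNorm2d(c_out)]
    if act:
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

class ResBlock(nn.Module):
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.convs = nn.Sequential(conv_bn(c_in, c_out, stride=stride),
                                   conv_bn(c_out, c_out, act=False))
        if c_in == c_out and stride == 1:
            self.idpath = nn.Identity()
        else:
            layers = [nn.AvgPool2d(2, ceil_mode=True)] if stride != 1 else []
            layers.append(conv_bn(c_in, c_out, k=1, act=False))
            self.idpath = nn.Sequential(*layers)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.convs(x) + self.idpath(x))

def make_stage(c_in, c_out, n_blocks, stride):
    # first block changes channels / spatial size; the rest keep shapes fixed
    blocks = [ResBlock(c_in, c_out, stride=stride)]
    blocks += [ResBlock(c_out, c_out) for _ in range(n_blocks - 1)]
    return nn.Sequential(*blocks)

def my_xresnet34(n_out=10):
    return nn.Sequential(
        conv_bn(3, 32, stride=2), conv_bn(32, 32), conv_bn(32, 64),  # input stem
        nn.MaxPool2d(3, stride=2, padding=1),
        make_stage(64, 64, 3, stride=1),
        make_stage(64, 128, 4, stride=2),
        make_stage(128, 256, 6, stride=2),
        make_stage(256, 512, 3, stride=2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, n_out),
    )

model = my_xresnet34()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 21309034 (~21.3M)
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 10])
```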

Let’s move the model to GPU and initialize the weights & biases using kaiming initialization.
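A sketch of this step on a small stand-in model; `init_cnn` is a hypothetical helper name here (fastai provides a similar utility), applying kaiming_normal_ to conv and linear weights:

```python
import torch
import torch.nn as nn

def init_cnn(m):
    # Kaiming (He) initialization for conv/linear weights; biases set to zero.
    # BatchNorm layers keep their default init (weight=1, bias=0).
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = nn.Sequential(nn.Conv2d(3, 32, 3, bias=False),
                      nn.BatchNorm2d(32),
                      nn.ReLU())
model.apply(init_cnn)  # .apply() visits every submodule recursively

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
print(device)
```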

We will now create a learner using our own neural network architecture and use it to fit the classification model as before.
