Faster AI: Lesson 7 — TL;DR version of Fast.ai Part 1

Published in Deep Learning Journal · 9 min read · Sep 14, 2017

This is Lesson 7 of a series called Faster AI. If you haven’t read Lesson 0, Lesson 1, Lesson 2, Lesson 3, Lesson 4, Lesson 5 and Lesson 6, please go through them first.

Two thirds of this lesson covers convolutional neural network architectures and techniques for improving such networks; the remaining part covers different types of RNNs. For the sake of simplicity, I have divided this lesson into 3 parts:

Resnet [Time: 0:02:20]
Multi Output & Heat Map [Time: 0:43:00]
Gated Recurrent Unit (GRU) [Time: 1:44:50]

1. Resnet

Resnet is another convolutional neural network architecture. It is built from so-called resnet blocks, and to picture these blocks, let’s look at this figure:

Here, the blue circles represent different hidden units: convolution blocks followed by activation functions. If you pass your input in at the bottom of the first circle, shown by the blue arrow at the bottom, it is passed along to the last circle at the top, which produces the final output of all those hidden blocks.

A Resnet block then merges that final output with the original input that was passed to the first hidden block. It adds the two values together, and that sum is the output of the Resnet block.

A network contains multiple resnet blocks, represented by the red squares above, and stacking them results in a Resnet architecture.

This same process can be represented in Keras as

This code shows one such resnet block. It is implemented using the functional API of Keras, and everything is similar to previous convolutional architectures except the final two lines of the function

Here, as in the diagram above, the final output x is merged with the input tensor by summation.
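To make the merge step concrete, here is a minimal sketch of the same idea. It is not the lesson's Keras code: I am assuming NumPy, using plain matrix multiplications in place of convolutions, and the names `residual_block`, `w1` and `w2` are made up for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Two hidden transforms whose result is summed with the block input."""
    hidden = relu(x @ w1)     # first hidden unit followed by its activation
    out = hidden @ w2         # second hidden unit
    return relu(out + x)      # merge by summation with the original input

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # batch of 4 inputs with 8 features each
w1 = rng.normal(size=(8, 8)) * 0.1
w2 = np.zeros((8, 8))                  # zero weights: the block just passes relu(x) through
y = residual_block(x, w1, w2)
```

With the second weight matrix set to zero, the hidden path contributes nothing and the block returns the activated input unchanged, which shows that the input always has a direct route to the output.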

In addition to this resnet block, Resnet architecture also utilizes something called ‘Global Average Pooling’.

Previously, we talked about one type of pooling called max pooling. In max pooling, the layer picks only the maximum value from each specified block of the matrix; in average pooling, it computes the average value of that block instead.
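The difference is easy to see on a small matrix. This is a sketch in NumPy, with a helper `pool2d` that I made up for illustration; it pools over non-overlapping 2x2 blocks.

```python
import numpy as np

def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn over non-overlapping size x size blocks of a 2D map."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]   # drop rows/cols that do not fill a block
    blocks = trimmed.reshape(h // size, size, w // size, size)
    return reduce_fn(blocks, axis=(1, 3))

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [0., 0., 1., 1.],
                 [0., 4., 1., 1.]])

max_pooled = pool2d(fmap, 2, np.max)    # [[4.0, 8.0], [4.0, 1.0]]
avg_pooled = pool2d(fmap, 2, np.mean)   # [[2.5, 6.5], [1.0, 1.0]]
```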

In Global Average Pooling, instead of averaging over small blocks, each feature map produced by the convolutional layers is averaged over its entire spatial extent, leaving a single value per feature map.

Global Average Pooling can be applied at the end of all the Resnet blocks.
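As a small sketch of global average pooling, assuming NumPy and a feature tensor laid out as (height, width, channels); the 7x7x3 shape and the values are made up:

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average each feature map over height and width: (H, W, C) -> (C,)."""
    return feature_maps.mean(axis=(0, 1))

feature_maps = np.zeros((7, 7, 3))                    # hypothetical 7x7 output with 3 feature maps
feature_maps[..., 0] = 1.0
feature_maps[..., 1] = np.arange(49).reshape(7, 7)    # values 0..48, mean 24
feature_maps[..., 2] = -2.0

pooled = global_average_pool(feature_maps)            # one number per feature map
```

Whatever the spatial size of the feature maps, the result has exactly one value per feature map, which is what makes it convenient to feed into a final dense layer.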

2. Multi Output & Heat Map

Multi output is another technique used to better understand the results of our convolutional model and even increase its accuracy to some extent.

To explain this process, Jeremy uses another competition, a Fishery competition, where the goal is to correctly classify test images into 8 different types of fish.

In this multi output technique, in addition to the labels of the images in the train set, another set of data is required. In this particular example, that data is a JSON file containing coordinate values that locate the fish in each training picture.

This JSON is for the image ‘img_04908.jpg’, where u’height’ and u’width’ are the height and width of the box (rectangle) that contains the target fish, and u’x’ and u’y’ give the pixel position of that rectangle in the image.
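To give a feel for how such an annotation becomes a training target, here is a sketch in plain Python. The numbers are invented and are not the real values for that image, the helper names are mine, and I am assuming that x and y mark the top-left corner of the box, which the annotation itself does not say.

```python
import json

BB_KEYS = ["height", "width", "x", "y"]   # fixed order for the regression target

def box_to_vector(box):
    """Return [height, width, x, y] as floats."""
    return [float(box[k]) for k in BB_KEYS]

def box_corners(box):
    """Return (left, top, right, bottom), ASSUMING x, y is the top-left corner."""
    left, top = box["x"], box["y"]
    return (left, top, left + box["width"], top + box["height"])

# Hypothetical annotation with made-up numbers.
raw = '{"height": 120.0, "width": 300.0, "x": 450.0, "y": 200.0}'
box = json.loads(raw)

target = box_to_vector(box)    # four numbers the model will learn to predict
corners = box_corners(box)     # handy for drawing the rectangle on the image
```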

If we visualize this rectangle data with the image, we get this image:

As you can see, the fish is inside the rectangle, which specifies where it is located in the image.

Now, each of the train images has a label for its class of fish and separate JSON values giving the coordinate position of the box, similar to the above.

To apply this multi output technique, we architect our model as we usually do for transfer learning, but in addition to the last dense layer we add another dense layer alongside it, fed by the same preceding layer and without any activation function.

Here, x_bb is that new dense layer without any activation.

Now, while compiling the model in Keras, we need two different loss functions: mean squared error ‘mse’ for x_bb and log loss for our usual output.
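Keras combines the per-output losses into a single weighted sum during training. Here is a minimal NumPy sketch of that combination; the function names and the `bb_weight` parameter are my own, and the weight value of 0.001 is only an example to stop the box loss, which is measured in pixels, from drowning out the class loss.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, used for the bounding box output."""
    return float(np.mean((y_true - y_pred) ** 2))

def log_loss(y_true_onehot, y_prob, eps=1e-7):
    """Categorical cross-entropy, used for the class output."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-np.mean(np.sum(y_true_onehot * np.log(y_prob), axis=1)))

def multi_output_loss(bb_true, bb_pred, cls_true, cls_prob, bb_weight=1.0):
    """Weighted sum of the two losses (bb_weight is a hypothetical knob)."""
    return bb_weight * mse(bb_true, bb_pred) + log_loss(cls_true, cls_prob)

bb_true = np.array([[100., 200., 50., 60.]])
bb_pred = np.array([[110., 190., 50., 60.]])    # off by 10 pixels in two coordinates
cls_true = np.eye(8)[[2]]                       # one image whose true class is index 2
cls_prob = np.full((1, 8), 1 / 8)               # a model that is unsure about all 8 classes

total = multi_output_loss(bb_true, bb_pred, cls_true, cls_prob, bb_weight=0.001)
```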

While fitting the model we need to pass two different targets: the true labels for each image and the JSON box data for each image.

After that, the model gives two different outputs from a single image.

If we visualize the box predicted by the model for a given validation image together with the real box for that image, we get this

Here, the yellow box is the predicted box and the red one is the real one specified in its JSON file.

With this technique, the model learns from the input in more ways than usual: it has to relate the JSON box data to the actual images, and it performs better than usual as a result.

Heat Map

Another great visualization technique is the heat map. In this technique, given any particular image like this

If we pass this image through our convolutional layers, we can visualize how a convolutional layer is performing by plotting its output. The output looks like this

This is the output from one of the convolutional layers, and the pink area shows where the layer thinks the object of interest is. The layer is performing well, as we know the fish is in that part of the image.

We can also overlay these two results to better visualize the performance.
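A rough sketch of that overlay, assuming NumPy, a single 2D activation map and a grayscale image with values in [0, 1]. The nearest-neighbour resizing and the names `to_heatmap` and `overlay` are my own simplifications, not what the lesson notebook does.

```python
import numpy as np

def to_heatmap(activation, image_hw):
    """Scale a 2D activation map to [0, 1] and upsample it to the image size."""
    act = activation - activation.min()
    span = act.max()
    if span > 0:
        act = act / span
    # Nearest-neighbour upsampling: map every image pixel to an activation cell.
    rows = np.arange(image_hw[0]) * activation.shape[0] // image_hw[0]
    cols = np.arange(image_hw[1]) * activation.shape[1] // image_hw[1]
    return act[np.ix_(rows, cols)]

def overlay(gray_image, heatmap, alpha=0.5):
    """Blend the image and the heat map; alpha controls how strong the heat map is."""
    return (1 - alpha) * gray_image + alpha * heatmap

activation = np.array([[0., 1.],
                       [2., 4.]])        # made-up 2x2 layer output
image = np.zeros((4, 6))                 # made-up 4x6 grayscale image
heat = to_heatmap(activation, image.shape)
blended = overlay(image, heat)
```

The strongest activation (bottom right here) ends up as the brightest region of the blended picture, which is what lets you check by eye whether the layer is looking at the fish.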

In addition to these techniques, Jeremy also goes into subjects like Data Leakage, which in simple terms means that the data provided to a model contains information about the target that would not legitimately be available when making predictions, and he goes into ways it can be controlled and utilized.

He then briefly talks about another CNN architecture called Inception, which is built from inception blocks that use convolutional layers with different filter sizes. Previously we only worked with convolutional layers whose filters were all the same size, but this architecture uses filters of different sizes in the same block.

At the end of each block, the outputs of these different convolutional layers are merged by concatenation, and that is the output of the block.
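Here is a toy sketch of that idea in NumPy, assuming a single-channel image, one filter per branch and simple averaging filters of size 1, 3 and 5. A real inception block has many learned filters per branch; the function names are made up.

```python
import numpy as np

def same_conv2d(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (output size = input size)."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad)
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

def inception_like_block(image, kernels):
    """Run each filter size on the same input, then concatenate along the channel axis."""
    branches = [same_conv2d(image, k)[..., None] for k in kernels]
    return np.concatenate(branches, axis=-1)

image = np.ones((6, 6))
kernels = [np.ones((1, 1)), np.ones((3, 3)) / 9, np.ones((5, 5)) / 25]
block_out = inception_like_block(image, kernels)   # shape (6, 6, 3)
```

Because every branch keeps the same height and width, the outputs can be stacked side by side as extra channels no matter which filter size produced them.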

In our previous lesson, we talked about RNNs. In this lesson, Jeremy also implements such an RNN entirely in Python to better understand the model, and shows some techniques that can ease the process of creating it.

3. Gated Recurrent Unit (GRU)

In our previous lesson we briefly touched upon LSTM models, but here, instead of LSTMs, we will learn about Gated Recurrent Units.

LSTM and GRU are two well-known types of RNNs. Compared with LSTMs, GRUs have a simpler architecture and in practice often perform about as well.

Previously we talked about RNNs, and we looked at this figure

Now, in this RNN, the circle unit is a GRU, and what happens inside it is shown in the figure below:

In a GRU, there is an input at one end and an output at the other, and what happens in between is controlled by ‘gates’. There are two types of gates, the ‘reset gate’ represented by ‘r’ and the ‘update gate’ represented by ‘z’, and, as in the RNN figure above, we have a hidden state ‘h’.

These gates are little neural networks that output a value between 0 and 1, which is then multiplied by the signal they control. By training the network, these gates become smart enough to decide whether to output something close to 0 or close to 1.

If the reset gate is 0, the network forgets the hidden state, and if it is 1, the network remembers the hidden state. Each of these mini neural networks called gates has two inputs: the input to the unit itself and the current hidden state. While training, it learns a set of weights and so decides when to forget or remember the hidden state. The new candidate state, computed from the input and the hidden state scaled by the reset gate, is labeled h tilde (~h) in the figure above.

This new ~h value now goes up the path to the update gate ‘z’. This gate’s other input is the previous hidden state (h) of the unit. Over the course of training, the update gate learns to decide how much of the previous state ‘h’ and the new state ‘~h’ the unit should keep. If the update gate gives 1 it takes more from ‘h’, and if it gives 0 it takes more from ‘~h’.
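Putting the two gates together, one step of a GRU can be sketched like this. I am assuming NumPy, the weight names in the dictionary are made up, and I use tanh for the candidate state ~h, which is the common textbook choice rather than something stated above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(n_in, n_hid, rng):
    """Small random weights for the reset (r), update (z) and candidate (h) paths."""
    p = {}
    for g in ("r", "z", "h"):
        p["W_x" + g] = rng.normal(scale=0.1, size=(n_in, n_hid))
        p["W_h" + g] = rng.normal(scale=0.1, size=(n_hid, n_hid))
        p["b_" + g] = np.zeros(n_hid)
    return p

def gru_step(x, h, p):
    """One GRU update: returns the next hidden state."""
    r = sigmoid(x @ p["W_xr"] + h @ p["W_hr"] + p["b_r"])              # reset gate
    z = sigmoid(x @ p["W_xz"] + h @ p["W_hz"] + p["b_z"])              # update gate
    h_tilde = np.tanh(x @ p["W_xh"] + (r * h) @ p["W_hh"] + p["b_h"])  # candidate ~h (tanh assumed)
    return z * h + (1.0 - z) * h_tilde                                 # z near 1 keeps the old h

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
params = init_params(n_in, n_hid, rng)
x = rng.normal(size=n_in)
h = rng.normal(size=n_hid)

params["b_z"] = np.full(n_hid, 50.0)     # force the update gate towards 1
h_keep = gru_step(x, h, params)          # stays (almost exactly) equal to h
params["b_z"] = np.full(n_hid, -50.0)    # force the update gate towards 0
h_replace = gru_step(x, h, params)       # replaced by the candidate ~h
```

Pushing the update gate's bias to a large positive or negative value is just a trick to show the two extremes; in a trained network the gate lands somewhere in between and differs for each hidden unit.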

This whole process can be implemented in Keras by using a single line:

Simply replace ‘SimpleRNN’ with ‘GRU’ and a GRU will be used.

But to see how the gates function, let’s look at an implementation in Theano

It simply multiplies the inputs with weights and adds a bias term, which is what a neural network does.
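The same gate can be sketched in NumPy instead of Theano. The argument names are my own, and I am assuming the usual sigmoid on top of the weighted sum so that the output stays between 0 and 1.

```python
import numpy as np

def gate(x, h, W_x, W_h, b):
    """Weighted sum of the unit input x and hidden state h, plus bias, squashed by a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(x @ W_x + h @ W_h + b)))

x = np.array([1.0, -2.0, 0.5])          # input to the unit (3 features)
h = np.array([0.3, 0.0, -0.7, 1.2])     # current hidden state (4 units)

undecided = gate(x, h, np.zeros((3, 4)), np.zeros((4, 4)), np.zeros(4))   # all 0.5
mostly_open = gate(x, h, np.zeros((3, 4)), np.zeros((4, 4)), np.full(4, 5.0))
```

With all weights at zero the gate sits at 0.5 for every hidden unit; training moves the weights so that each unit's gate opens or closes depending on the input and the hidden state.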

The overall function of a GRU in Theano can be written as

We are at the end of the Part 1 course. Here, Jeremy mostly talked about convolutional neural networks because they were very popular at the time the course was recorded and are beginner friendly, which makes it quicker to teach students how neural networks work.

I hope you enjoyed this TL;DR version. I highly encourage you to take the entire course, as it is one of the best courses available today for beginners who are interested in deep learning.

From these lessons you now have some idea of what this course teaches, and I hope you are excited to learn more about it.

Lesson 7 notes can be found here
Watch the full video from here
Link to Lesson 7 jupyter notebook
Link to Lesson 7 WIKI page

In this publication, I will be posting more lessons on deep learning in the future, including concept explanations and code implementations in different frameworks and languages, and I will share my experiences while studying Deep Learning.

See you around.


Written by Kshitiz Rimal