Facial Expression Recognition with PyTorch: Four Differently Approached Models, Part II
Note: This is a continuation of the previous part (i.e. this).
Convolutional Neural Network (CNN)
“In mathematics, convolution is an operation on two functions that produces a third function expressing how the shape of one is modified by the other.”
A Convolutional Neural Network is a feed-forward network trained through back-propagation. Back-propagation is a method of training neural networks by propagating the error from the output layer back to the input layer (through the hidden layers). We start with a KERNEL, i.e., a small matrix of weights that slides over our input data, performing elementwise multiplication and summing the results into a single output pixel. In my notebook I have used grayscale images. In multi-channel images, a different kernel is applied to each channel, and the outputs are then added pixel-wise.
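To make the kernel mechanics concrete, here is a toy, plain-Python sketch of a single-channel 2-D convolution (illustrative only, not the article's code):

```python
def convolve2d(image, kernel):
    """Slide a kernel over a 2-D image, taking elementwise products and summing."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output pixel is the sum of an elementwise product
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        output.append(row)
    return output

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
convolve2d(image, kernel)  # → [[6, 8], [12, 14]]
```

In a real CNN the kernel weights are learned during training; this sketch only shows the sliding-window arithmetic.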
For more information and a deeper understanding of convolutional neural networks, you can refer to Aakash N S’s notebook, which covers the implementation of a CNN on the CIFAR-10 dataset and is very thorough. To understand how the layers work in a CNN, you should refer to this amazing article.
Building the CNN Model:
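A model of this kind might be sketched as below. The input size (48×48 grayscale) and 7 output classes are my assumptions based on typical facial-expression datasets, and the layer sizes are illustrative rather than the notebook's exact architecture:

```python
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    """A small CNN sketch for 48x48 grayscale faces and 7 emotion classes (assumed)."""
    def __init__(self, num_classes=7):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 48 -> 24
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 24 -> 12
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 12 -> 6
            nn.Flatten(),
            nn.Linear(128 * 6 * 6, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.network(x)
```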
Evaluation and fit function:
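The evaluation and fit helpers could look roughly like this (a sketch in the usual PyTorch training-loop style; the exact notebook code may differ):

```python
import torch
import torch.nn.functional as F

def accuracy(outputs, labels):
    # Fraction of predictions that match the labels
    preds = outputs.argmax(dim=1)
    return (preds == labels).float().mean()

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    losses, accs = [], []
    for images, labels in val_loader:
        out = model(images)
        losses.append(F.cross_entropy(out, labels))
        accs.append(accuracy(out, labels))
    return {"val_loss": torch.stack(losses).mean().item(),
            "val_acc": torch.stack(accs).mean().item()}

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        model.train()
        train_losses = []
        for images, labels in train_loader:
            loss = F.cross_entropy(model(images), labels)
            train_losses.append(loss.detach())
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        result = evaluate(model, val_loader)
        result["train_loss"] = torch.stack(train_losses).mean().item()
        history.append(result)
    return history
```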
Training of CNN model:
- Model HyperParameters
I have used a different optimizer function here. It is worth reading up on how to choose the right type of optimizer. Article.
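As a minimal illustration of the optimizer API, here is Adam minimizing a toy quadratic (a demonstration only, not from the notebook):

```python
import torch

# Toy demonstration: Adam driving a single parameter toward the minimum of (w - 2)^2
w = torch.tensor([5.0], requires_grad=True)
optimizer = torch.optim.Adam([w], lr=0.1)
for _ in range(200):
    loss = (w - 2.0) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# w should now be close to 2.0
```

Swapping `torch.optim.SGD` for `torch.optim.Adam` in the fit function is a one-line change, since both share the same `step()`/`zero_grad()` interface.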
The final accuracy after training comes out to be 54%, which is a good improvement. Let's plot the accuracies and losses using the functions mentioned below:
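Such plotting functions could be sketched as follows, assuming each epoch's results are stored as a dict with `train_loss`, `val_loss`, and `val_acc` keys (the key names are my assumption):

```python
import matplotlib.pyplot as plt

def plot_accuracies(history):
    # history is the list of per-epoch result dicts returned by the fit function
    accuracies = [x["val_acc"] for x in history]
    plt.plot(accuracies, "-x")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.title("Accuracy vs. No. of epochs")

def plot_losses(history):
    train_losses = [x.get("train_loss") for x in history]
    val_losses = [x["val_loss"] for x in history]
    plt.plot(train_losses, "-bx")
    plt.plot(val_losses, "-rx")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend(["Training", "Validation"])
    plt.title("Loss vs. No. of epochs")
```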
We see that the training loss and validation loss are not far from one another, so we can say our model is not badly overfit; but because they are not very close either, there is a slight chance of overfitting here. To reduce overfitting we add transformations (data augmentation) to our data, as I mentioned before. Below you can understand overfitting in detail and a few ways to reduce it.
It is important that the gap between validation loss and training loss be as small as possible.
The result on the test DataLoader:
We got 54% accuracy, which is a very effective improvement over our previous model, unless it is overfitted. An overfit model's graph could look something like this:
Note: I forgot to remove some cells from the linked notebook for this CNN model. You can ignore them.
Prediction from Test DataSet:
This model got all three prediction trials correct. Let's look at our last and final model.
CNN with a Residual Network (CNN-ResNet)
Traditionally, in a neural network, no matter how deep it is, each layer feeds only into the next layer. In a neural network with residual blocks, each layer feeds into the next layer and also directly into layers about 2–3 hops away. A clear illustration of this is shown below, which I have picked up from this article; you can go through the article to learn how adding residual blocks makes a difference in our model.
There are more changes I have added in this model, which I will explain as they come up.
Defining the ResNet Model:
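As a sketch, a ResNet9-style model with two skip connections could look like the following; the 48×48 grayscale input and 7 classes are assumptions, and the channel sizes are illustrative rather than the notebook's exact values:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, pool=False):
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_ch),
              nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ResNetSketch(nn.Module):
    """ResNet9-style sketch: plain conv blocks plus two residual (skip) connections."""
    def __init__(self, in_channels=1, num_classes=7):
        super().__init__()
        self.conv1 = conv_block(in_channels, 64)
        self.conv2 = conv_block(64, 128, pool=True)    # 48 -> 24
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))
        self.conv3 = conv_block(128, 256, pool=True)   # 24 -> 12
        self.conv4 = conv_block(256, 512, pool=True)   # 12 -> 6
        self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))
        self.classifier = nn.Sequential(nn.MaxPool2d(6), nn.Flatten(),
                                        nn.Linear(512, num_classes))

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        out = self.res1(out) + out   # skip connection: input added back to the output
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out   # second skip connection
        return self.classifier(out)
```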
(Using the same accuracy function we used in previous cases.)
This time, I have not used a fixed learning rate; instead, I have used a learning rate scheduler. The learning rate can be one of the most important hyperparameters when building a deep learning model. Adapting the learning rate during stochastic gradient descent can increase performance and reduce training time. This is sometimes called learning rate annealing or adaptive learning rates.
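PyTorch's built-in `OneCycleLR` is one such scheduler; the toy snippet below shows its ramp-up-then-anneal behaviour (the model and the numbers are placeholders, not the notebook's values):

```python
import torch

model = torch.nn.Linear(10, 2)                     # stand-in model for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# One-cycle policy: the LR ramps up to max_lr, then anneals back down
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01, epochs=5, steps_per_epoch=20)

lrs = []
for _ in range(5 * 20):
    optimizer.step()                               # normally follows loss.backward()
    scheduler.step()                               # stepped once per batch
    lrs.append(optimizer.param_groups[0]["lr"])
# lrs now traces the one-cycle curve: low -> max_lr -> very low
```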
Another addition in this model is weight decay, which is a regularization technique. Regularization makes slight modifications to the learning algorithm so that the model generalizes better, which in turn improves the model's performance on unseen data. Let's have a look at our evaluation and fit function for this model.
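A fit function with weight decay, per-batch LR scheduling, and optional gradient clipping might be sketched like this; the structure follows the common one-cycle training-loop pattern rather than the notebook verbatim:

```python
import torch
import torch.nn.functional as F

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.Adam):
    history = []
    # weight_decay is the L2-style regularization strength passed to the optimizer
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs, steps_per_epoch=len(train_loader))
    for epoch in range(epochs):
        model.train()
        lrs = []
        for images, labels in train_loader:
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            if grad_clip is not None:
                # Clip gradient values to keep individual updates stable
                torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
            lrs.append(optimizer.param_groups[0]["lr"])
            scheduler.step()
        model.eval()
        with torch.no_grad():
            val_losses = [F.cross_entropy(model(x), y) for x, y in val_loader]
        history.append({"val_loss": torch.stack(val_losses).mean().item(),
                        "lrs": lrs})
    return history
```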
Result before Training:
The graph tells us that our model is not overtrained.
Please note: I have not added transformations in this notebook, but I trained for a smaller number of epochs instead. You are most welcome to uncomment the transformations and increase the number of epochs. This will help keep the model from overfitting and give better accuracy than what I came up with. Give it a try and let me know!
- Plotting our Learning Rates:
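A helper to plot the recorded learning rates could look like this, assuming each epoch's history entry stores its per-batch learning rates under an `"lrs"` key (my naming assumption):

```python
import matplotlib.pyplot as plt

def plot_lrs(history):
    # Concatenate the per-batch learning rates across all epochs
    lrs = [lr for epoch in history for lr in epoch.get("lrs", [])]
    plt.plot(lrs)
    plt.xlabel("Batch no.")
    plt.ylabel("Learning rate")
    plt.title("Learning Rate vs. Batch no.")
```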
65%… not so good, but not so bad either. I have mentioned above how you could increase it.
That’s all I had. I hope this is of good use to you. You can find me on LinkedIn and reach out to me there.