Facial Expression Recognition with PyTorch, with 4 Different Model Approaches, Part II

Note: This is a continuation of the previous part (i.e. this).


Convolutional Neural Network: CNN

In mathematics, convolution is an operation on two functions that produces a third function expressing how the shape of one is modified by the other.

A Convolutional Neural Network is a feed-forward network, but it is trained through back-propagation: a method that trains neural networks by propagating the error backward from the output layer to the input layer (through the hidden layers). We start with a kernel, i.e. a small matrix of weights, which slides over our input data performing elementwise multiplication and summing the results into a single output pixel. In my notebook I have used grayscale images; for multi-channel images, a separate kernel is applied to each channel, and the per-channel outputs are then added pixel-wise.
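As a minimal sketch of that sliding-window operation (the 6x6 input and the averaging kernel here are made up for illustration, not taken from the notebook), `torch.nn.functional.conv2d` applies a kernel to a single-channel image:

```python
import torch
import torch.nn.functional as F

# A single-channel (grayscale) 6x6 "image" and a 3x3 kernel of weights.
image = torch.arange(36, dtype=torch.float32).reshape(1, 1, 6, 6)  # (batch, channels, H, W)
kernel = torch.ones(1, 1, 3, 3) / 9.0  # a simple 3x3 averaging kernel

# The kernel slides over the input, multiplying elementwise and summing
# each 3x3 window into a single output pixel.
out = F.conv2d(image, kernel)
print(out.shape)  # torch.Size([1, 1, 4, 4])
```

With no padding, each 3x3 window collapses to one pixel, so the 6x6 input shrinks to 4x4.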

For more information and a deeper understanding of convolutional neural networks, you can refer to Aakash N S's notebook, which covers a thorough implementation of a CNN on the CIFAR-10 dataset. To understand how the layers in a CNN work, you should refer to this amazing article.

Building the CNN Model:

Base Class


Structure of the CNN model
This is how our model looks now.
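A rough sketch of what such a base class and CNN can look like follows. The 48x48 grayscale inputs, 7 emotion classes, and layer sizes are assumptions on my part (typical for FER-style datasets), not the notebook's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageClassificationBase(nn.Module):
    """Shared training/validation logic; the architecture lives in a subclass."""
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                      # forward pass
        return F.cross_entropy(out, labels)     # training loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        acc = (out.argmax(dim=1) == labels).float().mean()
        return {'val_loss': loss.detach(), 'val_acc': acc}

class EmotionCnn(ImageClassificationBase):
    """Hypothetical CNN for 48x48 grayscale faces and 7 emotion classes."""
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                    # 48 -> 24
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                    # 24 -> 12
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 7))
    def forward(self, x):
        return self.network(x)

model = EmotionCnn()
out = model(torch.randn(4, 1, 48, 48))
print(out.shape)  # torch.Size([4, 7])
```

The base class keeps the loss and accuracy logic in one place, so every model in the series can reuse it.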

Evaluation and fit function:
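The evaluation and fit functions follow a standard pattern; here is a hedged sketch (the notebook's exact versions may differ slightly), assuming the model provides `training_step` and `validation_step` methods:

```python
import torch

@torch.no_grad()
def evaluate(model, val_loader):
    """Average the per-batch validation metrics over the whole loader."""
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return {'val_loss': torch.stack([x['val_loss'] for x in outputs]).mean().item(),
            'val_acc': torch.stack([x['val_acc'] for x in outputs]).mean().item()}

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    """Plain training loop: one optimizer step per batch, validate each epoch."""
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        model.train()
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss.detach())
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        history.append(result)
    return history
```

The returned `history` list (one dict per epoch) is what the plotting functions later in the article consume.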

Training of CNN model:

  • Model HyperParameters

I have used a different optimizer function here. You can have a good look at how to choose the right type of optimizer in this article.
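For illustration only (the article does not show which optimizer was chosen, so Adam here is an assumption), switching optimizers in PyTorch is a one-line change because they all share the same `step()`/`zero_grad()` interface:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(10, 7)  # stand-in model for illustration

# Two common choices with the same interface:
opt_sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
opt_adam = torch.optim.Adam(model.parameters(), lr=0.001)  # adaptive per-parameter step sizes

x, y = torch.randn(8, 10), torch.randint(0, 7, (8,))
loss = F.cross_entropy(model(x), y)
loss.backward()
opt_adam.step()       # one parameter update using Adam's moment estimates
opt_adam.zero_grad()  # clear gradients before the next batch
```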

Running 10 Epochs

The final accuracy after training comes out to be 54%, which is a good improvement. Let's plot the accuracies and losses using the functions mentioned below:

  • Accuracy:
Accuracy function
Graph of improvement of accuracy
  • Loss:
Function for calculating Loss
Loss Graph
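A sketch of what such plotting functions typically look like, assuming a `history` list of per-epoch dicts carrying `'train_loss'`, `'val_loss'`, and `'val_acc'` keys (the notebook's exact versions may differ):

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

def plot_accuracies(history):
    """Validation accuracy per epoch."""
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs')

def plot_losses(history):
    """Overlay training and validation loss to spot overfitting."""
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs')
```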

We see that the training loss and validation loss are not far from one another, so we can say our model is not badly overfit; but because they are not very close either, there is a slight chance of overfitting here. To reduce overfitting we add transformations to our data, as I have said before. Below you can understand overfitting in detail and a few ways to reduce it.

It's important that the gap between validation loss and training loss be as small as possible.

The result on Test DataLoader:

Accuracy on Test DataLoader

We got 54% accuracy, which is a very effective improvement over our previous model, unless it is overfit. An overfit model's graph could look something like this:

An Overfit Model

Note: I forgot to remove some cells from the notebook linked for this CNN model. You can ignore them.

Prediction from Test DataSet:

Correct Prediction

This model got all three prediction trials correct. Let's look at our last and final model.

CNN with Residual Network (CNN-ResNet)

Traditionally, in a neural network, irrespective of how deep it is, each layer feeds only into the next layer. In a neural network with residual blocks, each layer feeds into the next layer and also directly into layers about 2-3 hops away. A clear picture of what I have described is shown below, which I picked up from this article; you can go through the article to learn how adding residual blocks makes a difference in our model.

A simple residual block
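A minimal sketch of such a residual block in PyTorch (the channel count and layer details are illustrative, not the notebook's exact block):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleResidualBlock(nn.Module):
    """Two conv layers whose output is added back to the input (the skip connection)."""
    def __init__(self, channels):
        super().__init__()
        # padding=1 keeps the spatial size unchanged, so the addition is valid.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + x)  # the "hop": add the original input back in

block = SimpleResidualBlock(32)
x = torch.randn(2, 32, 12, 12)
print(block(x).shape)  # torch.Size([2, 32, 12, 12])
```

Because the skip path carries the input through unchanged, gradients can flow directly to earlier layers, which is what lets much deeper networks train well.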

There are more changes that I have added in this model, which I will explain as they come up.

Defining the ResNet Model:

(Using the same accuracy function we used in previous cases.)

Base Class for our model (Same as Previous)
Convolution block with ResNet Model

Model Structure:

Deeper Than all Other Models we Created


This time, I have not used a fixed learning rate; instead, I have used a learning rate scheduler. The learning rate can be one of the most important hyperparameters for a deep learning model. Adapting the learning rate during the stochastic gradient descent optimization procedure can increase performance and reduce training time. This is sometimes called learning rate annealing or adaptive learning rates.

Another addition in this model is weight decay, which is a regularization technique. Regularization makes slight modifications to the learning algorithm so that the model generalizes better, which in turn improves the model's performance on unseen data. Let's have a look at the evaluation and fit functions for this model.

Evaluation and Fit Function for this Model
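A sketch of how a scheduler and weight decay can be wired together; `OneCycleLR` and the specific hyperparameter values here are assumptions for illustration, not necessarily the notebook's choices:

```python
import torch

model = torch.nn.Linear(10, 7)  # stand-in for the ResNet model

# weight_decay adds an L2 penalty on the weights: the regularization
# described above.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

epochs, steps_per_epoch = 10, 100
# One-cycle policy: ramp the learning rate up for the first part of
# training, then anneal it back down.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01, epochs=epochs, steps_per_epoch=steps_per_epoch)

lrs = []
for _ in range(epochs * steps_per_epoch):
    optimizer.step()     # normally preceded by loss.backward()
    scheduler.step()     # the LR is updated once per batch, not per epoch
    lrs.append(scheduler.get_last_lr()[0])
```

Recording `get_last_lr()` each batch is what makes the learning-rate plot further down possible.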

Result before Training:


Detailed Hyperparameters
Running the 10 epochs
  • Accuracy:
Graph of increase in accuracy with epochs
  • Loss:
Graph comparing Training and Validation Loss

The graph tells us that our model is not overtrained.

Please note: I have not added transformations in this notebook; instead, I trained for a smaller number of epochs. You are most welcome to uncomment the transformations and increase the number of epochs. That will keep the model from overfitting and give better accuracy than what I came up with. Give it a try and let me know!

  • Plotting our Learning Rates:
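A sketch of such a plotting helper, assuming each epoch's history entry records its per-batch learning rates under an `'lrs'` key (the key name is my assumption):

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

def plot_lrs(history):
    """Concatenate the per-batch learning rates from every epoch and plot them."""
    lrs = [lr for epoch in history for lr in epoch.get('lrs', [])]
    plt.plot(lrs)
    plt.xlabel('batch number')
    plt.ylabel('learning rate')
    plt.title('Learning rate vs. batch number')
```

With the one-cycle policy, the resulting curve rises to the peak learning rate and then decays toward zero.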

Final Accuracy:

Final Accuracy on Test Data Loader

65%… not so good, but not so bad either. I have mentioned above how you could increase it.


That’s all I had. I hope this is of good use to you. You can find me on LinkedIn and reach out to me there.

Also, I ought to thank Jovian.ml, Aakash N S and his team, and freeCodeCamp for providing this awesome course, Zero to GANs with PyTorch, free of cost. It was a very insightful journey.
