Facial expression recognition with PyTorch using four different modelling approaches

Dataset covering the ROI (Region of Interest) of the face, used to train our model to recognise the expression

Introduction:

This is a beginner-friendly project with four different approaches to the same problem, showing how each approach makes the model deeper and more capable. I have used the fer2013 dataset to recognise the expression in an image, as you can see in the image shown above. You will see how these models are structured differently and how that makes a difference in the results.

Here’s the list of models covered, and you can find the links to those notebooks right beside their names:

  • Logistic Regression Model (nb-A)

DATASET:

For this project, I have used the ‘fer2013’ dataset. You can find it available here. This dataset consists of 35,887 data entries in CSV file format. You can view the data after converting it into a Pandas DataFrame as shown below:

Data frame covering all data in the form of a table.

The data is divided into three categories by USAGE (Training, Validation and Test sets) and into seven categories by EMOTION, i.e., our LABELS (‘angry’, ‘disgust’, ‘fear’, ‘happy’, ‘sad’, ‘surprise’, ‘neutral’). I have created three other data frames as per my requirements, i.e., train_df, valid_df, and test_df. The given pixels can be converted into an image by applying the required transformations; I made a function to do just that. You can find it in the notebook. A rough sketch of both steps follows below.
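As a rough sketch, assuming the standard fer2013 column names (emotion, pixels, Usage) and usage values, the split and the pixel-to-image conversion might look like this:

import numpy as np
import pandas as pd

# Load the fer2013 CSV (the path is illustrative)
df = pd.read_csv('fer2013.csv')

# Split by the Usage column into train / validation / test frames
train_df = df[df['Usage'] == 'Training'].reset_index(drop=True)
valid_df = df[df['Usage'] == 'PublicTest'].reset_index(drop=True)
test_df  = df[df['Usage'] == 'PrivateTest'].reset_index(drop=True)

# Each row stores 48*48 space-separated pixel values; rebuild the image
def row_to_image(pixel_string):
    pixels = np.array(pixel_string.split(), dtype=np.uint8)
    return pixels.reshape(48, 48)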

Let’s further explore our dataset:

No. of entries under each dataset
Creating separate data frames for each dataset type

Because our dataset is present in the form of a data frame and not as a built-in PyTorch dataset, I have created a class ‘expressions’ that takes an input data frame and outputs each image as a Tensor along with its label. Now we have a data type that consists of two variables:

  • Tensor (containing a 48x48 grayscale image), and
  • Label (stating the expression).
Class expressions
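A minimal sketch of such a class (the name expressions matches the article; the column names and transform handling are assumptions based on the fer2013 layout):

import numpy as np
from torch.utils.data import Dataset

class expressions(Dataset):
    # Wraps a fer2013 DataFrame as (image tensor, label) pairs
    def __init__(self, df, transform=None):
        self.df = df
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        # 48x48 grayscale image stored as a space-separated pixel string
        img = np.array(row['pixels'].split(), dtype=np.uint8).reshape(48, 48)
        if self.transform:
            img = self.transform(img)   # e.g. ToTensor + Normalize
        return img, int(row['emotion'])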

We will have to import transforms from torchvision to apply transformations to our dataset. These transforms are important for image processing, and the output images can then be used for further analysis or interpretation. In any case, PyTorch does not work with images directly; we first convert the images into Tensors. TorchVision contains helper classes/utilities to work with image data.

Importing transformations
Chaining Together several transformations
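For example, the chained transform might look like this (the normalization values are an assumption; ideally they are computed from the dataset):

import torchvision.transforms as T

# Chain transformations: uint8 ndarray -> float tensor -> normalized tensor
transform = T.Compose([
    T.ToTensor(),                          # scales pixel values to [0, 1]
    T.Normalize(mean=(0.5,), std=(0.5,)),  # roughly centre pixels around 0
])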

Our expression datasets are of the type:

A tensor containing normalized pixels and a Label attached to it stating the expression.

DataLoaders:

“from torch.utils.data import DataLoader”

DataLoaders split the data into batches of a predefined size during training. This is very important when dealing with millions of data points. We specify the batch size first; here I set batch_size = 400, so a batch of 400 samples is loaded into the model at a time.

DataLoaders
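A sketch of the loaders, assuming train_ds, valid_ds, and test_ds are instances of the expressions class above:

from torch.utils.data import DataLoader

batch_size = 400

# Shuffle only the training loader; evaluation order doesn't matter
train_dl = DataLoader(train_ds, batch_size, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size)
test_dl  = DataLoader(test_ds, batch_size)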

Let's have a look at a batch of our data:

A batch of data

I think this looks amazing, especially considering how much time it took for me to get this output right! Phew!

Training on GPUs:

A GPU is a specialized, single-chip processor with dedicated memory, used for extensive graphical and mathematical computation, thereby freeing up the CPU. GPUs help reduce training time, which would otherwise grow with the amount of data. In PyTorch, we check the availability of a GPU using torch.cuda.is_available(). To use a GPU, we have to move our entire model and our data into GPU memory. For this, I have created a function and a class.

Will shift our model and data loader to GPU
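These helpers follow a common PyTorch pattern; the sketch below assumes that pattern, and the exact names are illustrative:

import torch

def get_default_device():
    # Prefer the GPU when one is available
    return torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

def to_device(data, device):
    # Recursively move tensors (or lists/tuples of tensors) to the device
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader:
    # Wraps a DataLoader and moves each batch to the device on the fly
    def __init__(self, dl, device):
        self.dl, self.device = dl, device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dl)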

Creating the Logistic Regression Model:

Logistic Regression is a statistical and ML technique used to model the probability of a certain class/event, i.e., to classify the records of a dataset based on the values of the input fields. In Logistic Regression, we use one or more independent variables to predict a Boolean output, but it can be used for both binary and multiclass classification.

We have used this as our starting/base model, and we will advance toward deeper models. You can find out more about Logistic Regression in this notebook by Aakash NS, here. That notebook covers the MNIST dataset.

Model:

Logistic Regression model( )
Moving our model to GPU memory
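At its core, the model is a single linear layer mapping the flattened 48x48 image to the seven emotion classes. A minimal sketch (the class name is illustrative):

import torch.nn as nn

class ExpressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # A single linear layer: 48*48 grayscale pixels -> 7 emotion classes
        self.linear = nn.Linear(48 * 48, 7)

    def forward(self, xb):
        xb = xb.reshape(-1, 48 * 48)  # flatten each image in the batch
        return self.linear(xb)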

Training: Our objective is to adjust the parameters of the model so that it becomes the best possible estimator of the labels of the samples in the dataset. During training, we look at the loss (cost) function and at how it relates to the parameters θ, so first we need to formulate that loss function.

Accuracy is a good evaluation metric for classification, but it’s not a good loss function. Here’s why: accuracy changes in discrete jumps as predictions flip between classes, so it is not differentiable and gives the optimizer no useful gradient signal.

Hence we use Cross-Entropy as our loss function, which is continuous and differentiable and provides good feedback for incremental improvements.

Evaluate Function

The fit function below will train our model on the basis of the mentioned hyperparameters.

Fit Function
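For orientation, here is a compact sketch of what such an evaluate/fit pair typically looks like (the optimizer choice, SGD, and the exact structure are assumptions; the notebook defines the real versions):

import torch
import torch.nn.functional as F

def accuracy(outputs, labels):
    # Fraction of predictions matching the labels in this batch
    preds = torch.argmax(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

@torch.no_grad()
def evaluate(model, val_dl):
    model.eval()
    losses, accs = [], []
    for xb, yb in val_dl:
        out = model(xb)
        losses.append(F.cross_entropy(out, yb))
        accs.append(accuracy(out, yb))
    return torch.stack(losses).mean().item(), torch.stack(accs).mean().item()

def fit(epochs, lr, model, train_dl, val_dl, opt_func=torch.optim.SGD):
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss = F.cross_entropy(model(xb), yb)  # training loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        val_loss, val_acc = evaluate(model, val_dl)
        print(f"Epoch {epoch+1}: val_loss={val_loss:.4f}, val_acc={val_acc:.4f}")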

Before Training:

There is a high loss and low accuracy

While Training:

First 10 epochs
Next 10 epochs

After 25 epochs, our model’s final accuracy comes out to be approximately 31%.

Accuracy after training on test DataLoader

When you run the notebook, you will be able to see that the accuracy does not cross a certain limit. I have plotted a graph of the accuracy below:

Graph- (change in accuracy/epochs)

Predicting some Outputs:

I made a predict function to get our model's predictions on images from the test dataset.

Predict Function
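A minimal sketch of such a predict function, reusing the to_device helper from the GPU section (the function name is illustrative):

import torch

def predict_image(img, model, device):
    # Add a batch dimension, move to the device, and pick the top class
    xb = to_device(img.unsqueeze(0), device)
    out = model(xb)
    _, pred = torch.max(out, dim=1)
    return pred.item()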
Label vs the predicted Label.

Hey, look at that!! One prediction is correct. Yay!

FeedForward Neural Network:

A feedforward neural network, as the name suggests, is an artificial neural network in which the connections between nodes do not form a cycle.

Due to the nonlinearity in these hidden neurons, the output of an artificial neural network is a nonlinear function of the inputs. In a classification context, this means that the decision boundary can be nonlinear as well, making the model more flexible compared to logistic regression. Although higher flexibility may be desirable in general, it carries with it a higher risk of model overfitting (“memorizing the training cases”), which can potentially reduce a model’s accuracy on previously unseen cases. This is where we add data transformations that create variations in the training data, for example RandomCrop, RandomHorizontalFlip, etc., as sketched below.
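As a rough illustration, a training-time augmentation pipeline might be chained like this (the parameters and normalization values are assumptions, not the notebook's exact settings):

import torchvision.transforms as T

# Augmentations for the training set only; parameters are illustrative
train_transform = T.Compose([
    T.ToPILImage(),                   # uint8 ndarray -> PIL image
    T.RandomCrop(48, padding=4),      # random shifts via padded crops
    T.RandomHorizontalFlip(),         # mirror faces left/right at random
    T.ToTensor(),
    T.Normalize(mean=(0.5,), std=(0.5,)),
])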

You can find a good comparison between logistic regression models and artificial neural networks here under topic 3.

FNN - Model:

Base class
Addition of hidden layers and a non-linear function.
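A minimal sketch of such a feedforward model, assuming one hidden layer with ReLU (the layer sizes are illustrative; the notebook defines the actual architecture):

import torch.nn as nn
import torch.nn.functional as F

class ExpressionFnn(nn.Module):
    def __init__(self, in_size=48*48, hidden_size=256, out_size=7):
        super().__init__()
        self.linear1 = nn.Linear(in_size, hidden_size)   # hidden layer
        self.linear2 = nn.Linear(hidden_size, out_size)  # output layer

    def forward(self, xb):
        xb = xb.reshape(-1, 48 * 48)    # flatten each image
        out = F.relu(self.linear1(xb))  # non-linearity between layers
        return self.linear2(out)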

Model's Structure:

Structure of our Model

We will shift the model to the GPU and use the same fit and evaluate functions mentioned above. Let's see the accuracy before training.

Accuracy before training

Let's begin training:

First 10 epochs
Next 10 epochs

The accuracy wasn't improving any further, which is why I only trained for 25 epochs in total. Let's see how our accuracy and loss change with training (a sketch of the plotting helper follows the graphs below).

  • Accuracy:
Function to Plot accuracy
Accuracy Graph
  • Loss:
Loss Graph
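Both graphs come from a small plotting helper; a sketch, assuming the per-epoch metrics were recorded in a list during training:

import matplotlib.pyplot as plt

def plot_metric(values, name):
    # values: one entry per epoch, e.g. validation accuracies from fit()
    plt.plot(values, '-x')
    plt.xlabel('epoch')
    plt.ylabel(name)
    plt.title(f'{name} vs. number of epochs')
    plt.show()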

We can see that the loss has effectively decreased with training. Let's check the final accuracy on the test DataLoader and predict some images from the test dataset.

Final Accuracy

Predictions:

A correct Prediction from the Test Dataset
An INCORRECT Prediction from Test DataSet

Note:

To keep this from getting too long, I have continued the article HERE. I hope this is of good use to you.

You can find me on LinkedIn and reach out to me there.
