Face Emotion Recognition using PyTorch

Saumya Raj
4 min read · Aug 9, 2020


This Medium post is about creating a CNN model using PyTorch to predict face emotions from images.

We will start by explaining a little bit about tensors. A tensor is a number, vector, matrix, or any n-dimensional array. The difference between a NumPy array and a tensor is that tensors can be used on GPUs. With the amount of data available for computation today, it is important to use GPUs for fast computation. A tensor can be of two types: constant and variable. A constant tensor's value cannot be changed after being initialized (it is immutable). We will generally deal with variable tensors.

So, let's start with what PyTorch is. PyTorch is a Python-based library for processing tensors. It is designed to use GPUs as well. A nice feature of PyTorch is that it can calculate derivatives of tensors automatically if requires_grad is set to True.
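For example, here is a minimal sketch (not from the original notebook) of computing derivatives automatically:

```python
import torch

# Create tensors with requires_grad=True so PyTorch tracks
# operations on them and can compute gradients automatically.
x = torch.tensor(3., requires_grad=True)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)

y = w * x + b  # y = 4 * 3 + 5 = 17

y.backward()   # compute dy/dx, dy/dw, dy/db

print(x.grad)  # tensor(4.) -> dy/dx = w
print(w.grad)  # tensor(3.) -> dy/dw = x
print(b.grad)  # tensor(1.) -> dy/db = 1
```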

This post is divided into six parts:

  1. Importing libraries and datasets
  2. Splitting dataset into training/validation/test sets
  3. Using GPU for computation
  4. Model Architecture
  5. Model Training
  6. Model Results

1. Importing libraries and datasets

We begin by importing the required libraries.

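A typical set of imports for this project might look like the following sketch (the exact list in the original notebook may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, random_split
import torchvision.transforms as tt
from torchvision.datasets import ImageFolder
```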

We are using the fer2013 dataset from a Kaggle competition. Here is the link: https://www.kaggle.com/deadskull7/fer2013.

Below is how we import the dataset:

We create a tensor dataset from the images using ImageFolder. Its signature is torchvision.datasets.ImageFolder(root, transform=None, target_transform=None, loader=<function default_loader>, is_valid_file=None).

ImageFolder is a generic data loader for images (e.g. .png files) arranged in one subfolder per class.
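A minimal sketch of loading the images with ImageFolder, assuming the fer2013 CSV has already been converted into per-class image folders (data_dir below is a hypothetical path):

```python
# Hypothetical layout: ./data/train/angry/0001.png, ./data/train/happy/0002.png, ...
data_dir = "./data/train"

transform = tt.Compose([
    tt.ToTensor(),  # convert PIL images to tensors with values in [0, 1]
])

dataset = ImageFolder(root=data_dir, transform=transform)
print(dataset.classes)  # e.g. ['angry', 'disgusted', 'fearful', ...]
```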

2. Splitting dataset into training/validation/test sets

The dataset needs to be split into three parts:

  1. Training set — used to train the model i.e. compute the loss and adjust the weights of the model using gradient descent.
  2. Validation set — used to evaluate the model while training, adjust hyperparameters (learning rate etc.) and pick the best version of the model.
  3. Test set — used to compare different models, or different types of modeling approaches, and report the final accuracy of the model.

The dataset contains RGB images of 48x48 pixels for these emotions: angry, disgusted, fearful, happy, neutral, sad, surprised.

It contains 28,709 images, which we will divide into 23,709 training images and 5,000 validation images using random_split.
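A sketch of the split described above (the manual seed is an assumption, added for reproducibility):

```python
torch.manual_seed(42)  # assumed seed, for a reproducible split

val_size = 5000
train_size = len(dataset) - val_size  # 28709 - 5000 = 23709

train_ds, val_ds = random_split(dataset, [train_size, val_size])
print(len(train_ds), len(val_ds))  # 23709 5000
```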

DataLoader is used to split the data into batches of a predefined size during training. It also provides other utilities, such as shuffling and random sampling of the data. Here batch_size is 128. We use num_workers for parallel processing; it uses background workers to load the data. We shuffle the training set so that each batch is representative of the overall distribution, since the data are sorted by class/target.
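A sketch of the data loaders described above (num_workers=4 and pin_memory=True are assumptions):

```python
batch_size = 128

# Shuffle only the training set; validation order does not matter.
train_dl = DataLoader(train_ds, batch_size, shuffle=True,
                      num_workers=4, pin_memory=True)
val_dl = DataLoader(val_ds, batch_size,
                    num_workers=4, pin_memory=True)
```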

3. Using GPU for computation

Here comes an important part: how to use a GPU for fast data processing.

GPUs contain hundreds of cores that are optimized for performing expensive matrix operations on floating point numbers in a short time, which makes them ideal for training deep neural networks with many layers.

We can check whether a GPU is available and the required NVIDIA CUDA drivers are installed using torch.cuda.is_available(). Depending on the device available, we will use the GPU or the CPU; the model should be able to run on both. The function to_device is called whenever we need to move tensors to the GPU, and a DeviceDataLoader wrapper moves batches from a data loader to the GPU as they are needed.
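A sketch of these helpers (the names follow common PyTorch tutorial code; the originals may differ slightly):

```python
def get_default_device():
    """Pick the GPU if available, else the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def to_device(data, device):
    """Move tensor(s) to the chosen device."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader:
    """Wrap a DataLoader to move batches to a device as they are yielded."""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dl)

device = get_default_device()
train_dl = DeviceDataLoader(train_dl, device)
val_dl = DeviceDataLoader(val_dl, device)
```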

4. Model Architecture

I am using the Residual Network (ResNet) architecture, a type of Convolutional Neural Network (CNN). ResNets are helpful when dealing with the "vanishing gradient" problem.

Vanishing gradient problem: in a deep neural network with many layers, the gradient computed during backpropagation to minimise the loss is repeatedly multiplied layer by layer. It can become so small that it effectively disappears, which hurts performance despite the many layers.

A ResNet adds identity mappings (skip connections): if a layer does not do anything useful initially, we skip over it, effectively compressing the network and enabling faster computation; later during training, the layers can expand again to explore more features.

We are using Batch Normalization here.
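A sketch of a convolutional block with batch normalization and a residual (skip) connection, illustrating the idea rather than the exact architecture from the original notebook:

```python
def conv_block(in_channels, out_channels, pool=False):
    # Convolution -> batch normalization -> ReLU, optionally followed by pooling.
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = conv_block(channels, channels)
        self.conv2 = conv_block(channels, channels)

    def forward(self, x):
        # The skip connection: add the input back to the block's output.
        return self.conv2(self.conv1(x)) + x
```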

5. Model Training

Before training the model, I have applied some improvements to train it better:

Learning rate scheduling: this involves changing the learning rate after every batch of training. One kind of scheduling is the "one-cycle learning rate policy". It starts with a low learning rate, increases it batch by batch until roughly 30-35% of the epochs have passed, and then gradually reduces it to very low values.

Weight decay: another technique, which prevents the weights of the neural network from becoming too large by adding a penalty term to the loss function.

Gradient clipping: this prevents the gradients from becoming too large, which can destabilise training.
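A sketch of a training loop combining all three techniques (the hyperparameter values are illustrative, not the originals, and model is assumed to be the ResNet-style network from the previous section, already moved to the GPU):

```python
# Illustrative hyperparameters; the original values may differ.
epochs, max_lr, grad_clip, weight_decay = 30, 0.01, 0.1, 1e-4

optimizer = torch.optim.Adam(model.parameters(), max_lr,
                             weight_decay=weight_decay)

# One-cycle policy: the learning rate ramps up, then decays, batch by batch.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr, epochs=epochs, steps_per_epoch=len(train_dl))

for epoch in range(epochs):
    model.train()
    for images, labels in train_dl:
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        # Gradient clipping: cap gradient values before the update.
        nn.utils.clip_grad_value_(model.parameters(), grad_clip)
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()  # update the learning rate after every batch
```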

6. Model Results

Below are the accuracy and loss graphs with respect to the number of epochs.

Below is the accuracy and loss calculated on the test dataset:
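A sketch of how such an evaluation can be computed (test_dl is assumed to be a DeviceDataLoader over the test set):

```python
@torch.no_grad()
def evaluate(model, loader):
    # Average the loss and accuracy over all batches in the loader.
    model.eval()
    losses, correct, total = [], 0, 0
    for images, labels in loader:
        out = model(images)
        losses.append(F.cross_entropy(out, labels).item())
        correct += (out.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return sum(losses) / len(losses), correct / total

test_loss, test_acc = evaluate(model, test_dl)
print(f"test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")
```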

Here is an example of the model predicting a face emotion:
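A sketch of a single-image prediction helper (an illustrative helper, not from the original notebook, using the validation set for the example image):

```python
def predict_image(img, model, device, classes):
    xb = to_device(img.unsqueeze(0), device)  # add a batch dimension
    preds = model(xb)
    return classes[preds.argmax(dim=1).item()]

img, label = val_ds[0]
print("actual:", dataset.classes[label],
      "| predicted:", predict_image(img, model, device, dataset.classes))
```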

Thanks for reading :)

References:

https://jovian.ml/saumyaraj52/emotion-recognition-project
