EmoTorch

Yashika Sharma
5 min read · Mar 16, 2020


Sample from FER2013 Dataset with labels

EmoTorch is a project built as part of the Facebook AI Hackathon 2020 using PyTorch. The project aims to predict the emotion of a person from an image of their face. The image can be anything from a selfie to a frame captured by the phone's front camera or a webcam while scrolling the feed.

PyTorch is the main tool used in this project because of its diverse modules and packages.

The image is fed to the neural network, which extracts features from it, analyzes the emotions, and predicts the most likely of the 7 most common emotions.

Most likely predicted classes

This article explains the model; the motivation behind the idea and the future scope will be covered in the next article.

The EmoTorch repository is largely self-explanatory, but we will give a top-level overview of the project here.

Choosing the Dataset

https://datarepository.wolframcloud.com/resources/FER-2013

The first step in any project is to choose a dataset. We chose the publicly available FER2013 dataset for our task, for the following reasons:

  • It has images categorized into one of seven emotions.
  • It is publicly available.
  • The size of the dataset is suitable for our task:
  • Training set: 28,709 examples
  • Test set: 3,589 examples
  • Validation set: 3,589 examples

The data was pulled from a past Kaggle competition.

There are two files available. The first file, train.csv, contains two columns, “emotion” and “pixels”. The “emotion” column contains a numeric code ranging from 0 to 6, inclusive, for the emotion present in the image. The “pixels” column contains a quoted string for each image, whose contents are space-separated pixel values in row-major order. The second file, test.csv, contains only the “pixels” column, and our task is to predict the emotion.

The emotions available are as follows:

{'0': 'angry',
'1': 'disgust',
'2': 'fear',
'3': 'happy',
'4': 'neutral',
'5': 'sad',
'6': 'surprise'}
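
To make this format concrete, here is a minimal sketch of reading train.csv and recovering an image and its label from one row (FER2013 images are 48×48 grayscale):

import numpy as np
import pandas as pd

EMOTIONS = {0: 'angry', 1: 'disgust', 2: 'fear', 3: 'happy',
            4: 'neutral', 5: 'sad', 6: 'surprise'}

df = pd.read_csv('train.csv')  # columns: "emotion" and "pixels"

# Each "pixels" entry is a space-separated string of 48 * 48 = 2304
# grayscale values in row-major order; reshape it back into an image.
def row_to_image(pixel_string):
    pixels = np.array(pixel_string.split(), dtype=np.uint8)
    return pixels.reshape(48, 48)

first_image = row_to_image(df.loc[0, 'pixels'])
first_label = EMOTIONS[df.loc[0, 'emotion']]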

However, the model is built to work with any dataset; commercial datasets could be plugged into our model to get better results.

After choosing the dataset, we preprocessed the images. The images were already center-cropped with suitable dimensions, so we did not resize them; we only normalized the images and converted them to tensors.

Data Augmentation
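
As mentioned above, this stage is minimal: no resizing, only normalization and tensor conversion. Here is a sketch using torchvision transforms; the normalization statistics are illustrative assumptions, and the 3-channel conversion is there because ImageNet pre-trained networks expect RGB input:

from torchvision import transforms

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # FER2013 is grayscale; VGG expects 3 channels
    transforms.ToTensor(),                        # to a float tensor in [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5],    # illustrative statistics,
                         std=[0.5, 0.5, 0.5]),    # not the project's exact values
])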

Network Architecture

The motivation for using transfer learning came after we implemented a deep neural network from scratch: that model gave an accuracy of only around 18%-20%. We boosted the accuracy with the help of transfer learning.

PyTorch’s torchvision.models subpackage has a variety of pre-trained networks that can be easily downloaded.
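
For example, a pre-trained backbone can be pulled down with a single call:

from torchvision import models

# Downloads the ImageNet-pretrained weights on first use.
model = models.vgg19(pretrained=True)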

For EmoTorch, we tried multiple networks before settling on VGG19.

Initially, we used VGG16, which gave us an accuracy below 40%, followed by ResNet50 with 41% and DenseNet101 with 42.5%.

VGG19 yielded an accuracy of 46%, better than all the other pre-trained models we tried.

Therefore, we chose VGG19 for the implementation.

VGG19 Architecture

Model & Hyperparameters

The pre-trained models are trained on the ImageNet dataset, which has 1000 classes. Our task is to classify images into only one of the 7 emotions, so we had to alter the classification layer. We prepared our own classifier network to attach to the pre-trained VGG19 layers.

For this task we chose the following (a code sketch follows the list):

  • A dense hidden layer of 1024 units
  • ReLU activation function
  • Dropout layers between the hidden layers with p=0.2
  • Adam optimizer
  • 25 epochs
  • Batch size of 64
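
A minimal sketch of what this looks like in code, assuming torchvision’s VGG19; the log-softmax output paired with a negative log-likelihood loss is an illustrative assumption rather than the project’s exact training setup:

import torch
from torch import nn, optim
from torchvision import models

# Load VGG19 pre-trained on ImageNet.
model = models.vgg19(pretrained=True)

# Freeze the convolutional feature extractor; only the new head is trained.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the 1000-way ImageNet classifier with a 7-emotion head:
# a 1024-unit hidden layer, ReLU, and dropout with p=0.2.
model.classifier = nn.Sequential(
    nn.Linear(25088, 1024),  # 25088 = 512 * 7 * 7 flattened VGG19 features
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(1024, 7),
    nn.LogSoftmax(dim=1),    # assumption: log-probabilities for use with NLLLoss
)

criterion = nn.NLLLoss()
optimizer = optim.Adam(model.classifier.parameters())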

The Imbalanced Dataset

The distribution of samples per category in the FER dataset is not balanced. The category disgust is the least represented, with only 547 samples, whereas the category happy is the most represented, with 8,989 samples.

Future scope lies in augmentation: multiple balancing techniques can be used to present an equal number of samples per category, which should result in higher accuracy. One such technique is sketched below.
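
As a sketch, PyTorch’s WeightedRandomSampler can oversample under-represented emotions during training. The names train_labels and train_dataset below are assumptions standing in for whatever the data pipeline provides:

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# `train_labels` holds the emotion codes (0-6) for the training set.
labels = torch.tensor(train_labels)
class_counts = torch.bincount(labels, minlength=7)  # samples per emotion
weights = 1.0 / class_counts[labels].float()        # rarer class => higher weight

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(train_dataset, batch_size=64, sampler=sampler)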

Data Distribution

Accuracy

The model gives an accuracy of 46%, which is largely a limitation of the dataset we used. A large commercial dataset with high-resolution pictures could outperform it and give better accuracy.

Some of the plots we generated with TensorBoard:

Training Loss
Validation Loss
Validation Accuracy
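
For reference, curves like these can be logged with PyTorch’s TensorBoard integration; the tag names and per-epoch variables below are illustrative:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/emotorch')  # illustrative log directory

# Inside the training loop, once per epoch; train_loss, valid_loss and
# valid_accuracy are assumed to be computed by the loop.
writer.add_scalar('Loss/train', train_loss, epoch)
writer.add_scalar('Loss/valid', valid_loss, epoch)
writer.add_scalar('Accuracy/valid', valid_accuracy, epoch)
writer.close()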

A Few Examples from the Testing Set

Top 3 predicted classes
All class probabilities
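
As a sketch, the top-3 predictions can be read off the model output like this, assuming the log-softmax head above, an image tensor produced by the transform shown earlier, and the EMOTIONS mapping defined earlier:

import torch

model.eval()
with torch.no_grad():
    log_probs = model(image.unsqueeze(0))    # add a batch dimension
    probs = torch.exp(log_probs).squeeze(0)  # log-probabilities -> probabilities

top_probs, top_idx = probs.topk(3)
for p, i in zip(top_probs, top_idx):
    print(f'{EMOTIONS[i.item()]}: {p.item():.2%}')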

What’s Next?

EmoTorch can be combined with recommendation systems. An image passed to our model returns the predicted emotion, which the system can then use to recommend products.

Often we see recommendation systems work from a user’s watch history or buying history. EmoTorch gives real-time predictions that can help make more accurate recommendations. Users can either feed a selfie to the system, or the front camera can track their facial expressions with their consent. Either way, the image is processed by EmoTorch, and the prediction is used by the system to recommend songs to listen to, movies to watch, products to buy, places to visit, and much more.

Contributors to EmoTorch are:

  • Yashika Sharma
  • Nathan Curtis
  • Ahmed Hamido

Visit the project repository.

