EmoTorch
EmoTorch is a project built as part of the Facebook AI Hackathon 2020 using PyTorch. The project aims to predict a person's emotion from an image of their face. The image can be anything from a selfie to a frame captured by the mobile's front camera or a webcam while scrolling the feed.
PyTorch's diverse modules and packages make it the main tool used in this project.
The image is fed to a neural network that extracts features from it, analyzes them, and predicts the most likely of the seven most common emotions.
This article explains the model; the motivation behind the idea and the future scope will be covered in the next article.
The EmoTorch repository is largely self-explanatory, but here we will overview the project from the top.
Choosing the Dataset
The first step in any project is to choose a dataset; we chose the publicly available FER dataset for our task. The reasons for choosing this dataset:
- It has images categorized into one of seven emotions.
- It is publicly available.
- Its size is suitable for our task:
  - Training set: 28,709 examples
  - Test set: 3,589 examples
  - Validation set: 3,589 examples
The data was pulled from a past Kaggle Competition.
There are two files available. The first file, train.csv, contains two columns, “emotion” and “pixels”. The “emotion” column contains a numeric code ranging from 0 to 6, inclusive, for the emotion present in the image. The “pixels” column contains a quoted string for each image, whose contents are space-separated pixel values in row-major order. test.csv contains only the “pixels” column, and our task is to predict the emotion column.
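As a sketch of that format, the pixels string can be decoded into an image array like this (the function name is ours, not the project's; FER images are 48×48 grayscale, hence the default size):

```python
import numpy as np

def pixels_to_image(pixel_string, size=48):
    """Decode a space-separated, row-major pixel string into a 2-D array."""
    values = np.array(pixel_string.split(), dtype=np.uint8)
    return values.reshape(size, size)

# Hypothetical usage with pandas:
# df = pd.read_csv("train.csv")
# image = pixels_to_image(df.loc[0, "pixels"])
```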
The emotions available are as follows:
{'0': 'angry',
'1': 'disgust',
'2': 'fear',
'3': 'happy',
'4': 'neutral',
'5': 'sad',
'6': 'surprise'}
However, the model is built to work with any dataset; commercial datasets could be used with it to obtain better results.
After choosing the dataset, we preprocessed the images. The images were already center-cropped with suitable dimensions, so we skipped resizing and went with just normalizing the images and converting them to tensors.
Network Architecture
The motivation for using transfer learning came after we implemented a deep neural network from scratch, which gave an accuracy of only around 18%-20%. We boosted the accuracy with the help of transfer learning.
torchvision's models subpackage offers a variety of pre-trained networks that can be downloaded easily.
For EmoTorch, we tried multiple networks before settling on VGG19.
Initially, we used VGG16, which gave us accuracy below 40%, followed by ResNet50 with 41% and DenseNet101 with 42.5%.
VGG19 yields an accuracy of 46%, better than all the other pre-trained models.
Therefore, we chose VGG19 for the implementation.
Model & Hyperparameters
The pre-trained models are trained on the ImageNet dataset, which has 1000 classes. Our task is to classify images into one of only 7 emotions, so we had to alter the classification layer. We prepared our own network to attach to the pre-trained VGG19 layers.
For this task we chose:
- A dense hidden layer with 1024 units
- ReLU activation function
- Dropout layers between the hidden layers with p=0.2
- Adam optimizer
- 25 epochs
- Batch size of 64
The Imbalanced Dataset
The distribution of samples per category in the FER dataset is not balanced. The category disgust is the least represented, with only 547 samples, whereas the category happy is the most represented, with 8,989 samples.
Future scope lies in augmentation: multiple balancing techniques can be used to present an equal number of samples per category, which should result in higher accuracy.
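One such balancing technique, sketched here as an illustration rather than the project's code, is oversampling with PyTorch's WeightedRandomSampler so that each of the seven emotions is drawn roughly equally per epoch:

```python
import torch
from torch.utils.data import WeightedRandomSampler

def make_balanced_sampler(labels, num_classes=7):
    """Weight each sample inversely to its class frequency."""
    labels = torch.as_tensor(labels)
    class_counts = torch.bincount(labels, minlength=num_classes).float()
    class_weights = 1.0 / class_counts.clamp(min=1)  # rare class -> large weight
    sample_weights = class_weights[labels]           # one weight per sample
    return WeightedRandomSampler(sample_weights,
                                 num_samples=len(labels),
                                 replacement=True)

# Hypothetical usage:
# loader = DataLoader(train_dataset, batch_size=64,
#                     sampler=make_balanced_sampler(train_labels))
```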
Accuracy
The model gives an accuracy of 46%, which is largely limited by the dataset we used. A large commercial dataset with high-resolution pictures could yield better accuracy.
Some of the plots we logged with TensorBoard:
A Few Examples from the Test Set
What’s Next?
EmoTorch can be combined with recommendation systems. An image passed to our model returns the predicted emotion, which the system can use to recommend products.
Recommendation systems often work based on a user's watch history or buying history. EmoTorch gives real-time predictions that can make recommendations more accurate. Users can either feed a selfie to the system, or the front camera can track facial expressions with the user's consent. Either way, the image is then processed by EmoTorch, and the prediction is used by the system to recommend songs to listen to, movies to watch, products to buy, places to visit, and much more.
Contributors to EmoTorch:
- Yashika Sharma
- Nathan Curtis
- Ahmed Hamido
Visit the project repository.