Weiwei Gu, Bill Chen, Justin Kahr, Said Mrad
For our CSCI379 final project, our group worked on analyzing human emotions through facial expressions. For centuries, the human race has studied emotions to understand them at their core and to distinguish what others are feeling. Being able to recognize emotion in humans plays an important role not only in interpersonal relationships, but also in communication between humans and machines. Automatic emotion recognition is an active research topic, and new models have enabled several key advances in the field. In this paper we explore its most fundamental and best-known concepts through the models of Ekman and Lang, and we subsequently use Russell's model to implement our own design for emotion recognition from facial expressions.
Emotions can be recognized in multiple ways, but two models stand out as the best known: the “discrete emotion model” proposed by Ekman and the “emotion dimensional model” proposed by Lang.
The discrete emotion model categorizes emotions into six basic states: surprise, anger, disgust, happiness, sadness, and fear. These six emotions are widely accepted as being “universally and biologically experienced by all humans”. It is believed that all other emotions are variants of these core emotions, although different models propose more core emotions (seven to ten on average).
The emotion dimensional model proposed by Lang assumes that emotions are combinations of several psychological dimensions. The best-known dimensional model is the valence-arousal model proposed by Russell, which assumes that every emotion can be characterized by valence and arousal: valence is how negative or positive the feeling is (pleasure versus displeasure in the figure), and arousal is the amount of excitement involved (activation versus deactivation in the figure). These two models are the most widely known approaches to emotion recognition through facial characteristics and form the basis for later research on the topic. Researchers have also explored the correlation between the two models to understand emotions even better, since understanding emotions is the first step toward perceiving them in people.
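To make the valence-arousal idea concrete, the four emotion categories we later work with can be roughly placed in that plane. The coordinates below are our own illustrative placements, not values taken from Russell's work:

```python
# Rough, illustrative placement of our four emotion categories in
# Russell's valence-arousal plane. Both axes run in [-1, 1]; the exact
# numbers are assumptions chosen for illustration only.
EMOTION_COORDS = {
    "happy":   ( 0.8,  0.5),   # positive valence, moderate activation
    "angry":   (-0.7,  0.7),   # negative valence, high activation
    "fearful": (-0.8,  0.8),   # negative valence, high activation
    "neutral": ( 0.0,  0.0),   # origin: neither pleasant nor aroused
}

def quadrant(emotion):
    """Return the (valence sign, arousal sign) quadrant of an emotion."""
    v, a = EMOTION_COORDS[emotion]
    sign = lambda x: (x > 0) - (x < 0)
    return sign(v), sign(a)
```

Note that happy sits alone in the positive-valence half, while angry and fearful share a quadrant, which already hints at why those two are harder to tell apart than happy versus anything else.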
Facial recognition is a complicated problem in its own right and is only slightly related to emotion recognition, so we decided to use an existing facial recognition system and focus our efforts on the emotion side. We chose Microsoft Azure's Face API for this task. Our program uploads images to the Microsoft server, where the Azure software performs facial recognition. The service returns JSON files that contain coordinate information for a rectangle containing the face and for the detected facial features.
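For illustration, a minimal sketch of how such a response can be parsed. The stand-in JSON below mimics the shape of the Face API's `faceRectangle` and `faceLandmarks` fields, trimmed to two landmarks for brevity:

```python
import json

# Simplified stand-in for an Azure Face API detect response: a list of
# faces, each with a bounding rectangle and named landmark points.
RESPONSE = json.loads("""
[{"faceRectangle": {"top": 50, "left": 40, "width": 100, "height": 120},
  "faceLandmarks": {"pupilLeft":  {"x": 70.0, "y": 85.0},
                    "pupilRight": {"x": 110.0, "y": 86.0}}}]
""")

def extract_face(response):
    """Pull the rectangle and landmark coordinates for the first face."""
    face = response[0]
    rect = face["faceRectangle"]
    landmarks = {name: (pt["x"], pt["y"])
                 for name, pt in face["faceLandmarks"].items()}
    return rect, landmarks
```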
Next, we need to effectively map the information in the JSON files to an emotional state. We decided to use a neural net for this part of the project. A neural net naturally fits this problem: we need to take a state consisting of a constant number of inputs, whose values can occur in a huge number of combinations, and associate it with an output. Currently, we are using a model with 40 input nodes, 4 hidden layers of 160 nodes each, and 4 output nodes. Instead of feeding the network the raw coordinates from the picture, we feed it each position as a ratio of the width or height of the face. The outputs correspond to the 4 emotional states present in our database. To train our agent, we use the known emotional state of each image as the desired output. The neural net is built with the Keras library, which makes it easy to construct and to save for later use.
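The normalization step can be sketched as follows. The rectangle and landmark shapes follow the Face API's output; the two landmarks here are illustrative, while the real input uses enough landmarks to fill all 40 input nodes:

```python
def normalize_landmarks(rect, landmarks):
    """Convert absolute landmark coordinates into ratios relative to the
    face rectangle: x relative to its width, y relative to its height.
    The resulting flat vector is what the network is fed."""
    features = []
    for name in sorted(landmarks):   # fixed ordering, so each input
        x, y = landmarks[name]       # node always sees the same landmark
        features.append((x - rect["left"]) / rect["width"])
        features.append((y - rect["top"]) / rect["height"])
    return features

# Example with two landmarks inside a 100x120 face rectangle.
rect = {"top": 50, "left": 40, "width": 100, "height": 120}
landmarks = {"pupilLeft": (70.0, 85.0), "pupilRight": (110.0, 86.0)}
vec = normalize_landmarks(rect, landmarks)
```

Because every value is scaled by the face's own size, the same expression produces roughly the same feature vector whether the face is large or small in the photo.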
In order to run and evaluate our solution, we needed a suitable set of images containing faces, preferably already classified by emotion. We decided to use the Chicago Face Database (CFD), a free set of photographs of faces and norming data developed by psychology researchers at the University of Chicago. The database contains several photographs showing different emotions for 158 subjects in four categories: black female, black male, white female, and white male. Each subject has a neutral, open-mouth happy, closed-mouth happy, angry, and fearful image. In a few cases, the creators decided an image was not usable, so it is not included in the set. We set aside one subject from each category for testing; the rest are used to train our agent.
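The holdout scheme can be sketched as follows; the subject IDs are made up for illustration, as the CFD uses its own naming convention:

```python
import random

# Hypothetical subject IDs grouped into the four CFD categories.
subjects = {
    "BF": ["BF-001", "BF-002", "BF-003"],
    "BM": ["BM-001", "BM-002", "BM-003"],
    "WF": ["WF-001", "WF-002", "WF-003"],
    "WM": ["WM-001", "WM-002", "WM-003"],
}

def split_subjects(subjects, seed=0):
    """Hold out one randomly chosen subject per category for testing;
    every other subject goes into the training set."""
    rng = random.Random(seed)
    train, test = [], []
    for category, ids in subjects.items():
        held_out = rng.choice(ids)
        test.append(held_out)
        train.extend(s for s in ids if s != held_out)
    return train, test
```

Splitting by subject rather than by image guarantees the agent is never evaluated on a face it has already seen in another expression.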
We ran our trained agent on several faces of different races and genders. The agent performs best on happy faces but sometimes cannot distinguish between neutral and angry faces. This result may be caused by the way we trained the agent. Since we feed it the relative coordinates of each recognized spot on the face, it is possible that the agent cannot capture the slight difference between neutral and angry faces. In reality, it can be very hard to tell whether people are angry from their faces unless they want you to know; most of the angry-face pictures do not differ significantly from the neutral ones. Also, because we convert our input images to JSON files, some important information that distinguishes neutral faces from angry faces, such as the cheeks and the angle of the eyebrows, may be lost in the conversion. Aside from this weakness, our agent's predictions are reasonably accurate.
To improve the performance of our agent, we may need to also categorize the input by race and gender, because differences in facial features can affect the accuracy of the prediction. We may also consider adding Asian and Latino subjects to make our agent less biased in its predictions. Finally, if possible, we should train with images beyond the CFD: the CFD is a clean database, so our agent will not be as good at predicting on pictures with less clarity and different head positions.
Utopia And Dystopia:
This technology is already widely implemented in our lives. One common example is emotion detection on our phones, which sometimes gives you a prompt after you type certain words, or HappyNet, which performs real-time emotion analysis and places the corresponding emoji on top of the face in the image. It is also very entertaining to see a simple emoji replace your face and express your emotion almost without mistake. We can use these implementations to improve mental health care or advertising, but we also need to pay extra attention to the downsides of the technology.
With the rapid spread of the Internet, everyone gets to share their opinions online without considering the consequences. Cyberbullying is becoming more common now that everyone can access the Internet on a mobile phone at any time and place. Multiple reports from South Korea show that pop stars have committed suicide after being cyberbullied. Often, sarcastic messages are the ones that spread most virulently. With the help of emotion recognition, online policing can more easily and quickly filter out neutral news and focus on the sad or angry photos and posts on social media such as Twitter or Instagram. It is a privilege of the Internet that people can post any kind of pictures and feelings and that these can be viewed by anyone, willingly or unwillingly. To limit the damage done by expressions of hostility and hate speech, it is urgent and helpful to have online policing policies that protect people who are vulnerable in such situations.
Mental care for the disabled:
Emotion is an essential part of human behaviour, communication and decision making. There is no doubt that successful figures tend to have relatively high emotional intelligence in their social skills. However, not everyone's emotions are easy to understand. People who have autism spectrum disorder (ASD) often struggle to identify the feelings of others, which results in many difficulties in their daily lives. With emotion recognition, AI can act as an 'emotional hearing aid', helping people who have ASD get through tough social situations such as classes or offices. This technology can be used to create intelligent robots and to help children with ASD gain a better understanding of emotions.
Unlike other areas of medicine, mental health relies heavily on qualitative data. Since there is no X-ray machine that can scan emotions, emotion recognition can function as feedback that looks into the patient's mind. This can address an inefficiency in therapy, which often lacks timely evaluation of treatment response. With real-time data, therapy can take these quantifiable metrics into account and produce better outcomes.
Concern of user privacy:
In the facial recognition part, user input is required to produce an outcome; in our case, a user's photo is needed to analyze the emotion. With proper use of data, a user's information should not be stored privately in any location. However, that is not always the case. If an emotion recognition company secretly stores user data and it gets leaked in a major security breach, users' information gets shared all over the Internet. This can cause very serious harm to the person, such as blackmail or death threats. So if data is collected, users should be informed first-hand and be fully aware of the downsides of signing the agreement.
Biases Based on Emotion:
In Netflix’s popular show Black Mirror, there is an episode in which everyone is labeled with a score that is also affected by their emotions. Everyone lives in a 'happy and polite world' and wears a big smile. Of course, we would not want this policy to actually be implemented anywhere. However, it is possible that people will be judged by their facial expressions. One common example is the shopping mall: if a clerk sees your reaction to a certain product, he or she might be less welcoming and decide not to provide any service. No one wants to be ignored or singled out in any situation, so it is important not to use techniques with such bias risks in service industries.
On the other hand, emotion recognition can be put to good use because it is an efficient way to gauge customers' experience without devoting a lot of human resources to going over every reply. It can capture customers' attitudes toward a product much like a survey or evaluation.
In our project, we only use the Chicago Face Database to train our neural network agent. Since the CFD was collected by the University of Chicago following proper procedures and we strictly follow its privacy rules, this is not a concern for our project. However, users of similar products should be careful and pay closer attention when filling out such agreements.
Rago, R. (2014). Vision and Emotion. Tufts University. https://sites.tufts.edu/emotiononthebrain/2014/10/24/vision-and-emotion/
Barrett, L. F. (2006). Are Emotions Natural Kinds? SAGE Publications. https://journals.sagepub.com/doi/10.1111/j.1745-6916.2006.00003.x
Sightcorp. What is Emotion Recognition? https://sightcorp.com/knowledge-base/emotion-recognition/
Al Machot, F., Elmawchot, A., Ali, M., Al Machot, E., & Kyamakya, K. (2019). A Deep-Learning Model for Subject-Independent Human Emotion Recognition Using Electrodermal Activity Sensors. MDPI. https://www.mdpi.com/1424-8220/19/7/1659/htm#B3-sensors-19-01659
Russell, J. A. (2003). Core Affect and the Psychological Construction of Emotion. Boston College. http://ivizlab.sfu.ca/arya/Papers/Others/Emotions/Core_Affect.pdf
Duncan, D. (2016). HappyNet Demo — Deep Learning with Convolutional Neural Nets. YouTube. https://www.youtube.com/watch?v=MDHtzOdnSgA&feature=youtu.be
Yi, H., & Cha, S. (2019). Cyber bullying, star suicides: The dark side of South Korea's K-pop world. Reuters. https://www.reuters.com/article/us-southkorea-kpop/cyber-bullying-star-suicides-the-dark-side-of-south-koreas-k-pop-world-idUSKBN1Y20U4