Replicating Empathy Through Machine Learning!

Alisa Hathaway
7 min read · Apr 2, 2020

We all know that, for now, computers cannot form opinions or develop ethical perspectives that aren’t already present in their programming. But while they cannot develop empathy themselves, can they accurately detect variations in a person’s empathy for an individual or a situation? To explore this question, our team investigated how machine learning can be leveraged to detect a reader’s empathy for a character based on that character’s actions or speech.

This becomes particularly scary when we think about the future and the integration of robotics into our day-to-day lives. Because of this, our team wanted to explore ways to make a computer estimate the feelings a reader would have towards a character based on that character’s actions or speech.

In this project, our team trained a machine learning model to output predicted reader empathy levels for a character in a short story. These empathy levels were calculated from an emotion analysis performed on the sentences that make up the story. This was a regression task, and our team determined which feelings we wanted our model to include in its features. We decided to base the “empathy” output of our program on five pertinent features, which we will discuss in the following sections. To reiterate, we modeled this program with the intention of predicting the average reader’s empathy towards a character in any sort of text, regardless of format.

To train our model, we originally intended to use the ROCStories corpus, which has over 90,000 five-sentence commonsense stories. However, this idea did not work as well as we had anticipated, because the sentence structures our program could analyze did not match the sentences in that dataset. Hence, we decided to create a dataset of stories on our own and rank the characters based on how we would personally feel about them, given their actions, while reading a passage. We analyzed every sentence and updated a value (from 1–10) representing how much empathy we felt towards the character at the end of the sentence, starting from how much empathy the character had before the sentence began. Initially, every character was ranked at 5 (neutral); if the contents of a sentence made our opinion of the character less favorable, we decreased the score, and vice versa. To keep the program as unbiased as possible, we each reviewed the sentences our colleagues wrote and adjusted the empathy scores. In addition to the empathy score that we maintained and updated, we kept track of a “character_label”, which distinguished whether the character was the subject, the indirect object, or neither. This was necessary for the model to learn ‘who was doing what to whom.’
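To give a concrete picture of this process, here is a minimal sketch of the annotation loop described above. It is illustrative only: the field names (empathy_before, empathy_after, character_label) and the input prompt stand in for the spreadsheet we actually filled in by hand.

```python
# Illustrative sketch of our hand-labelling loop; the field names are chosen
# for this example and the real spreadsheet may have looked slightly different.

def annotate_story(sentences, characters):
    """Walk through a story sentence by sentence, keeping a running
    empathy score (1-10) for each character. Every character starts at
    the neutral value of 5 and the annotator nudges it up or down."""
    scores = {name: 5 for name in characters}
    rows = []
    for sentence in sentences:
        for name in characters:
            prev = scores[name]
            raw = input(f"{sentence}\nEmpathy for {name} (1-10, was {prev}): ").strip()
            new = int(raw) if raw else prev
            rows.append({
                "sentence": sentence,
                "character": name,
                "character_label": None,  # subject / indirect object / neither, filled in by hand
                "empathy_before": prev,
                "empathy_after": new,
            })
            scores[name] = new
    return rows
```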

A snippet of our data set is shown below:

We then ran our sentences through ParallelDots’ text-analysis API to receive numeric scores for the six emotions we chose. This gave us a value in the range [0, 1] for each feeling, indicating how strongly it was expressed in a sentence. The feeling_1, feeling_2, and feeling_3 columns recorded the top three feelings evoked in each sentence. We created numerical labels for the feelings and included them in the dataset (“happy” = 1, “sad” = 2, “excited” = 3, “angry” = 4, “bored” = 5, “fear” = 6).
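As a rough illustration of this step, the snippet below uses the paralleldots Python SDK. The exact response layout may differ between API versions, so treat this as a sketch rather than our exact code.

```python
# Sketch of pulling emotion scores with the paralleldots SDK (pip install paralleldots).
# The response layout is assumed here and may vary between API versions.
import paralleldots

paralleldots.set_api_key("YOUR_API_KEY")  # placeholder key

FEELING_LABELS = {"happy": 1, "sad": 2, "excited": 3, "angry": 4, "bored": 5, "fear": 6}

def top_three_feelings(sentence):
    """Return the numeric labels of the three strongest emotions in a sentence."""
    response = paralleldots.emotion(sentence)
    scores = {k.lower(): v for k, v in response["emotion"].items()}  # each value in [0, 1]
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [FEELING_LABELS[f] for f in ranked[:3] if f in FEELING_LABELS]

print(top_three_feelings("The wolf gobbled her grandmother!"))
```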

The next step was to train our model. We considered two different models, Linear Regression and AdaBoost, and selected AdaBoost. We tested the model using program_empathy.py, which our team wrote, and found that although the model could not predict an accurate absolute empathy value for a sentence, the relative empathy between the characters in a text was fairly accurate.
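For context, a training step along these lines could look like the sketch below, using scikit-learn’s AdaBoostRegressor. The file name and column names mirror the dataset described above but are assumptions, not the exact contents of program_empathy.py.

```python
# A rough sketch of the training step; the column and file names are illustrative.
import pandas as pd
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split

data = pd.read_csv("empathy_dataset.csv")  # hypothetical file name
feature_columns = ["feeling_1", "feeling_2", "feeling_3", "character_label", "empathy_before"]
X = data[feature_columns]
y = data["empathy_after"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = AdaBoostRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("R^2 on held-out sentences:", model.score(X_test, y_test))
```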

We estimated that this shortcoming was due to two reasons: 1. since we wrote the sentences used to train our model, we only had ~200 of them, which was not enough data for ideal training; 2. the data was made by a team of three people, so our personal biases about how to interpret empathy were prominent in the sentences. To reduce such bias, it would have been best to have far more people review our sentences and provide empathy ratings. We could then have averaged many scores and drawn from a larger dataset.

Therefore, we decided to shift our project’s desired output from being the numeric empathy score to the relative empathy towards each character (not a value, but a comparison between the characters within a passage).
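In practice, that simply means ranking the characters of a passage by their final predicted scores instead of reporting the numbers themselves. A tiny sketch of that comparison (the scores below are placeholders, not actual model output):

```python
def rank_characters(final_scores):
    """Order characters from most to least sympathetic, given each
    character's predicted empathy after the last sentence."""
    return sorted(final_scores, key=final_scores.get, reverse=True)

# Placeholder predictions, just to show the kind of comparison we report.
print(rank_characters({"hero": 7.1, "villain": 2.8}))  # -> ['hero', 'villain']
```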

Another thing we noticed was that our team wrote a lot of sentences where fear was one of the predominant feelings. Hence, when a scary element was present in a sentence, the empathy result changed more drastically than it did when a happy element was present. This could be due to a bias within the API or to our own bias when creating the training data set. When thinking about the reasoning behind this, we realized that it is actually similar to how a human reacts: when scarier things occur, we are more likely to remember them and deem them significant than happier things. (Our program imitates a human well!) :)

This is the analysis of a short story, Little Red Riding Hood! This was one of the more fun and interesting texts that we analyzed with our program, and it sits on the more difficult end of what the program can handle. But the results are still interesting!

The analyzed text is:

“Once upon a time, there was a little girl called Red who lived in a village near the forest. Whenever Red went out, she wore a red riding cloak, so everyone in the village called her Little Riding Hood. One morning, Red asked her mother if she could go to visit her grandmother as it had been awhile since they’d seen each other. Her mother said to Red “That’s a good idea”. So Red and mother packed a nice basket for them to take to her grandmother. When the basket was ready, Red and mother kissed and said goodbye. The mother sweetly cautioned that Red should go straight to the house. The mother sweetly cautioned Red not to talk to strangers because it is dangerous. Red said to her mother that she will be careful! But when Red noticed some lovely flowers in the woods. She forgot to be careful. Red watched the butterflies flit about for a while, listened to the frogs croaking and then picked a few more. Suddenly, the wolf appeared beside her. What are you doing out here, little girl? the wolf asked in a voice as friendly as he could muster to Red. The wolf, a little out of breath from running, arrived at Grandma’s and knocked lightly at the door. The wolf let himself in. Poor grandmother did not have time to say another word. The wolf gobbled her grandmother! The wolf let out a satisfied burp, and then poked through Granny’s wardrobe to find a nightgown that he liked. The wolf added a frilly sleeping cap, and for good measure, dabbed some of Granny’s perfume behind his pointy ears.”

This is the first time the wolf appears. (We don’t know anything about the wolf, but he is a little frightening).
This shows that something bad is happening in this sentence.
Too complicated of a sentence for our program as of now!
A graphical representation of our results, with the y-axis showing the relative empathy levels and the x-axis the sentence number.

In comparison, our program functions better on clearer, less ambiguous examples, such as the following:

“Marvin and Kiera are best friends. Marvin talked bad about Kiera behind her back. Marvin laughed when Kiera fell down the stairs. Kiera cried.”

The comparison between the characters’ empathy levels is much more apparent here, using the same scale as the previous graph.

Overall, our team had a wonderful time with this project! The three of us had never participated in an ML hackathon before, and we definitely learned a lot. Although the original goal we had in mind shifted slightly, we were still able to produce some relatively accurate results, which was pretty interesting and exciting. With more time and fine-tuning, we believe this project could be a step closer to making a computer that perceives people the same way you have judged me and my partners throughout this blog post. Hahaha jk…

Thank you! :)

Project by: Gustavo Santiago-Reyes, Maaya Prasad, and Alisa Hathaway
