Say Hello to SAM, the new Speaker Affect Model

At the most recent Text-as-Data Seminar, Knox and Lucas introduce a new model that analyzes emotional tones in conversations

Dean Knox at the CDS Text-as-Data & NLP Research Seminar (2017)

Although we often focus on words when studying political rhetoric, a key component that is often overlooked is sound.

For example, a person’s vocal tone, rhythm, or timbre can have enormous implications for whether we perceive a sentence as sarcastic or serious.

This is why Dean Knox (Microsoft & Princeton) and Christopher Lucas (Harvard) are working on a new project called the Speaker Affect Model (SAM), which they introduced at the most recent Text-as-Data and NLP Research seminar.

SAM analyzes conversations by breaking them down into utterances, or sentence-length audio recordings, then slicing these utterances into even shorter frames that capture the different sonic dimensions of a person’s speech, such as their pitch or volume, at a particular instant in time.
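
To make the framing step concrete, here is a minimal sketch of how one utterance might be sliced into short overlapping frames, with a loudness and a rough pitch value computed for each. This is not the researchers' code: the function name `frame_features`, the 25 ms frame length, the 12.5 ms hop, the 75 to 400 Hz pitch search range, and the autocorrelation pitch estimate are all assumptions chosen for illustration.

```python
import numpy as np

def frame_features(signal, sample_rate, frame_ms=25.0, hop_ms=12.5):
    """Slice a 1-D audio signal into overlapping frames and return one
    (rms_volume, pitch_hz) row per frame. Illustrative only."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    # Candidate pitch lags, assuming a 75-400 Hz range for speech.
    min_lag = int(sample_rate / 400)
    max_lag = min(int(sample_rate / 75), frame_len - 1)

    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]

        # Volume: root-mean-square amplitude of the frame.
        rms = float(np.sqrt(np.mean(frame ** 2)))

        # Pitch: lag of the strongest autocorrelation peak in the range.
        centered = frame - frame.mean()
        ac = np.correlate(centered, centered, mode="full")[frame_len - 1:]
        lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
        rows.append((rms, sample_rate / lag))
    return np.array(rows)

# Usage: a synthetic one-second, 200 Hz tone stands in for an utterance.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
utterance = 0.5 * np.sin(2 * np.pi * 200 * t)
features = frame_features(utterance, sample_rate)
```

Each row of `features` describes one instant of the utterance. A real pipeline would likely track many more sonic dimensions than these two.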

Analyzing an audio recording at such a granular level, the researchers explained, allows us not only to learn how different emotions sound, but also to predict the emotion of an utterance within the context of the conversation around it by using a forward-backward algorithm.
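
For readers unfamiliar with the forward-backward algorithm, the sketch below shows the general technique on a toy example. It assumes a simple hidden Markov model in which each utterance has one hidden emotional state; the article does not spell out SAM's actual structure, and the two states and every probability here are invented for illustration.

```python
import numpy as np

def forward_backward(initial, transition, likelihoods):
    """Posterior probability of each hidden state at each step.

    initial:     (K,)   probability of starting in each state
    transition:  (K, K) row i gives P(next state | current state i)
    likelihoods: (T, K) P(observed audio at step t | state k)
    """
    T, K = likelihoods.shape
    alpha = np.zeros((T, K))   # forward messages (past evidence)
    beta = np.ones((T, K))     # backward messages (future evidence)
    scale = np.zeros(T)        # per-step normalizers, avoid underflow

    # Forward pass: carry evidence from earlier utterances.
    alpha[0] = initial * likelihoods[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ transition) * likelihoods[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass: carry evidence from later utterances.
    for t in range(T - 2, -1, -1):
        beta[t] = transition @ (likelihoods[t + 1] * beta[t + 1])
        beta[t] /= scale[t + 1]

    posterior = alpha * beta
    return posterior / posterior.sum(axis=1, keepdims=True)

# Usage with made-up numbers: state 0 = neutral, state 1 = skeptical.
initial = np.array([0.6, 0.4])
transition = np.array([[0.8, 0.2],
                       [0.3, 0.7]])
likelihoods = np.array([[0.9, 0.2],   # utterance 1 sounds neutral
                        [0.5, 0.5],   # utterance 2 is ambiguous
                        [0.1, 0.8]])  # utterance 3 sounds skeptical
posterior = forward_backward(initial, transition, likelihoods)
```

The ambiguous middle utterance is where context matters: its posterior depends on what was said both before and after it, not just on its own sound.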

“There are so many ways,” Knox said, “to summarize a conversation.” But what makes SAM unique is that, unlike existing methods, it attempts to model the flow of a conversation.

After building their model, they conducted an exploratory case study on predicting skepticism in Supreme Court audio recordings.

Their preliminary results suggest that while skepticism is often difficult to recognize from text alone, it can be predicted from audio.

For example, they found that when some Justices pause or use the word “think,” they are usually expressing neutrality, while words like “nothing,” “can’t,” and “even” are typically said in a skeptical tone.

There are also Justices who indicate skepticism by vocal tone alone, which is precisely why SAM’s attention to sound is so vital.

“When we hear justices expressing more skepticism toward one side,” Knox added, “we find that they are much more likely to rule against that side.”

While SAM is not a perfect representation of reality (will anything ever be?), this promising model is poised to uncover the secret sonic ingredients of effective political rhetoric. Presidential hopefuls for 2020, take note!

by Cherrie Kwok
