Can Tone of Voice Predict Political Attitude?

Predictive models can detect political ideology using data from Supreme Court oral arguments

Many recent social science projects have sought to estimate individuals' political affiliation from textual, emotional, and non-verbal expression. Building on the success of these studies, Yassine Kadiri, CDS student and DeepMind Fellow; Zsolt Pajor-Gyulai and Thomas Leble, NYU Courant; Elliott Ash, ETH Zurich; and Daniel L. Chen, Toulouse School of Economics, set out to explore whether basic phonetics data from public speakers could predict their political ideology. The researchers acknowledge that the definition of ideology is subject to debate, but they frame the prediction problem as a binary classification task: determining whether a speaker is a Democrat or a Republican.

Kadiri and collaborators take a novel approach in their new paper by using TextGrid data, a format that time-aligns speech recordings with their transcripts to provide phonetic information for each word. The researchers used two primary datasets: the Oyez dataset, which contains recordings of Supreme Court oral arguments from 1998 to 2013, and Stanford's DIME (Database on Ideology, Money in Politics, and Elections). Oyez provided the phonetics data for the public speakers (attorneys and justices), while DIME supplied the ideology label for each speaker.
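To make the format concrete, here is a minimal sketch of reading word- and vowel-level alignments out of a TextGrid file. It assumes the third-party `textgrid` Python package; the file path and the tier names ("words", "phones") are illustrative assumptions, not details from the paper's pipeline.

```python
# A minimal sketch of reading word/vowel alignments from a TextGrid file.
# Assumes the third-party `textgrid` package (pip install textgrid); the
# path and tier names below are hypothetical.
import textgrid

tg = textgrid.TextGrid.fromFile("oyez_argument.TextGrid")  # hypothetical file

# A TextGrid stores time-aligned tiers; forced alignment typically yields
# one tier of words and one of phones, each a list of labeled intervals.
words = next(t for t in tg.tiers if t.name == "words")
phones = next(t for t in tg.tiers if t.name == "phones")

VOWELS = {"AA", "AE", "AH", "AO", "EH", "ER", "IH", "IY", "UH", "UW"}

for w in words:
    if not w.mark:  # skip silence/empty intervals
        continue
    # Collect the vowel intervals that fall inside this word's time span,
    # stripping ARPAbet stress digits (e.g. "AE1" -> "AE").
    vowel_spans = [
        (p.mark, p.minTime, p.maxTime)
        for p in phones
        if p.minTime >= w.minTime and p.maxTime <= w.maxTime
        and p.mark.rstrip("012") in VOWELS
    ]
    print(w.mark, vowel_spans)
```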

This approach rests on the observation that, within the same word, the same vowel can be pronounced with different emphasis by different speakers. Consequently, the resulting TextGrid files “are distinct files for each word and vowel in that word.” Since their final datasets included at most 450 instances of vowel pronunciations for each word from distinct speakers, the researchers chose regularized linear models, which resist overfitting on samples that small. They trained the models to predict the ideology of a new speaker from the pronunciation of selected ideologically charged words.
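As a rough illustration of how such per-(word, vowel) measurements might be assembled into one feature vector per speaker, the sketch below uses vowel duration as a stand-in acoustic measurement. The column names, values, and mean-imputation choice are illustrative assumptions, not the paper's actual features.

```python
# A sketch of turning per-(word, vowel) pronunciation measurements into
# one feature vector per speaker. Vowel duration stands in here for
# whatever acoustic measurements the real pipeline extracts; all values
# are made up.
import pandas as pd

# One row per pronunciation event (speaker, word, vowel).
events = pd.DataFrame({
    "speaker":  ["s1", "s1", "s2", "s2"],
    "word":     ["liberty", "liberty", "liberty", "liberty"],
    "vowel":    ["IH", "ER", "IH", "ER"],
    "duration": [0.072, 0.110, 0.064, 0.131],  # seconds, placeholder
})

# Average repeated pronunciations, then pivot so each (word, vowel) pair
# becomes one column of the speaker-level design matrix X.
X = (events
     .groupby(["speaker", "word", "vowel"])["duration"].mean()
     .unstack(["word", "vowel"])
     .fillna(events["duration"].mean()))  # impute missing pronunciations
print(X)
```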

The models' initial performance on the phonetics data was poor, but it improved significantly once the researchers dropped a count feature (a measure of how many times each speaker spoke each word) and trained on all the words available in their dataset. Ridge, Logit, and Lasso classifiers applied to the data reached more than 70% accuracy, though the researchers note that such strong scores, while encouraging, are affected by the polarization of Supreme Court arguments.
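For a sense of the modeling step, the following sketch fits the three regularized linear classifiers named above with scikit-learn and scores them by cross-validation. The data here are random placeholders standing in for the speaker-by-pronunciation feature matrix and the DIME-derived party labels; the hyperparameters are defaults, not the paper's.

```python
# A sketch of the Ridge, Logit, and Lasso classifiers on synthetic data.
# X would be the speaker-by-(word, vowel) pronunciation matrix and y the
# DIME-derived party labels; the numbers below are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(450, 200))        # ~450 speakers, vowel features
y = rng.integers(0, 2, size=450)       # 0 = Democrat, 1 = Republican

models = {
    "Ridge": RidgeClassifier(alpha=1.0),                  # L2 penalty
    "Logit": LogisticRegression(C=1.0, max_iter=1000),    # L2-penalized logistic
    "Lasso": LogisticRegression(penalty="l1",
                                solver="liblinear", C=1.0),  # L1 penalty
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

On real features and labels, the cross-validated accuracies printed here would correspond to the 70%-plus scores the researchers report; on this random placeholder data they hover around chance.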

Indeed, some words performed so well that their scores most likely arise from the detection of an accent or speech pattern correlated with a particular ideology. The researchers now aim to sharpen their predictions using words that members of one political party use more often than the other, while reminding readers that their focus is vowel-level pronunciation rather than word usage.

For future exploration, Kadiri and collaborators propose testing whether their classifiers rely on detecting accents or variations in pitch rather than vowel-level pronunciation alone. They are also considering applying Natural Language Processing models to the content of the speeches and combining those predictions with their audio-based model.

By Paul Oliver