David Bryce Yaden
Apr 1, 2016

Algorithms Read Between the Lines of the Presidential Candidates

This election cycle has brought an unprecedented amount of news coverage and social media chatter, in addition to televised debates and campaign rallies. Yet given this flood of information, how does one encapsulate each candidate’s message?

Voters often rely on sound bites and controversial quotes that get picked up in the news. These excerpts may or may not be representative of the candidates’ actual speech: the focus is often on perpetuating a coherent media narrative for each candidate rather than on fairly presenting what was said.

Here, computerized linguistic analysis can be helpful.

Analyzing the Debates

Computational language analysis can provide a systematic way of automatically summarizing statements made by the presidential candidates, creating a linguistic snapshot of their political platforms. Computer scientists and psychologists at the University of Pennsylvania are developing algorithms to distill large amounts of text on a scale often referred to as “Big Data”. Here, we’ve used these tools to summarize what each candidate focused on during the debates.

Below are the word clouds showing the words and phrases that most distinguish the 2016 presidential candidates from each other, based on their comments during the first 9 Republican debates and 6 Democratic debates (reported on elsewhere). We’ve also added the 6 “topics” the candidates mentioned the most: clusters of related words that our algorithms identified as appearing together.
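To make the idea of “distinguishing” words concrete, here is a minimal sketch in Python. It is not the method behind the word clouds in this article; it assumes a simple stand-in, the smoothed log ratio of how often one speaker uses a word compared with all the other speakers combined. The function names, the `transcripts` dictionary, and the toy sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def distinguishing_words(transcripts, speaker, top_n=5, smoothing=1.0):
    """Rank words by how much more often `speaker` uses them than everyone else.

    The score is a smoothed log ratio of relative frequencies. This is an
    illustrative assumption, not the analysis used for the article.
    """
    own = Counter(tokenize(transcripts[speaker]))
    rest = Counter()
    for name, text in transcripts.items():
        if name != speaker:
            rest.update(tokenize(text))

    vocab = set(own) | set(rest)
    # Add-one style smoothing so unseen words do not produce a zero ratio.
    own_total = sum(own.values()) + smoothing * len(vocab)
    rest_total = sum(rest.values()) + smoothing * len(vocab)

    scores = {}
    for word in vocab:
        p_own = (own[word] + smoothing) / own_total
        p_rest = (rest[word] + smoothing) / rest_total
        scores[word] = math.log(p_own / p_rest)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy, made-up text standing in for real debate transcripts.
transcripts = {
    "A": "trade deal trade deal trade jobs",
    "B": "budget economy budget jobs",
    "C": "health care health care jobs",
}
print(distinguishing_words(transcripts, "A", top_n=2))
```

A word that every speaker uses, such as “jobs” in the toy data, scores low even if it is frequent, which is why a ranking like this can look quite different from a plain word count.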

Republicans

Trump’s words, phrases and topics center around business concerns like trade and deal-making, including references to his career (eminent domain). We can also see many of the ingredients of his critiques of other people and ideas (e.g., negative adjectives such as nasty, incompetent, stupid, horrible).

Cruz’s language focuses on foreign affairs and international threats (Khamenei), as well as on domestic questions of immigration and taxation. One topic reflects his comments about the role of the commander in chief.

Kasich’s language is more domestically focused, mentioning the budget, the economy, law enforcement, and refugees. We can also see frequent mention of his experience as governor of Ohio.

Democrats

Sanders’s words, phrases, and topics address the campaign finance system, Wall Street, U.S. foreign policy, the criminal justice system, and income equality for middle- and working-class Americans. He frequently uses the terms “African-American” and “Latino.”

Clinton’s language primarily addresses health care, with some mention of the conflict in Syria. She also mentions children and families, expresses agreement with Senator Sanders, and discusses party affiliations.

While the gestalt of these word clouds can be interpreted differently by different people, they have the virtue of having been generated by a systematic process, free from the biases of media pundits and commentators.

Reading between the lines

Beyond helping us sift through candidates’ language, these methods can also help us peek behind the psychological curtain and look for more subtle, specific patterns.

For example, we can compare the use of first person pronouns by candidates:

Pronoun use

Candidates clearly differ from one another in their use of first-person pronouns (e.g., I, me, my, mine), a habit that previous research has associated with higher self-focus. Trump and Clinton use more first-person pronouns than the other candidates.
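A pronoun comparison like this one boils down to a rate: the share of a speaker’s words that are first-person pronouns. The sketch below shows one way to compute it in Python. The pronoun list is just the four examples named above, and the function name and sample sentence are hypothetical; the analysis reported here may define and count pronouns differently.

```python
import re

# Only the four examples given in the text; a fuller list is an open choice.
FIRST_PERSON = {"i", "me", "my", "mine"}


def first_person_rate(text, pronouns=FIRST_PERSON):
    """Return the fraction of word tokens that are first-person pronouns."""
    # Treat curly and straight apostrophes alike so "I’m" stays one token.
    normalized = text.lower().replace("’", "'")
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)
    if not tokens:
        return 0.0
    # Count contractions such as "i'm" or "i've" under their base word.
    hits = sum(1 for token in tokens if token.split("'")[0] in pronouns)
    return hits / len(tokens)


# Made-up sentence, not a quote from any candidate.
sample = "I think my plan is good, and I'm proud of it."
print(round(first_person_rate(sample), 3))
```

Dividing by the total word count matters because candidates speak for different lengths of time; raw counts would mostly reflect who talked the most.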

Positive emotion

Clinton and Trump also used the most positive emotion words in the debates. These words (e.g., hope, happy, awesome) are often used in the context of optimism about the future.

On the other hand, here are negative emotions expressed by the candidates:

Negative emotion

For negative emotion words (e.g., hate, sad, pain, lost) it’s Cruz by a clear margin, and Trump to a lesser extent. In terms of expressing negative emotions, Kasich looks more like a Democrat than a Republican.
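The emotion comparisons can be sketched the same way, with a word list per category and a ranking of speakers by rate. In the Python sketch below, the two word lists contain only the example words quoted in this article, which is far smaller than a real emotion lexicon; the names `LEXICON`, `emotion_rates`, and `rank_speakers`, and the toy transcripts, are hypothetical.

```python
import re

# Tiny illustrative lists built from the examples quoted in the text.
LEXICON = {
    "positive": {"hope", "happy", "awesome"},
    "negative": {"hate", "sad", "pain", "lost"},
}


def emotion_rates(text, lexicon=LEXICON):
    """Return {category: share of word tokens that fall in that category}."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {
        category: sum(token in words for token in tokens) / total
        for category, words in lexicon.items()
    }


def rank_speakers(transcripts, category, lexicon=LEXICON):
    """Order speakers from the highest to the lowest rate for one category."""
    rates = {
        name: emotion_rates(text, lexicon)[category]
        for name, text in transcripts.items()
    }
    return sorted(rates, key=rates.get, reverse=True)


# Made-up text standing in for real debate transcripts.
transcripts = {
    "A": "I hope we are happy",
    "B": "we lost and it is sad and I hate it",
    "C": "the budget is balanced",
}
print(rank_speakers(transcripts, "negative"))
```

Word lists like these ignore context (“lost” can describe an election as easily as a feeling), so the resulting rates are best read as rough signals, not precise measurements.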

A Data-Driven Approach to Politics

Our approach is part of a larger trend of bringing data science and linguistic analysis into politics.

The Economist recently placed the tipping point of this trend in 2008, with Nate Silver’s entrance into politics from the world of baseball statistics. When Silver was challenged by Joe Scarborough, an advocate of old-school intuitions, he bet Scarborough $2,000 that Obama would win, based on his predictive algorithm. Since then, political science and political reporting have increasingly let the data take a lead role in telling the story (see recent New York Times articles for prime examples).

Language findings are usually open to a number of different interpretations, but they have the virtue of having been generated by a standardized process. They do not depend on what the audience happens to remember or on which tidbits of the debates the media has chosen to replay over and over. Linguistic analysis may not tell us all we want to know about the “essence” of each candidate, but it can help detect patterns, like pronoun use, that are too subtle to notice when we listen to debates.

The benefit of having algorithms summarize our political information is that they don’t have a media or political bias (and they don’t have a special interest lobby — at least not yet).

World Well-Being Project, University of Pennsylvania

Follow @WWBProject

David Bryce Yaden

Psychology PhD student at the University of Pennsylvania. Follow @ExistWell