I trained a machine to distinguish between Trump and Clinton — here’s what it learned.

Let’s play a game of guess who. I’ll give you a sentence spoken by a presidential nominee and you tell me who spoke it.

We’re going to be the smart people.
We’re going to start winning.
I love you all.
I love the poorly educated.
We need more than a plan for the biggest banks.
I believe it is or I wouldn’t be standing here.
I believe we can do all these things because I’ve seen it happen.
The middle class needs more growth and more fairness.

Are those your final answers?


Answers: The first four were uttered by Trump and the second four were Clinton. Take a moment to jot down your accuracy.

I wouldn’t be surprised if most of your predictions were correct, especially if you’ve followed the elections. Humans have good intuition about who’d say what, given enough exposure to a person. With the rapid advance of Artificial Intelligence, Natural Language Processing, and Machine Learning, it’s possible for computers to do the same thing, perhaps better than humans can. Why do this?

When analyzing an election, most people tend to focus on funding, political ties, issues, and polls. Few study a candidate’s rhetoric, diction, and style, even though these can be instrumental in capturing a massive audience through debates, interviews, and rallies. Short, simple sentences can bulldoze right through an esoteric discussion on policy. Using words like ‘child’ and ‘community’ can stir pathos in voters with families. The way a candidate communicates is a foundational building block of any election and deserves to be acknowledged and studied as such.

Past attempts

A few articles have attempted to bring this to the media’s attention in the past. The New York Times did a piece on Trump’s diction titled “95,000 Words, Many of Them Ominous, From Donald Trump’s Tongue”. They highlight divisive patterns such as “we vs. them” and combative, personal words used against his opponents, such as “horrible” and “stupid”, citing the number of times these words appear. One obvious problem with the analysis is the story the article is trying to tell. There’s bias in painting Trump as a villain and then finding examples to support that claim, which becomes clear once you poke holes in the analysis. As a counterfactual, imagine declaring he’s a hippie and finding supporting evidence in a 95,000-word corpus. Another example: how do we know that the words “horrible” and “stupid” are statistically significant in his text, and how do we know it isn’t the “we” in “we vs. them” that contributes more powerfully to his political movement? A second problem is the lack of joint evaluation between Trump and someone else. His use of any word or device can only really be evaluated in comparison to another candidate. Would it surprise you that Clinton uses “I’ve” more frequently than Trump? Because it’s true [in the small dataset I’ve procured].

Business Insider analyzed Trump as a salesman using a 220-word corpus and drew conclusions from that. Yikes — 220 words. They made some interesting points about syllable count and packing a punch at the end of a sentence. However true this might be, deeply analyzing one answer from a Jimmy Kimmel late-night show is not sufficient analysis. On top of that, it suffers from the same problems as the New York Times article.

The key problems in the previous attempts included bias, lack of joint evaluation, and lack of statistical methods.

Let’s do it right

By using a machine learning approach, we can correct for some of the flaws of past work. There are established practices in this field that aim to eliminate bias: balancing data sets, randomly shuffling the data, and using regularization. By training a model that distinguishes between candidates, we solve the joint-evaluation problem. I decided to train a model that predicts whether a sentence was uttered by Trump or by Clinton. I chose these two because they were the leading candidates at the time, but nothing prevents this approach from being extended to more candidates. One assumption here that doesn’t necessarily hold is that each sentence is independent of the others; context from previous sentences will likely inform the current one. However, this assumption drastically simplifies training a classifier, which has the following definition:

Given a sentence: predict whether the speaker is Trump or Clinton

Being able to distinguish between the two speakers can then lead to a statistical understanding of what makes each of them unique. Assuming we obtain reasonably good accuracy (the baseline is 50% — basically a coin flip), we can look inside the model at its weights to see what it learned. From there, we can start making observations and possibly drawing conclusions.

I gathered speech transcripts through a bunch of Googling and copied them into text files to build my corpus. From that point, it was just a matter of using Python to clean up the text, extract features, and train logistic regression models. Everything’s available here on my GitHub.
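A minimal sketch of that pipeline: sentence-level logistic regression over unigram and bigram counts. The tiny inline corpus (a few quotes from the guessing game above) is only there to make the example self-contained; the real model was trained on full transcripts, with a held-out test set.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

trump = [
    "We're going to be the smart people.",
    "We're going to start winning.",
    "I love you all.",
    "I love the poorly educated.",
]
clinton = [
    "We need more than a plan for the biggest banks.",
    "I believe it is or I wouldn't be standing here.",
    "I believe we can do all these things because I've seen it happen.",
    "The middle class needs more growth and more fairness.",
]
texts = trump + clinton
labels = ["trump"] * len(trump) + ["clinton"] * len(clinton)

# Unigram + bigram counts; with real data you'd shuffle and hold out a test set.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)

# Predict the speaker of an unseen sentence.
pred = model.predict(vectorizer.transform(["We're going to win so much."]))[0]
print(pred)
```

With more data, the same few lines scale up unchanged; only the corpus and the train/test split change.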

Successful models and their features

After splitting the sentences into a training and a testing set, I evaluated each trained model by gauging its accuracy at predicting the correct speaker on the test set. If the accuracy was above 80%, I’d look into the model weights to see the most influential features. Here are a couple of the models, trained on variations of N-gram features and tokenization strategies.

Most Indicative Words for Hillary

As your President, I’ll do whatever it takes to keep Americans safe.
As we have since our founding, Americans made a new beginning.
And today’s families face new and unique pressures.
We’ve got to work for that!

Clinton’s most characteristic words are a balanced mixture of her political plans (work, new, growth, economy) and the constituency of the people (Americans, families, child). In a way, they’re what you’d expect from a presidential candidate.

Most Indicative Words for Trump

And whether we go — honestly whether we go to Dallas or whether we go anywhere you say  you go to L.A.  you go anywhere you say.
We’re going to bring in so much money and so much everything.
We’re beating the governor.
I love the poorly educated.

This is interesting. The words characteristic of Trump are notably not topical; rather, they’re pronouns, verbs, and syntax. His style of speaking really comes through, in signature fashion. His use of “We’re” suggests a heavy reliance on pathos; it’s possible this makes his audience feel more involved.

One of Trump’s most indicative words was the dash ‘ —’. When I searched the corpus for examples of it, nearly every sentence was a series of fragmented clauses. Trump tends to clarify, change direction, and almost speak in a stream-of-consciousness style. Here are a couple more examples of it.

Everybody can be c — This plan is just a basic disaster.
Over here the other day we had a 9,000 — we had a 10,000 — we have people.
defeating ISIS and stopping the Islamic terrorists — and you have to do that;

One hypothesis is that this fragmented, stream-of-consciousness style of speaking casually lowers the guard of potential voters. In a way, it may seem more personal compared to the sanitized, well-constructed sentences of other politicians. In one of my experiments where words were split into the base word and contraction, I found that Trump heavily used contractions such as ‘s, n’t, and ‘re compared to Hillary. This further adds to the casualness of Trump’s speech style.
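The contraction-splitting tokenization from that experiment can be sketched as below: peel suffixes like ’s, n’t, and ’re off into their own tokens, so the model can count them as standalone features. (NLTK’s `word_tokenize` does this properly; this regex version is a self-contained toy.)

```python
import re

# Common English contraction suffixes, matched at the end of a word.
CONTRACTIONS = re.compile(r"(n't|'s|'re|'ve|'ll|'d|'m)$", re.IGNORECASE)

def tokenize(sentence):
    """Split words, separating contraction suffixes into their own tokens."""
    tokens = []
    for word in re.findall(r"[\w']+", sentence):
        m = CONTRACTIONS.search(word)
        if m and m.start() > 0:
            tokens.append(word[:m.start()])  # base word, e.g. "We"
            tokens.append(m.group())         # contraction, e.g. "'re"
        else:
            tokens.append(word)
    return tokens

print(tokenize("We're going to win and they don't like it"))
```

Once contractions are their own tokens, the vectorizer can weight “’re” or “n’t” directly, which is how their overuse by Trump surfaced.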

Let’s look at what phrases define the nominee’s discourse, according to the linear model.

Most Indicative Bigrams for Hillary

  • I believe
  • You know,
  • middle class
  • We need
  • break barrier.
  • family issue

Most Indicative Bigrams for Trump

  • We’re going
  • We love
  • I love
  • It’s going
  • We won
  • great again.

The difference between the two candidates’ styles of speaking is even more apparent here. Hillary’s phrases are diversified across rhetorical appeals. You can see glimpses of ethos in phrases like “You know” and “I believe”, logos in “middle class” and “family issue”, and pathos in “We need” and “break barrier”.

This is jarringly different from Trump’s most characteristic bigrams, which seem to be mostly focused on pathos. Phrases like “We’re going” and “We love” offer listeners a hand to be part of the movement, to be involved, to contribute. How should the characteristic phrase “I love” be categorized? Here are a few examples of his use:

I love the people of Iowa
I love you folks very much.
I love Mexico, I love China, I love many of these countries that rip us off…

He seems to use it in two ways: 1) to address the people of the audience warmly and favorably, and 2) to show compassion towards the countries he describes as problems for America. He proposes building a wall to prevent Mexicans from illegally entering, but then strives to neutralize the extreme by saying he loves Mexicans. He proposes steep trade sanctions and tariffs on China, our number one import partner, and reminds his audience that he loves China.

Model misfires

The model’s errors can lay bare its shortcomings. They can also help balance our own biases. Here are a few of the mispredictions on the test set.

Trump: No matter where we go we fill up the arenas.
Hillary: And I wouldn’t be here if it weren’t for the steady support of so many people.
Trump: Disgusting reporters horrible.
Trump: fixing our country’s infrastructure, our bridges, our schools, our highways, our airports.
Trump: And I will say this, my father is an incredibly hard worker and he’ll be working for each and every one of you.
Trump: We have to make our country rich again so we do that, so we can save Social Security.
Hillary: I’ve said I want to be the small-business president, and I mean it.

Features that we might want to add based on this analysis:

  • Replacing all adjectives with their most common synonym (horrible and terrible become the same word)
  • Word length of the clause
  • Intermixing part of speech tags with words e.g. “our country ADJECTIVE again”
  • Counting repetition in phrase structure

Quid pro quo

We learned details of Trump’s and Clinton’s rhetoric, all backed by a statistical model. In fact, the model might be better at predicting the speaker than we are, which is a good measure of its reliability.

This analysis tackles some of the flaws in judging the rhetoric of candidates, but still falls short in a number of ways. We need more data from a variety of different speeches and interviews. The data gathering step that I conducted only went so far. With more data, we can be more confident in the accuracy of the model and in the takeaways from the results.

In exchange for any resources and data you give me, I’ll expand this analysis and improve it. I’d love your suggestions on different features so I can train models and report more exciting results to you. If you liked this, please follow me! Interested in more of this? Check out my post on analyzing Hip Hop Lyrics and organizing your thoughts for software engineering. Comments and suggestions highly welcome.

If you liked this, click the 💚 so other people will see this here on Medium. Thanks!


Extras

Can we go even deeper?

Our machine has analyzed the candidates’ words and phrases literally, but hasn’t probed any of the deeper, subtler structure of their discourse. Does Trump tend to end his sentences with a punch? How well does syllable count predict the speaker? Does Hillary really speak in iambic pentameter?

I won’t answer any of these questions [perhaps you can], but we’ll run through the results of a model trained on deeper, more latent features.

Part of Speech Structure

I trained a Logistic Regression model that used the Part of Speech tags as input, instead of the words. Here’s an example of what that would look like.

Original Sentence:

Let’s have a big win in Nevada.

Part of Speech Equivalent:

NN(Noun) POS(Possessive) VBP(Verb) DT(Determiner) JJ(Adjective) NN(Noun) IN(Preposition) NNP(Proper Noun)

By feeding the model an abstract, latent representation of the sentence, we can learn that phrases like “We’re going” and “We’re beating” are similar because they both equate to “PRP(Personal Pronoun) VBP(Verb) VBG(Gerund)”. This is drawing on shared representations between samples which can help the model learn all sorts of useful things.
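The transformation itself is trivial once the tags are in hand: drop the words and keep only the tag sequence, then feed that string to the same n-gram vectorizer. Here the tagged output for the example sentence is hard-coded; in practice it would come from a tagger such as `nltk.pos_tag`.

```python
# Tagged tokens for "Let's have a big win in Nevada.", as shown above.
tagged = [("Let", "NN"), ("'s", "POS"), ("have", "VBP"), ("a", "DT"),
          ("big", "JJ"), ("win", "NN"), ("in", "IN"), ("Nevada", "NNP")]

def to_pos_sentence(tagged_tokens):
    """Drop the words, keeping only the tag sequence as a space-joined string."""
    return " ".join(tag for _, tag in tagged_tokens)

pos_sentence = to_pos_sentence(tagged)
print(pos_sentence)
```

Running the vectorizer over these tag strings instead of the raw sentences is the only change needed; the rest of the pipeline stays identical.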

Here are some of Trump’s and Clinton’s most predictive POS (part of speech) phrases, learned by the machine, along with one or two examples of each.

Clinton’s most characteristic POS phrases

NN(Noun) NN(Noun)

My first job out of law school was for the Children’s Defense Fund.

NN(Noun) TO(to)

Using additional fees and royalties from fossil fuel extraction to protect the environment.

If necessary, I will support a constitutional amendment to undo the Supreme Court’s decision in Citizens United.

CC(Coordinating Conjunction) JJ(Adjective)

And how fortunate we are, indeed, to live in the most diverse, dynamic and beautiful state in the entire union.

Trump’s most characteristic POS Phrases

PRP(Personal Pronoun) VBP(Verb Present)

They tell me that.

I love you all.

VBP(Verb Present) VBG(Verb Present Participle)

We’re bringing our jobs back, folks.

We’re fighting ISIS, but ISIS wants to overturn the government.

VBG(Verb Present Participle) TO(To)

And that’s going to happen.

My two cents

While the results of this model were on par with those of models trained on literal words, analyzing features based on latent structure becomes very complex and hand-wavy. Instead, the important features leave me with more questions. How can chunking help our model? What are all the present participles Trump uses?

Basically, I start viewing my model as a hungry puppy, and start thinking of ways to feed it and grow it.