You can now analyze the nominees’ discourse using Machine Learning

Rohan Kshirsagar
2 min read · Nov 8, 2016


Who’s saying what?

In a previous post, I wrote about training a classifier to predict which nominee said a given utterance.

After training it on 52,000 words of data drawn from this corpus, I have made the classifier available for your own analysis. You can see how likely each candidate is to have said any given sentence.

This comes with a few caveats:

  • Since it’s trained on past speeches, it won’t know anything about the latest scandal or event.
  • The features are mostly bag-of-words, which means that reordering the words in a sentence will not change the result.
  • As a corollary, adding negation to a sentence may not change the result. The difference between “We’re going to build a wall” and “We’re not going to build a wall” won’t be caught by this model.
  • Each candidate talks about many of the other’s policies, so Hillary may mention the wall about as often as Trump does. Typing in the word “wall” nudges the probability towards Trump, but not by much.
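
To make the bag-of-words caveats concrete, here is a minimal sketch in plain Python. It is not the classifier's actual feature code: the `bag_of_words` helper and its tokenization rule are assumptions made up for illustration. It shows that reordering a sentence leaves the word counts untouched, and that negation only adds a single extra token.

```python
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Hypothetical featurizer: lowercase, split into word tokens, count them.

    Word order is thrown away; only the counts survive.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return Counter(tokens)


original = bag_of_words("We're going to build a wall")
shuffled = bag_of_words("A wall we're going to build")
negated = bag_of_words("We're not going to build a wall")

# Same words, different order: the feature counts are identical.
print(original == shuffled)

# Negation shows up only as one extra count for the token "not".
print(negated - original)
```

Whether that single extra "not" moves the prediction depends entirely on how much weight the model happened to learn for it, which is why the negated sentence can score almost the same as the original.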

Some fun observations:

  • The more times “very” appears in a sentence, the more likely the model thinks it’s Trump.
  • The model is trained on speech transcripts, not the candidates’ websites. However, when I test the policies listed on each candidate’s website against the model, it always predicts Hillary, for both candidates’ policies. Trump doesn’t seem to speak about [most of] his policies during his speeches.
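
One plausible explanation for the “very” effect, assuming a linear model over word counts (this post does not say which classifier is used), is that every repetition of a word adds the same amount to the log-odds. The sketch below uses invented weights, not values from the real model, purely to show the shape of that behaviour.

```python
import math

# Hypothetical per-word log-odds weights (positive leans Trump).
# These numbers are invented for illustration, not taken from the real model.
WEIGHTS = {"very": 0.8, "wall": 0.2}


def trump_probability(sentence: str) -> float:
    """Sum the weight of every token, then squash the total with a sigmoid."""
    tokens = sentence.lower().split()
    log_odds = sum(WEIGHTS.get(token, 0.0) for token in tokens)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Each additional "very" pushes the probability further towards Trump.
for n in range(4):
    sentence = " ".join(["very"] * n + ["good"])
    print(n, round(trump_probability(sentence), 3))
```

Under this assumption a word with no learned weight, like "good" here, leaves the prediction at an even split, while each extra "very" raises it by a diminishing but always positive amount.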

Cheers! The election is almost over…

If you liked this, click the 💚 so other people will see this here on Medium.
