We’ve all been there. You’ve got a collection of tweets from an unknown author, and you’re wondering: “Are these tweets written by Donald Trump?”
Our forebears would have just shrugged their shoulders and given up, surrendering the authorship of these tweets as an ineffable mystery. Luckily for you and me, though, we exist in the same day and age as Keras, a deep learning API so simple that even one such as yours truly can start to make use of it.
In this post I will investigate: Can a dense neural network determine if Donald Trump authored a tweet?
An important caveat should be applied to the phrase “dense neural network”. My experience with machine learning, deep learning, and neural networks consists mostly of skimming the Keras docs while working on this project and googling error messages. While writing this post, I had to google the phrase “dense neural network” to see if it applied to what I’m doing, and maybe it does. Point is: This is a description of my investigation, not a machine learning tutorial.
This technical schematic explains my understanding of machine learning and neural networks together.
Basically, arrays of numbers go in, and arrays of numbers come out. When the right arrays of numbers come out, the magic box teaches itself to keep doing more of whatever it just did, and when the wrong numbers come out it learns to do less. The powerful part about this is that you get to decide what the arrays of numbers mean and whether they are right or wrong.
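To make the "do more of what worked, less of what didn't" idea concrete, here is a minimal sketch of the smallest magic box I could think of: a single logistic neuron in plain Python. This is not Keras and not the network used in this investigation; the function names, the learning rate, and the toy data are all made up for illustration.

```python
import math

def predict(weights, bias, xs):
    """An array of numbers goes in, one number between 0 and 1 comes out."""
    z = sum(w * x for w, x in zip(weights, xs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.5):
    """Nudge the weights toward whatever shrinks the error on each example."""
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for xs, target in examples:
            # Positive error: the output was too low. Negative: too high.
            error = target - predict(weights, bias, xs)
            weights = [w + lr * error * x for w, x in zip(weights, xs)]
            bias += lr * error
    return weights, bias

# Made-up data: the "right" answer is 1 when the first number is the big one.
examples = [([1.0, 0.0], 1), ([0.9, 0.2], 1), ([0.0, 1.0], 0), ([0.1, 0.8], 0)]
weights, bias = train(examples)
```

After training, `predict(weights, bias, [1.0, 0.0])` should land above 0.5 and `predict(weights, bias, [0.0, 1.0])` below it. A real dense network stacks many of these units, but the feedback loop has the same shape.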
Let’s convert my problem, “Are these tweets written by the President?”, into a form that can be satisfied by the magic box. That is, I want my tweets to be arrays of numbers and my answer (Yes or No) to be an array of numbers. The answer is easy: I want a 1-dimensional array of 1 element, where 0 means “No” and 1 means “Yes”. The tweets are a bit harder.
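The answer half of that conversion can be sketched in a few lines. The helper names here are hypothetical, and the 0.5 cutoff in `decode_label` is my own assumption: a network typically emits something like 0.83 rather than a clean 0 or 1, so some threshold is needed to turn it back into Yes or No.

```python
def encode_label(is_trump):
    """Turn a Yes/No answer into a 1-dimensional array of 1 element."""
    return [1] if is_trump else [0]

def decode_label(output, threshold=0.5):
    """Turn the box's 1-element output back into an answer.

    The threshold is an assumed cutoff, not something the post specifies.
    """
    return "Yes" if output[0] >= threshold else "No"
```

For example, `encode_label(True)` gives `[1]`, and `decode_label([0.83])` gives `"Yes"`.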
Stylometry is the process of identifying the author of a text based on characteristics of the text. You can think of it like fingerprinting, but instead of looking at identifying features in friction ridges and whorls we find identifying features in characteristics of text.
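As a rough illustration of what "characteristics of text" could look like as numbers, here is a small sketch that turns a string into a handful of hand-picked measurements. These particular features (length, average word length, share of capital letters, exclamation marks) are assumptions chosen for the example, not necessarily what this investigation ends up feeding the network.

```python
def text_features(text):
    """Summarize a piece of text as a short array of numbers."""
    words = text.split()
    letters = [c for c in text if c.isalpha()]
    return [
        # Total length in characters.
        len(text),
        # Average word length (punctuation attached to a word counts).
        sum(len(w) for w in words) / len(words) if words else 0.0,
        # Fraction of letters that are uppercase.
        sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        # Number of exclamation marks.
        text.count("!"),
    ]
```

Calling `text_features("SAD! Very sad.")` returns `[14, 4.0, 0.4, 1]`. Two authors with different habits would tend to produce different arrays, which is the fingerprinting idea in miniature.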
What I want to do here is a kind of stylometry leveraging a dense neural network. I’m using the phrase “a kind of stylometry”…