The Best Words

Imitating Donald Trump’s Style Using Recurrent Neural Networks

Leon Zhou
Towards Data Science



“I know words. I have the best words.”

Uttered in the heat of a campaign rally in South Carolina on December 30, 2015, this statement was just another addition to a growing collection of “Trumpisms” from our now-President, Donald J. Trump. Statements like these made Donald both more beloved by his supporters, as their relatable President, and a target of ridicule for seemingly everyone else.

Regardless of one’s personal views of the man, it cannot be denied that Donald has a way of speaking that is, well, so uniquely him — his smatterings of superlatives and apparent disregard for the constraints of traditional sentence structure are just a few of the things that make his speech instantly recognizable from that of his predecessors or peers.

It was this unique style that interested me, and I set out to try to capture it using machine learning — to generate text that looked and sounded like something Donald Trump might say.

Data Collection and Processing

To learn President Trump’s style, I first had to gather sufficient examples of it. I focused my efforts on two primary sources.

Twitter

One of many examples of unconventional sentence structure.

The obvious first place to look for words by Donald Trump was his Twitter feed. The current president is unique in his use of the platform as a direct and unfiltered connection to the American people. Furthermore, as a figure of interest, his words have naturally been collected and organized for posterity, saving me the hassle of using the ever-changing and restrictive Twitter API. All in all, there were a little under 31,000 Tweets available for my use.

Presidential Remarks and Speeches

In addition to his online persona, however, I also wanted to gain a glimpse into his more formal role as President. For this, I turned to the White House Briefing Statements Archive. With the help of some Python tools, I was able to quickly amass a table of about 420 transcripts of speeches and other remarks by the President. These transcripts covered a variety of events, such as meetings with foreign dignitaries, round tables with Congressional members, and awards presentations.
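The Python tooling involved was essentially a scraper and a parser. Below is a rough sketch of that collection step, assuming requests and BeautifulSoup; the CSS selector is a placeholder, since the archive’s actual markup will differ.

```python
import requests
from bs4 import BeautifulSoup

def fetch_transcript(url):
    """Download one briefing page and return its visible text."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Placeholder selector -- the real whitehouse.gov markup will differ.
    body = soup.find("div", class_="entry-content")
    return body.get_text(separator="\n") if body else ""
```

Looping something like this over the archive’s index pages yields the kind of transcript table described above.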

Unlike with the Tweets, where every word was written or dictated by Trump himself, these transcripts involved other politicians and inquisitive reporters. Separating Donald’s words from those of others seemed to be a daunting task.

Regular expressions are magic. Trust me.

Enter regular expressions — a boring name for a powerful and decidedly not-boring tool.

Regular expressions allow you to specify a pattern to search for; this pattern can contain any number of very specific constraints, wildcards, or other restrictions to return exactly what you want, and no more.

With some trial and error, I was able to build a fairly complex regular expression that returned only the words the President spoke, discarding everyone else’s remarks and any annotations.
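The full pattern is too gnarly to reproduce here, but the core idea can be sketched with a much simpler expression. The sketch below assumes the transcripts label the speaker with a tag like “THE PRESIDENT:”, which is an assumption about the transcript format rather than a guarantee.

```python
import re

# Capture everything after a "THE PRESIDENT:" tag, up to the next speaker tag
# (another all-caps name, a reporter's "Q", or the end of the transcript).
PRESIDENT_PATTERN = re.compile(
    r"THE PRESIDENT:\s*(.*?)(?=\n\s*[A-Z][A-Z .'-]+:|\n\s*Q\b|\Z)",
    re.DOTALL,
)

def president_only(transcript):
    """Return just the passages attributed to the President."""
    return [chunk.strip() for chunk in PRESIDENT_PATTERN.findall(transcript)]
```

The pattern used for this project had to be considerably hairier to cope with audience reactions, parenthetical annotations, and inconsistent formatting.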

To Clean, or Not to Clean?

Typically, one of the first steps in working with text is to normalize it. The extent and complexity of this normalization varies according to one’s needs, ranging from simply removing punctuation or capital letters, to reducing all variants of a word to a base root. An example of this workflow can be seen here.

For me, however, the specific idiosyncrasies and patterns that would be lost in normalization were exactly what I needed to preserve. So, in hopes of making my generated text just that much more believable and authentic, I elected to bypass most of the standard normalization workflow.
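For context, here is roughly what the skipped clean-up would have looked like; this is a generic sketch of a standard workflow (the stemming step assumes NLTK), not code from the project.

```python
import re
from nltk.stem import PorterStemmer

def normalize(text):
    """The standard clean-up this project deliberately avoids."""
    text = text.lower()                      # throw away capitalization
    text = re.sub(r"[^\w\s]", " ", text)     # strip punctuation
    tokens = text.split()                    # naive whitespace tokenization
    stemmer = PorterStemmer()
    return [stemmer.stem(token) for token in tokens]  # reduce words to roots
```

Each of those steps throws away something that makes the style recognizable, such as the all-caps emphasis and the exclamation points, which is exactly why I left them in place.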

Text Generation

Markov Chains

Before diving into a deep learning model, I was curious to explore another frequently used text generation method, the Markov chain. Markov chains have been the go-to for joke text generation for a long time — a quick search will reveal ones for Star Trek, past presidents, the Simpsons, and many others.

The quick-and-dirty summary of a Markov chain is that it only cares about the current word when determining what should come next. The algorithm looks at every single time a specific word appears, and every word that comes immediately after it. The next word is then selected randomly, with a probability proportional to its frequency. Let me illustrate with a quick example:

Simplified Markov chain example, in which the only 3 follow-ups to “taxes” are “bigly,” “soon,” and end of sentence.

Donald Trump says the word “taxes.” If, in real life, 70% of the time after he says “taxes” he follows up with the word “bigly,” the Markov chain will choose the next word to be “bigly” 70% of the time. But sometimes, he doesn’t say “bigly.” Sometimes he ends the sentence, or moves on to a different word. The chain will most likely choose “bigly,” but there’s a chance it’ll go for any of the other available options, thus introducing some variety in our generated text.

And repeat ad nauseam, or until the end of the sentence.
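In code, a bare-bones, word-level version of this idea fits in a dozen lines. This is a simplified sketch rather than the exact implementation behind the results below.

```python
import random
from collections import defaultdict

def build_chain(words):
    """Map each word to every word that ever followed it in the corpus."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, seed, max_words=30):
    """Random-walk the chain, starting from a seed word."""
    word, output = seed, [seed]
    while word in chain and len(output) < max_words:
        word = random.choice(chain[word])
        output.append(word)
    return " ".join(output)
```

Because each follow-up word is appended once per occurrence, sampling with random.choice naturally reproduces the 70/30-style probabilities described above.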

This is great for quick and dirty applications, but it’s easy to see where it can go wrong. As the Markov chain only ever cares about the current word, it can easily be sidetracked. A sentence that started off talking about the domestic economy could just as easily end talking about The Apprentice.

With my limited text data set, most of my Markov chain outputs were nonsensical. But, occasionally there were some flashes of brilliance and hilarity:

Tweet-trained Markov chain given the seed “FBI”

Recurrent Neural Networks

For passably-real text, however, I needed something more sophisticated. Recurrent Neural Networks (RNNs) have established themselves as the architecture of choice for many text or sequence-based applications. The detailed inner workings of RNNs are outside the scope of this post, but a strong (relatively) beginner-friendly introduction may be found here.

The distinguishing feature of these neural units is that they have an internal “memory” of sorts. Word choice and grammar depend heavily on surrounding context, so this “memory” is extremely useful in creating a coherent thought by keeping track of tense, subjects and objects, and so on.
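To make that concrete, here is a minimal character-level setup sketched in Keras; the framework choice and layer sizes are assumptions for illustration, not a record of the exact network used.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

SEQ_LEN = 40      # characters of context the model sees at once
NUM_CHARS = 90    # size of the character vocabulary (assumed)

# The recurrent layer provides the "memory"; the dense layer turns its
# state into a probability distribution over the next character.
model = Sequential([
    LSTM(128, input_shape=(SEQ_LEN, NUM_CHARS)),
    Dense(NUM_CHARS, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

Training amounts to sliding a 40-character window across the corpus and asking the model to predict the character that comes next.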

The downside of these types of networks is that they are extraordinarily computationally expensive — on my piddly laptop, a single pass of my entire text through the model would take over an hour, and since I’d need to do that about 200 times, this was no good.

This is where cloud computing comes in. A number of established tech companies offer cloud services, the largest being Amazon, Google, and Microsoft. On a heavy-GPU computing instance, that one-hour-plus-per-cycle time became ninety seconds, an over 40x reduction in time!

Evaluation

Can you tell whether the following statement is real or not?

California finally deserves a great Government to Make America Great Again! #Trump2016

This was text generated from Trump’s endorsement of the Republican gubernatorial candidate, but it might pass as something that Trump tweeted in the run-up to the 2016 general election.

The more complex neural networks I implemented, with hidden fully-connected layers before and after the recurrent layer, were capable of generating internally-consistent text given any seed of 40 characters or less.
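Generation from a trained network of this kind proceeds one character at a time: encode the seed, sample the next character from the predicted distribution, slide the window forward, and repeat. Here is a sketch, assuming a Keras-style model like the one above and char_to_idx / idx_to_char vocabulary mappings built during training.

```python
import numpy as np

def sample_text(model, seed, char_to_idx, idx_to_char, length=140):
    """Generate `length` characters from a seed of up to 40 characters."""
    seq_len, num_chars = 40, len(char_to_idx)
    window, output = seed[-seq_len:], seed
    for _ in range(length):
        # One-hot encode the current window of context (left-padded if short).
        x = np.zeros((1, seq_len, num_chars))
        for i, ch in enumerate(window.rjust(seq_len)):
            if ch in char_to_idx:
                x[0, i, char_to_idx[ch]] = 1.0
        probs = model.predict(x, verbose=0)[0]
        probs = probs / probs.sum()          # guard against rounding drift
        next_char = idx_to_char[int(np.random.choice(num_chars, p=probs))]
        output += next_char
        window = output[-seq_len:]
    return output
```

The samples below came from models along these lines.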

I want them all to get together and I want people that can look at the farms.

China has agreed to buy massive amounts of the world — and stop what a massive American deal.

Less complex networks stumbled a little on consistency, but still captured the tonal feel of President Trump’s speech:

Obama. We’ll have a lot of people that do we — okay? I’ll tell you they were a little bit of it.

Closing Thoughts

While not quite producing text at a level capable of fooling you or me consistently, this attempt opened my eyes to the power of RNNs. In short order, these networks learned spelling, some aspects of grammar, and, in some instances, how to use hashtags and hyperlinks — imagine what a better-designed network, with more text to learn from and more time to learn, might produce.

If you’re interested in looking at the code behind these models, you can find the repository here. And, don’t hesitate to reach out with any questions or feedback you may have!

