Using Markov to Tweet like Trump — Part 1, the set up

Tweet generation using a second order Markov Chain and 1000+ Trump tweets

When Trump starts tweeting about covfefe, smocking guns and global waming, we’d all be forgiven for assuming his Twitter account is, in fact, a malfunctioning bot. But his distinctive, let’s call it, ‘style’ actually makes him a great source to use for generating text.

Having recently worked with Markov chains as part of a Java course, I was inspired to apply that learning and create a Markov chain based tweet generator in JavaScript. In Part 1 of this article, I run through the thought process behind building a text generator, what a Markov chain is, and why one is involved. In Part 2, I’ll cover how I implemented this in JavaScript, and the tweaks made along the way.

Word by word

So, to generate text based on an existing source of data (e.g. a history of tweets, a set of novels, etc.) where do you start? If you want the generated text to sound similar to the original author, an obvious starting point is to pick words directly from the source text. Would that necessarily make the generated text sound like the author? Not at first, as we’ll see shortly, but it’s a good starting point. If I gave you two short paragraphs, one filled with thee, doth and hast, and another with WALL, fake, and news, it wouldn’t take long to work out which paragraph is based on Shakespeare’s back catalog, and which on Trump’s.

Once upon a… ?

So if we made a first pass at generating text in this way, picking word after word randomly from our data, what would it look like? Let’s use a truncated version of Grimm’s Fairy Tales as the source text for our first attempt at generation:

The for I where the this away merrily eyes at the the on and their you herself closed he spread was the morning the pig sword to bedchamber and said Hansel soundly beneath

The words ‘merrily’, ‘bedchamber’ and ‘Hansel’ sound about right for a Grimm tale, but other than that it’s all nonsense (hopefully anyway… I daren’t ask what a pig sword might look like). The key problem here is that there’s no connection between the words. Each word is pseudo-randomly selected and concatenated onto the end of the existing string.
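This naive first pass is simple to sketch. Here’s a minimal version in JavaScript (the function name `naiveGenerate` is my own; this is an illustration, not the implementation from Part 2):

```javascript
// Naive first pass: every word is chosen uniformly at random from the
// source text, so there is no connection between adjacent words.
function naiveGenerate(sourceText, wordCount) {
  const words = sourceText.split(/\s+/).filter(w => w.length > 0);
  const output = [];
  for (let i = 0; i < wordCount; i++) {
    // Pseudo-random pick, concatenated onto the end of the output
    output.push(words[Math.floor(Math.random() * words.length)]);
  }
  return output.join(" ");
}
```

Feed it the Grimm text and a length, and it produces exactly the kind of word salad shown above: the right vocabulary, in no sensible order.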

Some structure needs to be introduced, but we can’t teach the program about verbs, nouns and grammar. Aside from the enormity of such a task, we also want it to use a similar ‘voice’ to our source text, so it doesn’t use just any vocabulary of words.

We need the adjoining words to cohere, so the next step should be to deduce which words are more likely to pair together than others. That should automatically result in a more rational sentence structure. There’s a straightforward way to do this, and that’s with conditional probabilities.

Let’s assume we start off the generated text with the word “Once” (with uppercase “O” — the casing is significant). We want to be more judicious about our next word than simply using random selection. It makes sense to pick a word that can be found directly following “Once” in the source text. To do this we first need to search through the text for all instances of “Once”, and find each word that follows it. In this case the resulting words are: ‘in’, ‘when’, ‘upon’, ‘upon’, ‘upon’ and ‘she’. From this refined selection we can then randomly pick a word and add it to our existing string. As ‘upon’ has more entries than any other word, it’s more likely to be selected. In this case the probability of choosing ‘upon’ is 1 in 2, or 50%. Pretty good odds. So we now have “Once upon”.
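The “Once” example above can be sketched directly. In the snippet below the follower list matches the article’s six words, but the helper names (`followersOf`, `pickNext`) and the toy source are my own:

```javascript
// Find every word that directly follows `word` in the source, then pick
// the next word at random from that list. Repeated entries ("upon"
// appears three times out of six) get a proportionally higher chance.
function followersOf(word, sourceWords) {
  const followers = [];
  for (let i = 0; i < sourceWords.length - 1; i++) {
    if (sourceWords[i] === word) followers.push(sourceWords[i + 1]);
  }
  return followers;
}

function pickNext(word, sourceWords) {
  const followers = followersOf(word, sourceWords);
  return followers[Math.floor(Math.random() * followers.length)];
}

// Toy source in which "Once" has the same followers as in the article:
const words = "Once in x Once when x Once upon x Once upon x Once upon x Once she".split(" ");
followersOf("Once", words); // → ["in", "when", "upon", "upon", "upon", "she"]
```

Keeping duplicates in the follower list is what makes the random pick frequency-weighted: no probabilities need to be computed explicitly.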

Unfortunately this fairy tale example stumbles here, as there’s a much longer list of words that follow “upon”, with ‘the’ having the biggest probability of being selected. For example, you’re more likely to end up with “Once upon the castle” than “Once upon a time”:

Once upon the castle gate was so he would be celebrated, and you say! I will think about him, and as spin gold for the palace

Despite not fitting our fairy tale expectations, the text is now sounding more coherent. It still doesn’t really make sense, but verbs and nouns are in suitable places, and we still have notable Grimm elements (castles, and spinning gold).

Introducing the Markov Chain

The model we’ve started to use here is called a Markov Chain. Our ‘chain’ is the string of words and the process used to create this chain is Markovian because each word (or ’state’) is selected randomly, and each new state is conditional on the current state only. For example, when the program picks the third word “the” in the text “Once upon the castle gate”, it doesn’t care that the word two states back was “Once”, it only uses the current state of “upon” to generate the next state. After it moves on to “the” as the current state, “upon” is forgotten. It’s a memoryless process.
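The memoryless process just described can be sketched as a complete first order generator (again, names of my own choosing, and a sketch rather than the Part 2 implementation):

```javascript
// Build a first order chain: map each word to the list of words that
// follow it anywhere in the source. Duplicates are kept deliberately,
// so a random pick from the list is automatically frequency-weighted.
function buildChain(sourceWords) {
  const chain = new Map();
  for (let i = 0; i < sourceWords.length - 1; i++) {
    if (!chain.has(sourceWords[i])) chain.set(sourceWords[i], []);
    chain.get(sourceWords[i]).push(sourceWords[i + 1]);
  }
  return chain;
}

// Walk the chain: each step depends only on the current state, and all
// earlier states are forgotten — a memoryless process.
function generate(chain, start, maxWords) {
  const output = [start];
  let current = start;
  while (output.length < maxWords && chain.has(current)) {
    const followers = chain.get(current);
    current = followers[Math.floor(Math.random() * followers.length)];
    output.push(current);
  }
  return output.join(" ");
}
```

Note that `generate` never looks at `output` beyond its length: the only input to each transition is `current`.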

These discrete steps from one state to another are called ‘transitions’, and conditional probabilities can be calculated for each possible outcome. The transition diagram below illustrates moving from ‘Once’ to the next state.

Transition diagram for the state “Once”, using Grimm’s Fairy Tales as source text. “upon” has the highest probability of being the next chosen word.
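The numbers behind such a diagram are easy to compute: count each distinct follower, then divide by the total number of transitions out of the state. A quick sketch (the function name is my own):

```javascript
// Conditional probability of each transition out of a state, given the
// raw list of observed followers.
function transitionProbabilities(followers) {
  const counts = new Map();
  for (const word of followers) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  const probabilities = new Map();
  for (const [word, count] of counts) {
    probabilities.set(word, count / followers.length);
  }
  return probabilities;
}

// Followers of "Once" in the Grimm excerpt:
const once = transitionProbabilities(["in", "when", "upon", "upon", "upon", "she"]);
once.get("upon"); // → 0.5
```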

Markov Typewriter and orders

Markov chains can be implemented with a wide variety of data, but we need only take one small step to explore the idea of a Markov Typewriter… and discover its limitations. Instead of words, what if we want to generate our text letter by letter? We can use the same process as before, but our ‘states’ are now letters instead of words.

Imagine a monkey and, as often comes with imaginary monkeys, a typewriter. The monkey represents the random process — jabbing away at the keyboard. The typewriter itself has the ability to update its keyboard layout using a weighted distribution. With each key press, it updates the keys to offer only the letters that appear after the just-pressed letter somewhere in English words. If you start with a vowel, the QWERTY layout probably won’t change much. Start with a Q, though? In the vast majority of English words, a Q is followed by a U:

Typewriter using a first order Markov chain — letter proportions are representative only

Generating words letter by letter, using the same Grimm source text, doesn’t work as well as the word chain, however:

The beanganendyangem, chitreaghe dono d tlouteald y s-danels hen wan er cado anoril

The frequency of vowels following consonants, and vice versa, signals that the letter pairing is working as expected, but selecting a new letter based only on the previous letter has clear limitations. So, what if we try using the previous two letters? For example, if you use “QU” to find your next letter, you’d have a much smaller range of letters to choose from than if you just looked for those typically following a “U”. Using Grimm again, here’s our result looking back two letters:

I has might we wenterce werted; all maid his dog, ned sits know

Now we start to recognise a few English words… What if we bump it up to four letters?

I had not been carried though the window a many many, many tears hand the kind the faithful prince and at the cook throw and done.

Not only are most of the words now recognisably English, but it’s also starting to pick up the author’s style.
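The two- and four-letter lookups above amount to widening the key used to index the source. Here’s a sketch with a configurable key length (`order` is the number of letters looked back on; function names are my own):

```javascript
// Build a letter chain whose keys are the previous `order` characters.
// With order 2, the key "qu" maps only to letters seen after "qu".
function buildLetterChain(text, order) {
  const chain = new Map();
  for (let i = 0; i + order < text.length; i++) {
    const key = text.slice(i, i + order);
    if (!chain.has(key)) chain.set(key, []);
    chain.get(key).push(text[i + order]);
  }
  return chain;
}

// Extend the seed one letter at a time, keying on the last `order`
// characters of the output so far; stop if the key was never seen.
function generateLetters(chain, order, seed, maxLength) {
  let output = seed;
  while (output.length < maxLength) {
    const followers = chain.get(output.slice(-order));
    if (!followers) break;
    output += followers[Math.floor(Math.random() * followers.length)];
  }
  return output;
}

const chain = buildLetterChain("queen quest", 2);
generateLetters(chain, 2, "qu", 5); // → "queen" or "quest"
```

With this tiny source, starting from “qu” can only ever complete to “queen” or “quest”: the two-letter key narrows each choice down to letters genuinely seen in that context.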

What we’re doing here is changing the order of the Markov chain. Our initial, memoryless Markov chain can also be called a first order chain. It only uses the current state to get the next state. Higher order chains look back at previous states. For example, a chain with order 2 (or second order chain) looks at the current state AND the previous state; a third order chain looks at the current state AND the two previous states. Let’s look at the “Once upon” possible transitions again, but using an order of 2:

Using order 2, the chain no longer treats “upon” alone as the state, with its huge range of possible next words. Instead it uses the current state, “upon”, together with the previous state, “Once”, to find a following word. It’s now much more likely to pick “a”, and to produce that typical fairy tale beginning.
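In code, moving from a first order to a second order word chain just means widening the key from one word to a pair of words (a sketch, with names of my own choosing):

```javascript
// Second order word chain: the key is the previous word plus the
// current word, so "Once upon" maps only to the words seen directly
// after that exact pair in the source.
function buildSecondOrderChain(words) {
  const chain = new Map();
  for (let i = 0; i < words.length - 2; i++) {
    const key = words[i] + " " + words[i + 1];
    if (!chain.has(key)) chain.set(key, []);
    chain.get(key).push(words[i + 2]);
  }
  return chain;
}

const pairChain = buildSecondOrderChain("Once upon a time Once upon a hill".split(" "));
pairChain.get("Once upon"); // → ["a", "a"]
pairChain.get("upon a");    // → ["time", "hill"]
```

Generation works exactly as before, except that each step keys on the last two words of the output instead of the last one.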

Using a second order rather than a first order chain, let’s return to generating with words rather than letters:

The day came that had been enchanted by a great shout, making all the wealth of the lake but this time the princess was not at home

The JavaScript build

Now that we have a basic idea of how Markov chains work and how the text generator will use them, we can start building this functionality. I’m separating that out into a Part 2 article, coming next week.

In the meantime, you can click here to see the Trump Tweet Generator in action.

Emma Williams

Software developer, and learning enthusiast