Markov Chain Sentences

Letting Computers Speak For Themselves

What is a Markov chain?

If we’re going to talk about Markov chains, it’s probably helpful to know what they are. Named for mathematician Andrey Markov, they are a collection of random variables indexed in some way (usually by time), that change from state to state in a way that is dependent solely on the current state, and not past events.

That’s a lot of abstract concepts to absorb. It’s probably easiest to think about with some concrete examples, the first of which is borrowed from our old friend, Wikipedia.

Imagine a game where you begin with $10 and flip a coin, winning $1 on heads and losing $1 on tails, until your money is gone. After any number of flips we can look at your current pool of money (say, $8) and predict the next state of your funds (either $7 or $9) with a 50% chance of being correct. No amount of knowledge of the prior flips or the starting $10 affects our odds of guessing correctly, making this a Markov chain.
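The coin-flip game can be sketched in a few lines of Ruby. The helper name here is mine, not from any library; the point is that the distribution over next states is computed from the current balance alone.

```ruby
# Transition distribution for the coin-flip game: from any balance,
# the next state is one dollar up or down with equal probability.
# Ruin ($0) is absorbing -- the game ends there.
def next_states(balance)
  return { 0 => 1.0 } if balance.zero?
  { balance - 1 => 0.5, balance + 1 => 0.5 }
end

next_states(8)  # => { 7 => 0.5, 9 => 0.5 }
```

Notice the function never needs a history argument; that is the Markov property in miniature.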

Alternatively, let’s say you’re pulling socks from your dresser drawer one at a time, since no one has the energy to find each sock a mate after doing laundry. You set aside each sock you pull out, hoping to find a matching pair (you regrettably bought a few different colors). After several minutes of sock-pulls, your ability to predict the next sock is influenced greatly by the prior pulls, making this a non-Markov chain. Say you have one green pair, two blue pairs, and nine red pairs, and you look at the ground and see a green sock, two blue socks and no red socks. You can say with some confidence you’re going to pull a red sock next.
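As a quick sanity check on the sock math (a sketch; the counts mirror the example above):

```ruby
# One green pair, two blue pairs, and nine red pairs in the drawer.
drawer = ["green"] * 2 + ["blue"] * 4 + ["red"] * 18

# Pulls so far: one green sock and two blue socks on the ground.
pulled = ["green", "blue", "blue"]

# Remove the pulled socks from the drawer to see what's left.
remaining = drawer.dup
pulled.each { |sock| remaining.delete_at(remaining.index(sock)) }

# The odds of the next pull depend on the whole history of pulls,
# not just some "current state" -- which is why this isn't Markov.
p_red = remaining.count("red").to_f / remaining.size
# 18 of the 21 remaining socks are red, so p_red is about 0.86
```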

Who cares?

So why is this post titled Markov chain sentences? Sentences aren’t coin flips or socks. As I said before, these chains are indexed lists of variables, and in a sentence’s case, the variables are words, indexed by their place in the sentence.

The probability of a word’s appearance — like the 50/50 odds of the coin flip — comes from something called n-grams. An n-gram is a grouping of n words as they appear together in a source text. In this article we’ll be talking about bigrams, pairings of two words. The sentence “stupid is as stupid does.” has the bigrams (stupid is), (is as), (as stupid), and (stupid does.).
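Extracting bigrams is nearly a one-liner in Ruby thanks to each_cons (this helper is my own sketch, not code from the repository):

```ruby
# Break text into overlapping pairs of consecutive words.
def bigrams(text)
  text.split.each_cons(2).to_a
end

bigrams("stupid is as stupid does.")
# => [["stupid", "is"], ["is", "as"], ["as", "stupid"], ["stupid", "does."]]
```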

Bigrams are used to predict the next word in a sentence based on a selected word. If we start with “stupid” we can see there are two options, (stupid is) and (stupid does.). There is a 50% chance of either since we don’t know where we are in the sentence. If we pick “is”, then we check any bigrams that start with “is”, and we know with 100% certainty the next word is “as”, because there is only one bigram. We end up at (as stupid), and make the same 50/50 guess as before. If we went with (stupid does.) that would end our sentence, leaving us with “stupid does.” as the whole phrase.
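That lookup can be sketched like so (candidates is a hypothetical helper, not a method from the post’s code):

```ruby
# Every word that immediately follows `word` in the source text.
def candidates(words, word)
  words.each_cons(2).select { |a, _| a == word }.map { |_, b| b }
end

words = "stupid is as stupid does.".split
candidates(words, "stupid")  # => ["is", "does."]
candidates(words, "is")      # => ["as"]
```

Sampling one element at random from the candidates list gives exactly the 50/50 choice described above when we start from “stupid”.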

You see this behavior a lot with predictive text. If you’ve ever just clicked the suggested word on your phone’s keyboard you can string together some crazy sentences. Here’s an example from my phone:

I’m gonna try to watch boss baby girl who was running the ad in my feed because of the soft bed and I’m excited to see me and I was wondering if you are getting loud notifications of copyright law that the Thursday night football song has to be a horrible experience.

Not a terrible sentence! The Thursday night football song is a horrible experience. Please ignore the part where I’ve said the phrase “Boss Baby” enough for my phone to suggest it for me. The phone is drawing from the pool of everything you’ve ever typed and pairing words together in a way to save you a little time.

So what would something like this look like, behind the scenes? I’ve written a simple version in Ruby to emulate this behavior.

The Code:

See the whole thing HERE

What is this doing, exactly?

If you know Ruby and want to test this out yourself, click the link below the code to see the GitHub repository. It contains additional paragraph and essay methods that run the sentence method multiple times.

Let’s take a quick look at each of the methods.

initialize — We initialize our chain with an array of every word from the input text, in order. Additionally, we use #find_enders to grab all the words that end sentences, so we can cheat a little and avoid ending our sentences on words like “the”.

make_sentence — This method calls upon #find_next a few times, and then takes all the words returned from that, capitalizes the first, and appends one of the words from our enders array with a period.

find_next — Most of the heavy lifting is done here. We pass in a word, then run through our array of all source words, and everywhere we find a match we collect the word immediately following it. This gives us a pool of words that should make sense following the original. If nothing is found (either no word was passed in, or the document ended right after the input word), it returns a random word from the source file.
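Putting those three methods together, a minimal version of the chain might look like this. The method names come straight from the walkthrough above; the class name, default sentence length, and method bodies are my assumptions, not the original code:

```ruby
# A sketch of the chain described above, not the original source.
class MarkovChain
  def initialize(text)
    @words = text.split        # every word from the input, in order
    @enders = find_enders      # sentence-ending words from the source
  end

  # Chain together `length` words via find_next, capitalize the
  # first, and cap the sentence with a word from the enders pool.
  def make_sentence(length = 8)
    sentence = [@words.sample]
    (length - 1).times { sentence << find_next(sentence.last) }
    sentence[0] = sentence[0].capitalize
    (sentence + [@enders.sample]).join(" ")
  end

  private

  # Words that end sentences in the source, so we can avoid
  # finishing on a word like "the".
  def find_enders
    @words.select { |word| word.end_with?(".") }
  end

  # All words that immediately follow `word` in the source; fall
  # back to a random word when there are no matches.
  def find_next(word)
    followers = @words.each_cons(2)
                      .select { |a, _| a == word }
                      .map { |_, b| b }
    followers.empty? ? @words.sample : followers.sample
  end
end
```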

But wait, shouldn’t you–

Yes, there are a lot of opportunities to improve this! We should be making sure our words come from the same sentence; as-is, the code ignores periods when creating bigrams, leaving some strange transitions. We are also faking our bigrams a bit: rather than building a comprehensive dictionary of bigrams up front to draw from, we rescan the source text on every lookup. And the sentence-ending method is a bit ham-fisted.

This could also extend to larger groupings beyond bigrams — our sentences all come out quite nonsensical, and additional context in the form of trigrams could help words find more natural homes.
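A trigram version of the lookup would condition on the previous two words instead of one. Again, this is a sketch of the idea, not code from the repository:

```ruby
# Like the bigram lookup, but the next word is drawn from words
# that follow the exact pair (prev2, prev1) in the source text.
def find_next_trigram(words, prev2, prev1)
  followers = words.each_cons(3)
                   .select { |a, b, _| a == prev2 && b == prev1 }
                   .map { |_, _, c| c }
  followers.empty? ? words.sample : followers.sample
end

words = "the cat sat on the mat because the cat ran".split
find_next_trigram(words, "the", "cat")  # either "sat" or "ran"
```

The extra word of context prunes out followers that only made sense after a single shared word, which is why trigram output tends to read more naturally.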

There are a hundred other things we could change. What matters most, though, is we have a baseline to work with. Let’s see it in action.

Making Sentences

Let’s set up a new chain and have it generate a sentence with a snippet along these lines (the class and file names here stand in for the ones in the repository):

chain = MarkovChain.new(File.read("source.txt"))
puts chain.make_sentence

For a source file, let’s input the first set of paragraphs from this article and see what we get.

“Since no red sock next sock two blue pairs and not past events thats a green laundry.”

You can see why we’d want to add some updates. A quick scan and it might make sense, but it’s rather incoherent. Luckily we avoided any mentions of the Boss Baby.

A True Expert

In the process of writing this post I’ve begun to wonder: with all this research, why am I the one writing the post if I’m still learning? There are people out there who actually know what they’re talking about. So I thought, why not let them all write something together? With the help of my program, some of the brightest researchers of Markov chains have combined their writings. The problem with academic writing is that it’s a bit dry. To inject some levity and flowier prose, I mixed in the text from a ClickHole article about George R. R. Martin’s history with horses while writing Game of Thrones.

Click here for enlightenment.

Sources for generated article:

ClickHole: When I Started Writing ‘Game Of Thrones,’ I Didn’t Know What Horses Looked Like

Research Papers: 1, 2, 3, 4, 5