Intro to Markov Chains
Part one.
‘Markov processes are used for general, of Markov chain monte carlo which, could knowing the theory of. Example it is based solely, exchange rates of years earlier. if one can make predictions and related fields a process.’
— “What?”
Oh sorry, I put the academic definition of ‘Markov Chains’ through a Markov generator I made. That is just one output; since it’s based on some randomization, it can produce many more variations.
— “So, I guess Markov chains just chop up sentences into hard-to-read jumbles of words?”
Yes, but mainly no. We chop with intent!
In this article I will highly and unabashedly simplify the theories of Andrey Markov and the other statisticians who work with stochastic sequences. I hope this will better align your thoughts on the subject of Markov chains, or, if this is all new, give you some thoughts to gather. We will focus our time in this article on simply finding patterns in a text and randomly generating new sentences from those found connections.
— “And what do you mean by patterns and connections?”
IT’S IN YOUR QUESTION!
let pattern = {
  And: 'what',
  what: 'do',
  do: 'you',
  you: 'mean',
  mean: 'by',
  by: 'patterns',
  patterns: 'and',
  and: 'connections',
  connections: null,
};
This is the pattern of all the words with their following words. We would be done now except for all the other things we need to look into. So, ya know, let’s keep going.
I notice that our ‘pattern’ object has some issues. For example, “and” shows up twice. We should tidy this up, because we only care about unique words and the words that follow them. No word should show up more than once on the left side, that is, in our object’s keys. You could say each unique word should have a branching structure that connects the follower-words we have found. Think of these branches of ‘follower-words’ as options for the unique word to choose from. When creating a sentence, our function can simply look up what should come next, treating multiple options as an opportunity to pick randomly. The beauty of a Markov chain is that each word in the sentence sets up the next to be used, and so on until you tell it to stop. Once a word is placed at the beginning of our sentence, the series of events can be kicked off. But first we need a good set of patterns picked out.
As a small side note, we should also normalize uppercase letters to lowercase, since a capitalized word like ‘And’ won’t always start our generated sentence. We can re-capitalize when we construct a sentence based off our list of connections later.
let betterPattern = {
  and: ['what', 'connections'],
  what: ['do'],
  do: ['you'],
  you: ['mean'],
  mean: ['by'],
  by: ['patterns'],
  patterns: ['and'],
  connections: null,
};
Now we can pick a random word from the keys and start building a sentence based on a random ‘follower-word’. Let’s do it.
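In code, that pick could look something like this little sketch (using the betterPattern object from above; the variable names here are just my own):

const keys = Object.keys(betterPattern);

// Grab a random unique word to start the sentence.
let word = keys[Math.floor(Math.random() * keys.length)];
const sentence = [word];

// Follow random follower-words until we hit a dead end (null),
// with a cap so a looping pattern can't run forever.
while (betterPattern[word] && sentence.length < 20) {
  const followers = betterPattern[word];
  word = followers[Math.floor(Math.random() * followers.length)];
  sentence.push(word);
}

console.log(sentence.join(' '));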
‘and connections’
No wait, let’s try again.
‘by patterns and connections’
— “Hrmmm, cool I guess. Is this as interesting as it gets?”
No, but this is pretty much the bottom of the barrel for cool Markov chains, so buck up.
What can we do to make this better?
The problem you may be seeing is that the source text, also known as “the corpus”, is too small: it doesn’t give enough unique words a chance to collect lots of follower-words. Good news, we have identified the problem, and it’s an easy fix. Make a bigger corpus! Except I found it annoying to type out even this simple sentence’s connections, so there is no way I am doing that by hand with thousands of words. Now that we know the rules, we should hand the work over to a computer. We have simplified the problem enough to know what goes into building these Markov objects. We can be lazy now. Let’s code.
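Here is a minimal sketch of what that could look like (buildPattern is just what I’m calling it, and this is one of many ways to write it):

function buildPattern(text) {
  // Lowercase everything and split on whitespace.
  const words = text.trim().toLowerCase().split(/\s+/);
  const pattern = {};

  // Walk the words in order, pairing each word with the word after it.
  for (let i = 0; i < words.length - 1; i++) {
    // Strip punctuation from keys so 'nation,' and 'nation' count as one
    // unique word; follower-words are stored as written.
    const key = words[i].replace(/[.,!?;:"()]/g, '');
    if (!pattern[key]) pattern[key] = [];
    pattern[key].push(words[i + 1]);
  }
  return pattern;
}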
— “We did it! I think. But, what does this look like with a slightly better input corpus?”
Easy now, and let me explain while also answering your question…
This is an iterative, imperative approach to creating our Markov pattern object. The output for these two sentences you have just read is:
const patternFromAboveSentences = {
  this: ['is'],
  is: ['an'],
  an: ['iterative,'],
  iterative: ['imperative'],
  imperative: ['approach'],
  approach: ['to'],
  to: ['creating'],
  creating: ['our'],
  our: ['markov'],
  markov: ['pattern'],
  pattern: ['object'],
  object: ['the'],
  the: ['output'],
  output: ['for'],
  for: ['these'],
  these: ['two'],
  two: ['sentences'],
  sentences: ['you'],
  you: ['have'],
  have: ['just'],
  just: ['read'],
  read: ['is'],
};
Or…
const patternFromGettysburgAddress = {
  fourscore: ['and'],
  and: ['seven', 'dedicated', 'so', 'proper'],
  seven: ['years'],
  years: ['ago'],
  ago: ['our'],
  our: ['fathers'],
  fathers: ['brought'],
  brought: ['forth,'],
  forth: ['on'],
  on: ['this', 'a'],
  this: ['continent,'],
  continent: ['a'],
  a: ['new', 'great', 'great', 'portion', 'final'],
  new: ['nation,'],
  nation: ['conceived', 'or', 'so', 'might'],
  conceived: ['in', 'and'],
  in: ['liberty,', 'a'],
  liberty: ['and'],
  dedicated: ['to', 'can'],
  to: ['the', 'dedicate'],
  the: ['proposition'],
  proposition: ['that'],
  that: ['all', 'nation,', 'war.', 'field,', 'that', 'nation', 'we'],
  all: ['men'],
  men: ['are'],
  are: ['created', 'engaged', 'met'],
  created: ['equal.'],
  equal: ['now'],
  now: ['we'],
  we: ['are', 'are', 'have', 'should'],
  engaged: ['in'],
  great: ['civil', 'battle-field'],
  civil: ['war,'],
  war: ['testing', 'we'],
  testing: ['whether'],
  whether: ['that'],
  or: ['any'],
  any: ['nation'],
  so: ['conceived', 'dedicated,'],
  can: ['long'],
  long: ['endure.'],
  endure: ['we'],
  met: ['on'],
  'battle-field': ['of'],
  of: ['that', 'that'],
  have: ['come'],
  come: ['to'],
  dedicate: ['a'],
  portion: ['of'],
  field: ['as'],
  as: ['a'],
  final: ['resting-place'],
  'resting-place': ['for'],
  for: ['those'],
  those: ['who'],
  who: ['here'],
  here: ['gave'],
  gave: ['their'],
  their: ['lives,'],
  lives: ['that'],
  might: ['live.'],
  live: ['it'],
  it: ['is'],
  is: ['altogether'],
  altogether: ['fitting'],
  fitting: ['and'],
  proper: ['that'],
  should: ['do'],
  do: ['this'],
};
Now you can see how we are scraping through a large piece of text to condense it into its “word following” pattern. We are preserving the natural way someone has composed words, which means the computer will reuse them in a more meaningful way. For instance, the partial sentence “Peanut butter and” lets you obviously come up with the next word. There is a pattern we recognize. If we didn’t follow the patterns of a corpus, of words used naturally, we would get a completely random set of words with almost zero speech-flow. That is what makes text-based Markov chains so interesting: we can computationally analyze how words have been used in speech to synthesize brand new texts. Markov-generated sentences can sound intelligently designed, even though each one is ultimately just stochastically assembled using code and a source text.
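Putting it all together, a full generator can stay tiny. Here is a sketch (the names generate and maxWords are my own choices):

function generate(pattern, maxWords = 20) {
  const keys = Object.keys(pattern);
  let word = keys[Math.floor(Math.random() * keys.length)];
  const sentence = [word];

  while (sentence.length < maxWords) {
    const followers = pattern[word];
    if (!followers || followers.length === 0) break; // dead end, stop here
    // Each randomly picked follower-word sets up the next lookup.
    const next = followers[Math.floor(Math.random() * followers.length)];
    sentence.push(next);
    // Keys were stripped of punctuation, so clean up before the next lookup.
    word = next.replace(/[.,!?;:"()]/g, '');
  }
  return sentence.join(' ');
}

console.log(generate(patternFromGettysburgAddress));

Every run takes a different walk through the pattern, so you will get a different sentence each time.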
— “But is this all you can do?”
Oh you’re still here, I thought I lost you. To answer your question, no. Use your imagination with this stuff.
Thinking around this idea offers opportunities to do all sorts of things. A common one is to mix source texts to come up with fun new generated output. For example, a food article and Prince lyrics could be combined into one Markov generator, and who knows what you’d get. The possibilities are many with ‘Markovian’ principles.
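Mixing can be as simple as feeding both texts into the same builder. As a sketch, with foodArticle and purpleRainLyrics standing in for whatever strings you load:

// Words the two corpora share become crossover points where a sentence
// can wander from one text's style into the other's.
const mixedPattern = buildPattern(foodArticle + ' ' + purpleRainLyrics);
console.log(generate(mixedPattern, 30));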
Now go forth and make these objects speak for you using randomization and code!
“Drawer was beautiful historical artistically subtle , their entire life and much , for me to buy food with , was pretty scary at birth , miles across town every drawer , the best decisions i could stop , later found out to college that , and much of the time offered” — Food article written with Prince’s “Purple Rain”
For an example of a Markov Poem Generator I made, featured in this article, visit zacharyad.github.io. For the code, see my (in the works, sorry) Jim Carry based lorem ipsum CLI generator NPM package at github.com/zacharyad/jimcarry-markov-ipsum.