A Practical Intro to Language Models with Markov Chains

Ashutosh Srivastava
4 min read · Jul 12, 2020


A Markov chain is a sequence of states. Being a sequence means there is always a transition: a moment where the process moves from one state to another. The idea is to generate a sequence of states based on the current state and the probability of each outcome that can follow it.

Markov chains are one of the most important stochastic processes, that is, processes in which some value changes randomly over time. They are so called because they satisfy the Markov property, which says that

“the next state of the process depends only on the state it is in right now”

i.e., the information contained in the current state of the process is all that is needed to determine the future states.

It doesn’t have a “memory” of how it was before. It is helpful to think of a Markov chain as evolving through discrete steps in time, although the “step” doesn’t need to have anything to do with time.
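Formally, for a chain of states X_0, X_1, X_2, …, the Markov property reads:

P(X_{n+1} = x | X_n, X_{n-1}, …, X_0) = P(X_{n+1} = x | X_n)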

A 2 State Markov Model

Consider the above model with states 0 and 1. The transition probabilities are P(1|0) = p, P(0|0) = 1 − p, P(0|1) = q, and P(1|1) = 1 − q, so the probabilities of leaving and staying in each state sum to 1.
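To make this concrete, here is a minimal sketch (my own, not part of the original article) that simulates this two-state chain in plain Python, assuming p = P(1|0) and q = P(0|1):

import random

def simulate_two_state(p, q, steps, start=0):
    # p = P(1|0): probability of jumping from state 0 to state 1
    # q = P(0|1): probability of jumping from state 1 to state 0
    state = start
    path = [state]
    for _ in range(steps):
        if state == 0:
            state = 1 if random.random() < p else 0
        else:
            state = 0 if random.random() < q else 1
        path.append(state)
    return path

print(simulate_two_state(p=0.3, q=0.6, steps=10))

Each step looks only at the current state, which is exactly the Markov property in action.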

The Model

Now what we are going to do is think of text as a sequence of states and use a character-based language model. Such a model predicts one character at a time: given the current state, we ask the machine to predict the next character.

myText = "the they them .... the them ... the they .. then"

Consider the string myText above. We want the machine to learn, for a chosen window size, how often each next character occurs. The window size is the number of characters grouped and treated as a whole when looking up which character comes next.

Here, we take the window size k = 3, which gives us the following frequency table (an underscore denotes a space):

X      y    Frequency
the    _    3
he_    t    3
e_t    h    3
_th    e    8
the    y    2
the    n    1
...
Once we have this tabular data, the model can predict the most likely output whenever a certain sequence is encountered, based on the probability of each occurrence in the string.

For example, when the window ‘the’ is encountered, the next character is a space with probability 3/8, ‘y’ with 2/8, ‘m’ with 2/8 (one of the elided rows above), and ‘n’ with 1/8.

On the basis of these probabilities, the next character is selected and our new string is generated.
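As a small sketch of how that selection might look in Python (the variable names here are my own, not from the original), we can normalize the counts into probabilities and sample with random.choices:

import random

# Counts for the window "the", taken from the table above
# (including the 'm' row hidden behind the "..." entries)
freqs = {" ": 3, "y": 2, "m": 2, "n": 1}

total = sum(freqs.values())
probs = {ch: count / total for ch, count in freqs.items()}
print(probs)   # {' ': 0.375, 'y': 0.25, 'm': 0.25, 'n': 0.125}

# Pick the next character in proportion to its observed frequency
next_char = random.choices(list(freqs.keys()), weights=list(freqs.values()))[0]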

Markov Chain Algorithm

We are going to generate lyrics from a dataset of Eminem’s popular songs. I found the data on Kaggle; it consists of lyrics from most of his popular songs.

We first create a frequency table out of it. One way to build the frequency table in Python is a dictionary whose keys are the window strings and whose values are nested dictionaries mapping each next character to its frequency.

def generateTable(data, k):
    # Map every k-character window in the data to a dictionary of
    # {next_character: count}.
    T = {}

    for i in range(len(data) - k):
        X = data[i:i+k]   # the current window (the "state")
        y = data[i+k]     # the character that follows it

        if T.get(X) is None:
            T[X] = {}
            T[X][y] = 1
        else:
            if T[X].get(y) is None:
                T[X][y] = 1
            else:
                T[X][y] += 1
    return T

We then build the dictionary from our data with a window size of 6. One can choose any window size, but I chose 6 because the results sounded much like the artist’s original work.

T = generateTable(data.lower(), k=6)
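To get a feel for what T contains, we can peek at one entry (this snippet is just an illustration of the structure, not part of the original article):

# For any 6-character window seen in the lyrics, T maps it to the
# counts of every character that followed that window.
some_window = next(iter(T))
print(repr(some_window), "->", T[some_window])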

An initial seed sequence is required for the model to start making predictions. This is the only user input; from there, the model generates results on its own.

But wait! I found that the output was the same in every iteration, which made it boring, and I wanted to explore other possibilities as well. So, instead of always picking the character with the highest probability from the dictionary, I let the machine choose the next character at random according to its probability, and every iteration gave a new result, which was interesting to explore.

Full Code
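The original code embed isn’t reproduced here, but the full pipeline looks roughly like the following sketch. The function generate_text and the filename eminem_lyrics.txt are my own placeholders, and the “choose at random” step is implemented here as sampling weighted by the observed counts:

import random

def generate_text(T, seed, k, length):
    # seed: the user-supplied starting string (should be at least k characters)
    out = seed
    for _ in range(length):
        window = out[-k:]          # the last k characters form the current state
        freqs = T.get(window)
        if freqs is None:          # window never seen in training data: stop early
            break
        chars = list(freqs.keys())
        counts = list(freqs.values())
        out += random.choices(chars, weights=counts)[0]
    return out

data = open("eminem_lyrics.txt").read()   # hypothetical filename for the Kaggle data
T = generateTable(data.lower(), k=6)
print(generate_text(T, seed="lose yourself ", k=6, length=500))  # any seed works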

And this will give random lyrics for raps Eminem could perform (lol). I tried it quite a few times with different data, such as the Harry Potter books and speeches of Narendra Modi and Barack Obama, and the results were amazing.

I’ll be doing an advanced neural-network-based language model as well, which promises a general, flexible, and powerful approach to language modeling.
