language model and perplexity

Oct 20, 2023

In the previous articles 1 2, we learned how to calculate the probability of a given sentence using an n-gram language model. Today, let’s discuss how to calculate the perplexity of a corpus using a language model. To follow along, it is recommended that you go through the exercises in the previous articles first.

Perplexity is the standard metric for measuring the quality of a language model. Qualitatively, perplexity measures the average branching factor per token, that is, how many equally likely candidates the language model is effectively choosing among each time it predicts a token. Let’s take a look at the two extreme ends of the metric.

perplexity of 1: the language model is 100% certain when predicting the next token. This occurs if the language model is severely overfit to the evaluation corpus. In practice, this should never happen.
perplexity of V, where V is the size of the vocabulary: the language model assumes a uniform distribution, i.e., a completely random guess. If this is what we get from the language model, we might as well roll a die to predict the next token.

So, the perplexity of any useful language model should lie somewhere between 1 and V. In general, a better language model shows lower perplexity.
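The two extremes can be checked numerically with a minimal sketch. It assumes the usual definition of perplexity as the inverse geometric mean of the per-token probabilities; the function name `perplexity_from_token_probs` and the vocabulary size `V = 6` are made up for illustration.

```python
import math


def perplexity_from_token_probs(probs):
    """Perplexity as the inverse geometric mean of per-token probabilities."""
    n_tokens = len(probs)
    # Sum of log2 probabilities; dividing by n_tokens gives the average.
    log_sum = sum(math.log2(p) for p in probs)
    return 2 ** (-log_sum / n_tokens)


V = 6  # hypothetical vocabulary size, like a six-sided die

# The model assigns probability 1 to every token it predicts.
certain = perplexity_from_token_probs([1.0] * 10)

# The model assigns probability 1/V to every token it predicts.
uniform = perplexity_from_token_probs([1.0 / V] * 10)

print(certain, uniform)
```

The first value comes out as 1 and the second as V (up to floating-point rounding), matching the two ends described above.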

Quantitatively, perplexity is given by

$$\text{perplexity} = 2^{-\frac{1}{N}\sum_{i=1}^{n}\log_2 q(s_i)}$$

where q(s) is the probability of sentence s, n is the number of sentences, and N is the number of tokens in the corpus. Let’s calculate the perplexity of our 2-gram model from before using two evaluation sentences. Create an eval.txt file with the following content:

that is not the question
that is that
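Once the 2-gram model has assigned a probability q(s) to each evaluation sentence, the perplexity follows directly from the definition. The sketch below is only an illustration: `corpus_perplexity` is a hypothetical helper, the probabilities in `sentence_probs` are placeholders rather than the output of the 2-gram model, and N is taken to be the plain whitespace token count (whether to also count an end-of-sentence token depends on the convention used by the model).

```python
import math


def corpus_perplexity(sentence_probs, num_tokens):
    """Compute 2 ** (-(1/N) * sum(log2 q(s))) over all sentences."""
    # A single zero-probability sentence makes the perplexity infinite.
    if any(q == 0.0 for q in sentence_probs):
        return math.inf
    log_sum = sum(math.log2(q) for q in sentence_probs)
    return 2 ** (-log_sum / num_tokens)


sentences = ["that is not the question", "that is that"]

# N: total number of whitespace-separated tokens (an assumed convention).
num_tokens = sum(len(s.split()) for s in sentences)

# Placeholder values for q(s); replace with the model's actual output.
sentence_probs = [0.01, 0.05]

print(num_tokens, corpus_perplexity(sentence_probs, num_tokens))
```

Working in log space avoids multiplying many small probabilities together, which would underflow on a larger corpus.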


Written by TechHara