
Perplexity Intuition (and its derivation)

Never be perplexed again by perplexity.

Aerin Kim
Towards Data Science
4 min read · Oct 11, 2018


You might have seen something like this in an NLP class:

A slide from Dr. Luke Zettlemoyer’s NLP class

Or

A slide of CS 124 at Stanford (Dr. Dan Jurafsky)

In class, we don’t really spend time deriving perplexity. Maybe it’s a basic concept you already know? This post is for those who don’t.

In general, perplexity is a measurement of how well a probability model predicts a sample. In the context of Natural Language Processing, perplexity is one way to evaluate language models.
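To make that concrete, here is a minimal Python sketch (the per-token probabilities are made-up numbers, not output from a real model): a language model’s perplexity on a sample is two raised to the average negative log₂-probability the model assigned to each observed token.

```python
import math

def sample_perplexity(token_probs):
    """Perplexity of a model on a sample, given the probability the
    model assigned to each observed token: 2 ** (-(1/N) * sum(log2 p))."""
    n = len(token_probs)
    avg_neg_log2 = -sum(math.log2(p) for p in token_probs) / n
    return 2 ** avg_neg_log2

# A model that assigns probability 0.25 to every token is, on average,
# as uncertain as a uniform choice among 4 tokens:
print(sample_perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
```

A lower perplexity means the model assigned higher probability to the sample, i.e., it was less “surprised” by what it saw.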

But why is perplexity in NLP defined the way it is?

If you look up the perplexity of a discrete probability distribution on Wikipedia, you will find:

$$PP(p) = 2^{H(p)} = 2^{-\sum_x p(x)\,\log_2 p(x)}$$

where H(p) is the entropy of the distribution p(x) and x is a random variable ranging over all possible events.
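Under that definition, perplexity is just entropy exponentiated. A minimal Python sketch with a toy distribution (a fair six-sided die, chosen here only for illustration):

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits of a discrete distribution."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def perplexity(p):
    """Perplexity of a discrete distribution: 2 ** H(p)."""
    return 2 ** entropy(p)

# A fair 6-sided die: entropy = log2(6) ≈ 2.585 bits, perplexity ≈ 6.
# The distribution is as unpredictable as a uniform choice among 6 events.
die = [1 / 6] * 6
print(entropy(die), perplexity(die))
```

Note that the perplexity of a uniform distribution over k events is k, which is why perplexity reads as an “effective number of equally likely choices.”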

In the previous post, we derived H(p) from scratch and intuitively showed why entropy is the average number of bits needed to encode the information contained in a random variable.
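As a quick sanity check on that “average number of bits” reading, here is a minimal sketch with a toy distribution: the optimal code length for an event is −log₂ p(x), and entropy is the probability-weighted average of those lengths.

```python
import math

# Optimal code lengths are -log2(p); entropy is their weighted average.
p = {"A": 0.5, "B": 0.25, "C": 0.25}
code_len = {sym: -math.log2(prob) for sym, prob in p.items()}  # A: 1 bit, B: 2, C: 2
avg_bits = sum(prob * code_len[sym] for sym, prob in p.items())
print(avg_bits)  # 1.5, which is exactly H(p)
```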
