What does Entropy Measure? An Intuitive Explanation
Entropy may seem abstract, but it has an intuitive side: the probability of seeing certain patterns in data. Here’s how it works.
In data science, there are many concepts linked to the notion of entropy. The most basic one is Shannon’s information entropy, defined for any distribution P(x) through the formula:

H(P) = − Σ_{x ∈ C} P(x) log P(x)

where the sum runs over all the possible categories x in C.
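As a quick sanity check, here is a minimal sketch of the definition in Python with NumPy; the helper name shannon_entropy and the base-2 (bits) convention are illustrative choices, not part of the definition itself:

```python
import numpy as np

def shannon_entropy(p, base=2):
    """H(P) = -sum over x in C of P(x) * log P(x), in units set by `base` (2 -> bits)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                         # by convention, 0 * log 0 contributes 0
    return float(-np.sum(p * np.log(p)) / np.log(base))

# A fair coin is maximally uncertain (1 bit); a heavily biased coin carries much less entropy.
print(shannon_entropy([0.5, 0.5]))   # 1.0
print(shannon_entropy([0.9, 0.1]))   # ~0.47
```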
There are other related concepts with similar-looking formulae (a short code sketch of how they fit together follows this list):
- Kullback–Leibler divergence: for comparing two distributions
- Mutual information: for capturing general relationships between two variables
- Cross entropy: for training classification models
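To see the family resemblance concretely, here is a minimal sketch in Python/NumPy of how these quantities relate to the entropy defined above (the helper names and the base-2 logarithms are again illustrative choices): cross entropy decomposes as H(P, Q) = H(P) + D_KL(P || Q), and mutual information is the KL divergence between a joint distribution P(x, y) and the product of its marginals P(x)P(y).

```python
import numpy as np

def entropy(p):
    """H(P) = -sum_x P(x) log2 P(x), with 0 * log 0 treated as 0."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(np.where(p > 0, p * np.log2(p), 0.0)))

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) log2 Q(x); assumes Q(x) > 0 wherever P(x) > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(-np.sum(np.where(p > 0, p * np.log2(q), 0.0)))

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) log2(P(x) / Q(x)) = H(P, Q) - H(P)."""
    return cross_entropy(p, q) - entropy(p)

def mutual_information(joint):
    """I(X; Y) = D_KL( P(x, y) || P(x) P(y) ), from a 2-D array of joint probabilities."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal P(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal P(y)
    return kl_divergence(joint.ravel(), (px * py).ravel())

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
print(entropy(p), cross_entropy(p, q), kl_divergence(p, q))

joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])              # two clearly dependent binary variables
print(mutual_information(joint))            # > 0, since X and Y are not independent
```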
Despite the ubiquity of entropy-like formulae, the intuition behind them is rarely discussed: Why is the logarithm involved? Why are we multiplying P(x) and log P(x)? Many articles mention terms like “information” and “expected surprise”, but the intuitions behind those terms are missing.
It turns out that, just like probabilities, entropy can be understood through a counting exercise, and that it can be linked to a sort of log-likelihood for distributions. Furthermore, this counting corresponds to the literal number of bytes in a computer. These…