You Think 80% Means 80%? Why Prediction Probabilities Need a Second Look
Understand the gap between predicted probabilities and real-world outcomes
How reliable are the probabilities predicted by a machine learning model? What does a predicted probability of 80% mean? Is it the same as an 80% chance of the event occurring? In this beginner-friendly post, you’ll learn the basics of prediction probabilities and calibration, and how to interpret these numbers in a practical context. I will walk through a demo that shows how you can evaluate and improve these probabilities for better decision-making.
What do prediction probabilities represent?
Instead of calling model.predict(data), which gives you a 0 or 1 prediction for a binary classification problem, you might have used model.predict_proba(data). This gives you probabilities instead of zeroes and ones. That is often useful, because a probability carries more information than a hard label. But what do these probabilities actually mean?
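To make the difference between the two calls concrete, here is a minimal sketch using scikit-learn. The synthetic dataset and the choice of LogisticRegression are assumptions made purely for illustration; any classifier that exposes predict_proba would behave the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (an assumption, for illustration only)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

labels = model.predict(X_test)        # hard 0/1 predictions, shape (n_samples,)
probas = model.predict_proba(X_test)  # one column per class, shape (n_samples, 2)

# Column 1 holds the predicted probability of the positive class (label 1)
positive_proba = probas[:, 1]

print(labels[:5])
print(np.round(positive_proba[:5], 3))
```

Note that predict_proba returns one column per class, so for a binary problem the positive-class probability is the second column, and each row sums to 1.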
A predicted probability of 0.8 means that the model is 80% confident that an instance belongs to the positive class. Let’s repeat that: the model is 80% confident that an instance belongs to the positive class. That does not automatically mean there is an 80% real-world likelihood of the event occurring…
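One way to see whether model confidence matches reality is to group predictions into bins and compare the average predicted probability in each bin with the fraction of instances that actually turned out positive. The sketch below does this with scikit-learn's calibration_curve. The synthetic data and the GaussianNB model are assumptions chosen only for illustration, not the setup used in this post's demo.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data with many redundant features (an illustrative assumption)
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=2, n_redundant=10,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

model = GaussianNB().fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]  # positive-class probabilities

# For each non-empty bin: observed positive rate and mean predicted probability
frac_pos, mean_pred = calibration_curve(y_test, y_prob, n_bins=10)

for predicted, observed in zip(mean_pred, frac_pos):
    print(f"mean predicted {predicted:.2f} -> observed positive rate {observed:.2f}")
```

If the model's confidence matched real-world frequencies, the two numbers on each printed line would be close to each other. Large gaps mean that a predicted 0.8 should not be read as an 80% chance.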