Confusion Matrices: Not So Confusing!

Somya Maheshwari
Let’s Deploy Data.
3 min read · Jul 27, 2020
Photo courtesy: https://www.einfochips.com

What is a Confusion Matrix and why do we need it?

Suppose we have a simple binary classification problem: given some data about a patient, our model should indicate whether that person has heart disease or not. We train a model, and now it is time to evaluate it. But how can we measure our model's performance? How effective is it? The more effective the model, the better its performance. What is a good metric for doing so? This is where the confusion matrix comes into play.

Through this blog, I aim to clear all your confusion about “Confusion Matrices”.

Let us first consider what it means for the model to perform well. If the model predicts that a person has heart disease and that person actually has heart disease, we would say it performs well. The same goes for correctly predicting that a person does not have heart disease.

To help us think about the problem, we can construct a table that shows all of the possibilities:

Photo courtesy: StatQuest with Josh Starmer (YouTube)

As you can see, the columns here represent the actual class, i.e., whether a person actually has heart disease or not. The rows represent the predicted class, i.e., whether the model concludes that a person has heart disease or not. When the predicted class matches the actual class (e.g., the model says a person has heart disease and the person does indeed have heart disease), the classification is correct.

The above table is what you call a Confusion Matrix. It is used for evaluating the performance of a classification model. Calculating a confusion matrix can give you a better idea of what your classification model is getting right and what types of errors it is making.
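
As a quick illustration, here is a minimal sketch of how a confusion matrix could be computed with scikit-learn (the y_true and y_pred arrays below are made-up labels, with 1 meaning "has heart disease" and 0 meaning "does not"):

```python
from sklearn.metrics import confusion_matrix

# Made-up example labels: 1 = has heart disease, 0 = does not
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # what actually happened
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # what the model predicted

# In scikit-learn's convention, rows are the actual class and
# columns are the predicted class (the transpose of the table above).
cm = confusion_matrix(y_true, y_pred)
print(cm)
```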

Now let us understand what each of these terms means.

  • True positives are the positive cases that are correctly predicted as positive by the model.
  • False positives are the negative cases that are incorrectly predicted as positive by the model. Also known as a Type I error.
  • True negatives are the negative cases that are correctly predicted as negative by the model.
  • False negatives are the positive cases that are incorrectly predicted as negative by the model. Also known as a Type II error.
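
Continuing the sketch above (reusing the same y_true and y_pred), these four counts can be read straight off the matrix; for binary labels, scikit-learn documents the ravel() ordering as TN, FP, FN, TP:

```python
from sklearn.metrics import confusion_matrix

# For a 2x2 matrix, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")
```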

Why Do We Need a Confusion Matrix?

Confusion matrices are extremely useful for measuring recall, precision, F1 score, accuracy, and, most importantly, the AUC-ROC curve.

Photo courtesy: Wikipedia

Recall -

Recall means the proportion of actual positive cases that were correctly identified.
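
In terms of the confusion matrix: Recall = TP / (TP + FN).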

Precision -

Precision means the percentage of your positive predictions that are relevant. In other words, precision tells us the proportion of predicted positive cases that were actually positive.
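
In terms of the confusion matrix: Precision = TP / (TP + FP).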

F1 score -

The F1 score combines precision and recall: it is their harmonic mean. Its value ranges from 0 (bad) to 1 (good).
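
As a formula: F1 = 2 × (Precision × Recall) / (Precision + Recall).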

Accuracy-

Accuracy is the most intuitive performance measure. It is defined as the ratio of correctly predicted cases to the total number of cases.
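
As a formula: Accuracy = (TP + TN) / (TP + TN + FP + FN).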

For your better understanding, here are the formulas summarized in one place:

Photo courtesy: Lukasz Tracewski (StackOverFlow)
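
If you would rather let a library do the arithmetic, scikit-learn exposes each of these metrics as a function. A minimal sketch, reusing the made-up y_true and y_pred arrays from the earlier snippet:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
```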

I hope this article was engaging and insightful. Happy reading!
