Confusion Matrices: Not So Confusing!

Somya Maheshwari
Let’s Deploy Data.
3 min read · Jul 27, 2020
Photo courtesy: https://www.einfochips.com

What is a Confusion Matrix and why do we need it?

Suppose we have a simple binary classification problem: given some data about a patient, our model should indicate whether that person has heart disease or not. We train a model, and now it is time to evaluate it. But how can we measure our model's performance? How effective is it? The more effective the model, the better its performance. What is a good metric for doing so? This is where the confusion matrix comes into play.

Through this blog, I aim to clear all your confusion about “Confusion Matrices”.

Let us first consider what it means for the model to perform well. If the model predicts that a person has heart disease and that person actually has heart disease, we would say it performs well. The same goes for correctly predicting that a person does not have heart disease.

To help us think about the problem, we can construct a table that shows all of the possibilities:

Photo courtesy: StatQuest with Josh Starmer (YouTube)

As you can see, the columns here represent the actual class, i.e., whether a person actually has heart disease or not. The rows represent the predicted class, i.e., whether the model concludes that a person has heart disease or not. When the predicted class matches the actual class (e.g., the model says a person has heart disease and the person does indeed have heart disease), the classification is correct.

The above table is what you call a Confusion Matrix. It is used for evaluating the performance of a classification model. Calculating a confusion matrix can give you a better idea of what your classification model is getting right and what types of errors it is making.
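
As a quick illustration, here is a minimal sketch of how a confusion matrix could be computed with scikit-learn (the y_true and y_pred arrays below are made-up labels, with 1 meaning "has heart disease" and 0 meaning "does not"):

```python
from sklearn.metrics import confusion_matrix

# Made-up example labels: 1 = has heart disease, 0 = does not
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # what actually happened
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # what the model predicted

# In scikit-learn's convention, rows are the actual class and
# columns are the predicted class (the transpose of the table above).
cm = confusion_matrix(y_true, y_pred)
print(cm)
```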

Now let us understand what each of these terms means.

  • True positives are the positive cases that are correctly predicted as positive by the model.
  • False positives are the negative cases that are incorrectly predicted as positive by the model. Also known as a Type I error.
  • True negatives are the negative cases that are correctly predicted as negative by the model.
  • False negatives are the positive cases that are incorrectly predicted as negative by the model. Also known as a Type II error.
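
Continuing the sketch above (reusing the same y_true and y_pred), these four counts can be read straight off the matrix; for binary labels, scikit-learn documents the ravel() ordering as TN, FP, FN, TP:

```python
from sklearn.metrics import confusion_matrix

# For a 2x2 matrix, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")
```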

Why Do We Need a Confusion Matrix?

Confusion matrices are extremely useful for measuring recall, precision, F1 score, accuracy, and, most importantly, the AUC-ROC curve.

Photo courtesy: Wikipedia

Recall -

Recall means the proportion of actual positive cases that were correctly identified.
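
In terms of the confusion matrix: Recall = TP / (TP + FN).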

Precision -

Precision means the percentage of your positive predictions that are relevant. In other words, precision tells us the proportion of predicted positive cases that were actually positive.
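
In terms of the confusion matrix: Precision = TP / (TP + FP).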

F1 score -

The F1 score combines precision and recall: it is their harmonic mean. Its value ranges from 0 (bad) to 1 (good).
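
As a formula: F1 = 2 × (Precision × Recall) / (Precision + Recall).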

Accuracy-

Accuracy is the most intuitive performance measure. It is defined as the ratio of correctly predicted cases to the total number of cases.
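
As a formula: Accuracy = (TP + TN) / (TP + TN + FP + FN).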

For your better understanding, here are the formulas summarized in one place:

Photo courtesy: Lukasz Tracewski (StackOverFlow)
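
If you would rather let a library do the arithmetic, scikit-learn exposes each of these metrics as a function. A minimal sketch, reusing the made-up y_true and y_pred arrays from the earlier snippet:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
```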

I hope this article was engaging and insightful. Happy reading!
