Confusion Matrix

Vanshita
Jun 6, 2021

A confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model.

For a binary classification problem, we would have a 2 x 2 matrix with four values: TP, FP, FN and TN.

Let’s decipher the matrix:

The target variable has two values: Positive or Negative

The columns represent the actual values of the target variable

The rows represent the predicted values of the target variable
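To make this layout concrete, here is a minimal Python sketch that builds an N x N confusion matrix from two lists of labels, following the convention above (rows = predicted, columns = actual). The function name `build_confusion_matrix` and the example labels are hypothetical, chosen only for illustration. Be aware that some libraries use the transposed convention; scikit-learn's `confusion_matrix`, for instance, puts the actual values in rows, so always check which axis is which.

```python
from typing import Hashable, Sequence


def build_confusion_matrix(actual: Sequence[Hashable],
                           predicted: Sequence[Hashable],
                           classes: Sequence[Hashable]) -> list[list[int]]:
    """Return an N x N matrix where matrix[p][a] counts the points predicted
    as classes[p] whose actual class is classes[a]
    (rows = predicted values, columns = actual values)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    index = {label: i for i, label in enumerate(classes)}
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1  # row = predicted, column = actual
    return matrix


# Usage: three target classes, so N = 3 and the matrix is 3 x 3.
actual = ["cat", "dog", "dog", "bird", "cat", "dog"]
predicted = ["cat", "dog", "cat", "bird", "cat", "bird"]
for row in build_confusion_matrix(actual, predicted, ["cat", "dog", "bird"]):
    print(row)
# [2, 1, 0]
# [0, 1, 0]
# [0, 1, 1]
```

Correct predictions land on the diagonal; every off-diagonal cell counts one kind of mistake.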

But wait – what’s TP, FP, FN and TN here? That’s the crucial part of a confusion matrix. Let’s understand each term below.

Understanding True Positive, True Negative, False Positive and False Negative in a Confusion Matrix

True Positive (TP)

The predicted value matches the actual value

The actual value was positive and the model predicted a positive value

True Negative (TN)

The predicted value matches the actual value

The actual value was negative and the model predicted a negative value

False Positive (FP) – Type 1 error

The predicted value does not match the actual value

The actual value was negative but the model predicted a positive value

Also known as the Type 1 error

False Negative (FN) – Type 2 error

The predicted value does not match the actual value

The actual value was positive but the model predicted a negative value

Also known as the Type 2 error
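For the binary case, the four terms can be counted directly from pairs of actual and predicted labels. The sketch below assumes the Positive class is encoded as `1` (an assumption; your data may use other labels, which the `positive` parameter allows), and the function name `count_outcomes` is hypothetical.

```python
def count_outcomes(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary problem.

    `positive` is the label treated as the Positive class; any other
    label is treated as Negative.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive:
            fp += 1  # predicted positive, actually negative (Type 1 error)
        else:
            fn += 1  # predicted negative, actually positive (Type 2 error)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


actual = [1, 0, 1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 1]
print(count_outcomes(actual, predicted))
# {'TP': 3, 'TN': 2, 'FP': 1, 'FN': 1}
```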

Let me give you an example to better understand this. Suppose we had a classification dataset with 1000 data points. We fit a classifier on it, and the values in its confusion matrix are as follows:

True Positive (TP) = 560; meaning 560 positive class data points were correctly classified by the model

True Negative (TN) = 330; meaning 330 negative class data points were correctly classified by the model

False Positive (FP) = 60; meaning 60 negative class data points were incorrectly classified as belonging to the positive class by the model

False Negative (FN) = 50; meaning 50 positive class data points were incorrectly classified as belonging to the negative class by the model

This turned out to be a pretty decent classifier for our dataset, considering that the true positive and true negative counts are much larger than the false positive and false negative counts.
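To put a number on "pretty decent", here is a quick Python calculation of the share of correct predictions (accuracy, which the next section examines more closely), using the four counts from this example.

```python
# Counts from the 1000-point example above.
tp, tn, fp, fn = 560, 330, 60, 50

total = tp + tn + fp + fn      # every data point lands in exactly one cell
accuracy = (tp + tn) / total   # share of predictions that were correct
print(total, accuracy)         # 1000 0.89
```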
Why Do We Need a Confusion Matrix?

Before we answer this question, let’s think about a hypothetical classification problem.
Let’s say you want to predict which people are infected with a contagious virus before they show symptoms, so that you can isolate them from the healthy population (ringing any bells yet? 😷). The two values for our target variable would be: Sick and Not Sick.
Now, you must be wondering – why do we need a confusion matrix when we have our all-weather friend – Accuracy? Well, let’s see where accuracy falters.
Our dataset is an example of an imbalanced dataset: going by the outcome values below, 960 data points belong to the negative class (TN + FP) and only 40 to the positive class (TP + FN). This is how we’ll calculate the accuracy:
Accuracy = (TP + TN) / (TP + TN + FP + FN)

The total outcome values are:
TP = 30, TN = 930, FP = 30, FN = 10
So, the accuracy for our model turns out to be:
Accuracy = (30 + 930) / (30 + 930 + 30 + 10) = 960 / 1000 = 0.96

But this gives the wrong idea about the result. Think about it.
Our model seems to say "I can predict sick people 96% of the time". However, that number is driven almost entirely by the healthy majority: the model correctly clears 930 of the 960 people who are not sick, yet 10 of the 40 sick people go undetected and keep spreading the virus!
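To see how little that 96% tells us, here is a small Python comparison against a hypothetical baseline (not part of the original example) that labels every person as Not Sick. The class sizes are read off the outcome values above: TP + FN = 40 people are actually sick and TN + FP = 960 are not. The `accuracy` helper is our own illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts from the virus example above.
model = {"tp": 30, "tn": 930, "fp": 30, "fn": 10}

# Hypothetical baseline that predicts "Not Sick" for all 1000 people:
# the 40 sick people become false negatives, the 960 healthy people true negatives.
baseline = {"tp": 0, "tn": 960, "fp": 0, "fn": 40}

for name, c in [("model", model), ("always Not Sick", baseline)]:
    sick_caught = f"{c['tp']}/{c['tp'] + c['fn']}"
    print(f"{name}: accuracy={accuracy(**c):.2f}, sick people caught={sick_caught}")
# model: accuracy=0.96, sick people caught=30/40
# always Not Sick: accuracy=0.96, sick people caught=0/40
```

Both rows report the same accuracy, yet the baseline catches none of the sick people; accuracy alone cannot tell these two models apart.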
Do you think this is a correct metric for our model given the seriousness of the issue? Shouldn’t we be measuring how many positive cases we can predict correctly to arrest the spread of the contagious virus? Or maybe, out of the cases predicted as positive, how many actually are positive, to check the reliability of our model?
This is where we come across the dual concept of Precision and Recall.

Precision vs. Recall

Precision

Precision tells us how many of the cases predicted as positive actually turned out to be positive.

Here’s how to calculate Precision:
Precision = TP / (TP + FP)

This tells us how reliable the model’s positive predictions are.
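As a minimal Python sketch, precision needs only two of the four counts. Raising an error when the model made no positive predictions is a design choice of this example (libraries handle that case in different ways), and the function name is hypothetical. Here it is applied to the 1000-point example from earlier (TP = 560, FP = 60).

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): the share of positive predictions that are correct."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        raise ValueError("no positive predictions; precision is undefined")
    return tp / predicted_positive


# The 1000-point example: 560 + 60 = 620 positive predictions.
print(round(precision(tp=560, fp=60), 3))  # 0.903
```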

Recall

Recall tells us how many of the actual positive cases we were able to predict correctly with our model.

And here’s how we can calculate Recall:
Recall = TP / (TP + FN)
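A matching minimal Python sketch for recall, again on the 1000-point example (TP = 560, FN = 50). The function name and the choice to raise an error when there are no actual positives are assumptions of this example.

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): the share of actual positives the model found."""
    actual_positive = tp + fn
    if actual_positive == 0:
        raise ValueError("no actual positive cases; recall is undefined")
    return tp / actual_positive


# The 1000-point example: 560 + 50 = 610 actual positives.
print(round(recall(tp=560, fn=50), 3))  # 0.918
```

Note the only difference from precision is the denominator: recall divides by everything that is actually positive (TP + FN), precision by everything predicted positive (TP + FP).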

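We can easily calculate Precision and Recall for our model by plugging in the values TP = 30, FP = 30 and FN = 10:
Precision = TP / (TP + FP) = 30 / (30 + 30) = 0.5
Recall = TP / (TP + FN) = 30 / (30 + 10) = 0.75
In other words, only half of the people our model flags as sick actually are sick, while it catches three out of every four sick people.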

Precision is a useful metric in cases where false positives are a higher concern than false negatives.

Precision is important in music or video recommendation systems, e-commerce websites, etc. Wrong results could lead to customer churn and be harmful to the business.

Recall is a useful metric in cases where false negatives are a higher concern than false positives.

Recall is important in medical cases, where raising a false alarm matters far less than letting an actual positive case go undetected!
In our example, Recall would be a better metric because we don’t want to accidentally discharge an infected person and let them mix with the healthy population thereby spreading the contagious virus. Now you can understand why accuracy was a bad metric for our model.
But there will be cases where it is unclear whether Precision or Recall is more important. What should we do in those cases? We combine them!
