Let's make the “Confusion matrix” less confusing!!

Ritika Singh
Published in Analytics Vidhya · 5 min read · Sep 3, 2020

“Doubts are good. Confusion is excellent. Questions are awesome.
All these are attempts to expand the wisdom of mind.”
Manoj Arora

In a classification problem, it is important to decide how performance will be assessed, especially when the cost of different misclassifications varies significantly. Classification accuracy is one such measure: it shows how often the classifier identifies objects correctly.

A confusion matrix, also called a contingency table or error matrix, gives the full picture when it comes to visualizing the performance of a classifier. The columns of the matrix represent the instances of the predicted classes and the rows represent the instances of the actual classes. (Note: it can be the other way around as well.)

The confusion matrix shows the ways in which your classification model is confused when it makes predictions.

Confusion Matrix

A confusion matrix is made up of the predicted and the actual values.
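Written out as text, the layout described above (rows = actual class, columns = predicted class) looks like this:

```
                     Predicted: Negative     Predicted: Positive
Actual: Negative     True Negative (TN)      False Positive (FP)
Actual: Positive     False Negative (FN)     True Positive (TP)
```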

In this confusion matrix, the green background marks the “correct” cells:

  1. TRUE NEGATIVE (TN): The classifier predicted false and the value is actually false.
  2. TRUE POSITIVE (TP): The classifier predicted true and the value is actually true.

And the red background marks the “error” cells:

  1. FALSE NEGATIVE (FN): The classifier predicted false but the value is actually true. This is also called a TYPE II error.
  2. FALSE POSITIVE (FP): The classifier predicted true but the value is actually false. This is also called a TYPE I error.
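To make this concrete, here is a minimal sketch with made-up labels; scikit-learn's confusion_matrix (the same library we use later in this post) returns these four cells directly:

```python
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predictions (1 = positive, 0 = negative)
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]

# For binary labels, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=3, FP=1, FN=1, TP=3
```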

Why Do We Need a Confusion Matrix?

Let us consider the example of the COVID-19 virus. Say you want to predict which people are infected with the virus before they show symptoms, so that they can be isolated from the healthy population. The two values of our target variable would be COVID positive and not COVID positive.

You might be thinking: why do we need a confusion matrix when we have accuracy to check our results? Let's check our accuracy first!

There are 1,000 data points in total: 970 for the negative class and 30 for the positive class. This is how we'll calculate the accuracy:

The total outcome values are:

TP = 20, TN = 950, FP = 20, FN = 10

So, the accuracy of our model turns out to be:

Accuracy = (TP + TN) / (TP + TN + FP + FN) = (20 + 950) / 1000 = 0.97

Here our accuracy is 97%, which does not look bad at all! But it gives us the wrong idea about the result.
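The same arithmetic in code, using just the counts above:

```python
# Counts from the COVID example
TP, TN, FP, FN = 20, 950, 20, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.97, i.e. 97%
```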

Our model is saying, “It can predict COVID positive people 97% of the time.” However, it is doing the opposite. It is predicting the people who will NOT be COVID positive with 97% accuracy, while the COVID positive cases keep spreading the virus!

Do you think this is the correct way of measuring our result? Shouldn't we be measuring how many positive cases we can predict correctly, to arrest the spread of the contagious virus? Or, out of the correctly predicted cases, how many are positive cases, to check the reliability of our model?

This is where we come across the dual concept of Precision and Recall.

Precision vs. Recall

Precision

Precision tells us what proportion of patients we diagnosed as having the virus actually had the virus.

Precision is calculated by:

Precision = TP / (TP + FP)

Recall

Recall tells us what proportion of patients that actually had the virus were predicted by us as having the virus. It should be as high as possible.

Recall is calculated by:

Recall = TP / (TP + FN)

Note: Recall tells you how many of the +ve's you can find.

Precision tells you how much junk there is among your predicted +ve's.
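Plugging the COVID example numbers into these two formulas makes the contrast with the 97% accuracy obvious:

```python
# Counts from the COVID example
TP, TN, FP, FN = 20, 950, 20, 10

precision = TP / (TP + FP)   # 20 / 40 = 0.50
recall    = TP / (TP + FN)   # 20 / 30 ≈ 0.67

print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

So even though accuracy is 97%, only half of the people we flag are actually positive, and a third of the truly positive people are missed.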

F1 score

In practice, when we try to increase the precision of our model, the recall goes down, and vice-versa. The F1-score captures both the trends in a single value.

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

It’s a measure of a test’s accuracy. It considers both the precision and the recall of the test to compute the score using the harmonic mean.
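Using the precision and recall from the COVID example:

```python
# Precision and recall computed earlier for the COVID example
precision, recall = 20 / 40, 20 / 30

f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 2))  # 0.57
```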

More formulas

  1. TRUE POSITIVE RATE (Sensitivity)

TPR = TP / (TP + FN)

  2. TRUE NEGATIVE RATE (Specificity)

TNR = TN / (TN + FP)

(The FALSE POSITIVE RATE is the complement of specificity: FPR = FP / (FP + TN) = 1 - Specificity.)
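And with the same COVID example counts:

```python
# Counts from the COVID example
TP, TN, FP, FN = 20, 950, 20, 10

sensitivity = TP / (TP + FN)   # TPR = 20 / 30  ≈ 0.67
specificity = TN / (TN + FP)   # TNR = 950 / 970 ≈ 0.98
fpr = FP / (FP + TN)           # 1 - specificity ≈ 0.02

print(f"Sensitivity: {sensitivity:.2f}, Specificity: {specificity:.2f}, FPR: {fpr:.2f}")
```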

Confusion Matrix using scikit-learn in Python

I have used the very useful Python library scikit-learn to demonstrate the confusion matrix in practice.

The dataset is the Titanic dataset, available on Kaggle.
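The original code appears as screenshots, so here is a rough sketch of that kind of pipeline. The file name, the chosen features, and the logistic-regression classifier are my assumptions, not necessarily the author's exact setup:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Assumed file name from the Kaggle Titanic download; adjust the path as needed
df = pd.read_csv("train.csv")

# A small, assumed feature set just to get a working classifier
df = df[["Survived", "Pclass", "Sex", "Age", "Fare"]].dropna()
df["Sex"] = (df["Sex"] == "male").astype(int)

X = df.drop(columns="Survived")
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))
```

The exact counts depend on the features, the model, and the train/test split; the numbers below are the ones reported for the original run.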

Evaluating the Algorithm

True Positive is 97

True Negative is 47

False Positive is 12

False Negative is 23

Now we can evaluate the model using performance metrics.
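For instance, scikit-learn's classification_report prints precision, recall, and F1 for both classes in one go (using the y_test and y_pred from the sketch above):

```python
from sklearn.metrics import accuracy_score, classification_report

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```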

Getting the F1-score
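The F1-score can come straight from f1_score, or by hand from the counts reported above:

```python
from sklearn.metrics import f1_score

print(f1_score(y_test, y_pred))

# By hand from the reported counts (TP=97, FP=12, FN=23):
# precision = 97 / 109 ≈ 0.89, recall = 97 / 120 ≈ 0.81
# F1 = 2 * (0.89 * 0.81) / (0.89 + 0.81) ≈ 0.85
```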
