Confusion Matrix to Certainty

Vivek Kumar Pandey
Published in Analytics Vidhya · May 16, 2021

As analysts or data scientists, whenever we want to evaluate the performance of a classification model, we usually start by measuring its accuracy. However, accuracy alone is not enough; we also have to rely on other metrics such as precision and recall. To understand these terms in detail, let's break them down with a use case.

Currently the world is dealing with a pandemic and people are getting tested worldwide. In news articles and reports we hear about the accuracy of these tests, and about some results being false positives or false negatives. To understand what these terms mean, let's look at their definitions:

i. True Positive (TP) — the person is actually positive and the test result is also positive

ii. True Negative (TN) — the person is actually negative and the test result is also negative

iii. False Positive (FP) — the person is actually negative but the test result is positive

iv. False Negative (FN) — the person is actually positive but the test result is negative (the riskiest case!)

Below is a table to help you visualise this:

                        Predicted Positive      Predicted Negative
Actually Positive       True Positive (TP)      False Negative (FN)
Actually Negative       False Positive (FP)     True Negative (TN)

Let us assume that in a single day, a particular testing centre conducted 1,000 tests, of which 10 were true positives, 15 were false positives, 25 were false negatives, and 950 were true negatives.
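In practice these four counts usually come from a library rather than from hand-tallying. Here is a minimal Python sketch using scikit-learn's confusion_matrix, with the day's results reconstructed as hypothetical label lists that match the numbers above:

```python
from sklearn.metrics import confusion_matrix

# Reconstruct the day's 1,000 results as label lists (1 = positive, 0 = negative).
# The ordering is arbitrary; only the counts matter: 10 TP, 25 FN, 15 FP, 950 TN.
y_true = [1] * 10 + [1] * 25 + [0] * 15 + [0] * 950   # actual condition
y_pred = [1] * 10 + [0] * 25 + [1] * 15 + [0] * 950   # test result

# For binary labels, ravel() unpacks the 2x2 matrix as tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)  # 10 15 25 950
```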

Now,

Accuracy = (TP + TN) / (TP + FP + FN + TN) = (10 + 950) / (10 + 15 + 25 + 950) = 96%

Now we can say that the testing kit/method is 96% accurate, which sounds like a good sign. But we are still missing the 2.5% of cases (the false negatives) who are actually spreading the virus, and we are incorrectly isolating 1.5% of the people (the false positives) in COVID wards, which is dangerous for them.
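To make the arithmetic explicit, here is a minimal sketch that computes accuracy from the counts in our example:

```python
# Counts from the testing-centre example above
tp, fp, fn, tn = 10, 15, 25, 950

# Accuracy = correct predictions out of all predictions
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(f"Accuracy: {accuracy:.1%}")  # 96.0%
```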

This is where the precision and recall of the test (in our case, the prediction model) become very important.

What are Precision and Recall?

Precision = TP / (TP + FP) = 10 / (10 + 15) = 40% (correct positive predictions out of all positive predictions)
This means that of every 100 people we isolate, only 40 actually need to be there, and we are endangering the remaining 60 by keeping them in COVID wards.

Recall = TP / (TP + FN) = 10 / (10 + 25) ≈ 28.6% (correct positive predictions out of all actual positives)
This means we are isolating only about 29 out of every 100 actually positive people, while the other 71 are free in public to spread the virus.
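Precision and recall follow directly from the same counts; here is a small sketch of both calculations:

```python
# Counts from the testing-centre example above
tp, fp, fn = 10, 15, 25

precision = tp / (tp + fp)  # correct positives out of all predicted positives
recall = tp / (tp + fn)     # correct positives out of all actual positives

print(f"Precision: {precision:.1%}")  # 40.0%
print(f"Recall:    {recall:.1%}")     # 28.6%
```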

Sounds scary, right?

This is why precision and recall are just as important as accuracy for measuring the performance of any model or test.

Conclusion
While evaluating the performance of any predictive model, apart from accuracy also make sure the area under its ROC curve (which plots sensitivity against 1 - specificity) is close to 1; the closer that value is to 1, the better the model.

Sensitivity is another word for Recall, and

Specificity is defined as TN / (TN + FP), which is the ratio of correctly predicted negatives to the actual negatives.
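Specificity can be computed from the same counts, and a library such as scikit-learn can report the ROC AUC once you have per-case predicted scores. The labels and scores in the second half of this sketch are purely hypothetical, only to illustrate the call:

```python
from sklearn.metrics import roc_auc_score

# Counts from the testing-centre example above
tn, fp = 950, 15
specificity = tn / (tn + fp)  # correctly predicted negatives out of all actual negatives
print(f"Specificity: {specificity:.1%}")  # ~98.4%

# ROC AUC needs per-case predicted scores rather than aggregate counts.
y_true = [0, 0, 1, 1, 0, 1]                      # hypothetical actual labels
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]   # hypothetical predicted probabilities
print("ROC AUC:", roc_auc_score(y_true, y_score))
```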
