Deconfusing the Confusion Matrix

Stuti Singh
Published in Analytics Vidhya
Jun 15, 2020 · 3 min read

A confusion matrix is a way to view different classification metrics such as accuracy, precision, recall, etc. I will explain the confusion matrix first, and then we will learn about sensitivity, specificity, accuracy, and other model evaluation metrics.

In a classification problem, after testing we get results in two categories:

  • True condition
  • Predicted condition

For example, suppose we build a machine learning model for a disease diagnostic test. The model will predict one of two classes.

Let’s make the following definitions:

  • “Disease” is the positive class.
  • “Normal” is the negative class.

We can summarise our “Disease prediction” model using a 2×2 confusion matrix that depicts all four possible outcomes. For this scenario, the four entries of the confusion matrix are as follows.

A True Positive (TP) — when the model correctly predicts the positive class (having disease).

A True Negative (TN) — when the model correctly predicts the negative class (Normal, no disease).

A False Positive (FP) — when the model incorrectly predicts the positive class. It is also known as a Type I error.

A False Negative (FN) — when the model incorrectly predicts the negative class. It is also known as a Type II error.
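To make these four counts concrete, here is a minimal sketch using scikit-learn's confusion_matrix; the labels below are made up purely for illustration (1 = Disease, 0 = Normal):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration: 1 = Disease (positive), 0 = Normal (negative)
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]

# scikit-learn lays the matrix out with rows = true class and columns = predicted class,
# so for a binary problem ravel() returns TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=2, TN=6, FP=1, FN=1
```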

Evaluation Metrics

1. Accuracy

It is measured as the ratio of correctly predicted observations to the total number of observations.

Accuracy = Correctly predicted samples / Total no of samples

For example, suppose we have a test set of 10 samples with Disease (2 cases) and Normal (8 cases).

Green shows Normal (No Disease) and red shows Positive (Having Disease). In this scenario, model 1 and model 2 both have the same accuracy, because the number of correctly predicted samples (TP + TN) is the same in both models.

Accuracy = Correctly predicted samples / Total no of samples

= (TP + TN) / (TP + TN + FP + FN)

= 8/10 = 0.8

We can get a sense that model 2 is perhaps doing something more useful than model 1, because it is at least attempting to distinguish between healthy and diseased patients.

Therefore, we should not always assume that the model with the highest accuracy is the best. Accuracy is a great measure, but only when you have a symmetric dataset where the numbers of false positives and false negatives are almost the same. So, if you have an asymmetric dataset, you should look at other metrics to evaluate the performance of the model.
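To see this numerically, here is a small sketch of the example above. The two prediction vectors are my assumption of how such models could behave (model 1 predicts Normal for every patient; model 2 catches one of the two Disease cases but raises one false alarm), yet both land on the same accuracy:

```python
from sklearn.metrics import accuracy_score

# 10 test samples: 2 Disease (1) and 8 Normal (0)
y_true    = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Model 1 (assumed): predicts Normal for everything
y_model_1 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Model 2 (assumed): catches one Disease case but also raises one false alarm
y_model_2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(accuracy_score(y_true, y_model_1))  # 0.8 -> TP=0, TN=8
print(accuracy_score(y_true, y_model_2))  # 0.8 -> TP=1, TN=7
```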

2. Sensitivity (Probability of Detection)

Also known as Recall or True Positive Rate (TPR). It is the ratio of correctly predicted positive observations to all observations in the actual positive class. For the disease prediction model we can define it as: if a patient has the disease, what is the probability that the model predicts positive?

Sensitivity = Σ True positive / Σ Condition positive

= TP / (TP + FN)
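Using the same hypothetical model 2 predictions from the accuracy example, sensitivity can be computed directly as TP / (TP + FN), or with scikit-learn's recall_score:

```python
from sklearn.metrics import recall_score

y_true    = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # 2 Disease, 8 Normal (hypothetical)
y_model_2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # assumed predictions

# TP = 1, FN = 1, so sensitivity = 1 / (1 + 1)
print(recall_score(y_true, y_model_2))  # 0.5
```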

3. Precision

Also known as Positive Predictive Value (PPV). It is the ratio of correctly predicted positive observations to the total predicted positive observations. The question this metric answers is: if the model's prediction is positive, what is the probability that the patient has the disease?

Precision = Σ True positive / Σ Prediction positive

= TP / (TP + FP)
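Again with the same hypothetical predictions, precision is TP / (TP + FP), or precision_score in scikit-learn:

```python
from sklearn.metrics import precision_score

y_true    = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # 2 Disease, 8 Normal (hypothetical)
y_model_2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # assumed predictions

# TP = 1, FP = 1, so precision = 1 / (1 + 1)
print(precision_score(y_true, y_model_2))  # 0.5
```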

4. F1 score

F1 Score is the harmonic mean of Precision and Recall. If you have an uneven class distribution, F1 is usually more helpful than accuracy.

F1 score = 2 * Recall * Precision / (Recall + Precision)
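For the same hypothetical predictions, precision and recall are both 0.5, so their harmonic mean is also 0.5; f1_score confirms this:

```python
from sklearn.metrics import f1_score

y_true    = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # 2 Disease, 8 Normal (hypothetical)
y_model_2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # assumed predictions

# F1 = 2 * 0.5 * 0.5 / (0.5 + 0.5)
print(f1_score(y_true, y_model_2))  # 0.5
```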

5. Specificity

This metric answers the question: if a patient is normal, what is the probability that the model predicts negative? It is also known as Selectivity or True Negative Rate (TNR).

Specificity = Σ True Negative / Σ Condition negative

= TN / (TN + FP)
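scikit-learn does not expose a dedicated specificity score, so one simple option is to read TN and FP off the confusion matrix; again using the hypothetical predictions from above:

```python
from sklearn.metrics import confusion_matrix

y_true    = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # 2 Disease, 8 Normal (hypothetical)
y_model_2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # assumed predictions

# Specificity = TN / (TN + FP)
tn, fp, fn, tp = confusion_matrix(y_true, y_model_2).ravel()
print(tn / (tn + fp))  # 7 / (7 + 1) = 0.875
```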

