What is a Confusion Matrix?

Praks · Jul 21, 2021

In this article I am going to explain what a confusion matrix is and how to interpret it.

A confusion matrix is a classification performance summary that works on actual and predicted class labels. It does not work with probability scores, so in order to create a confusion matrix you need the actual labels and the predicted labels (probabilities must first be thresholded into class labels).

Let’s consider a case where we have a total of 100 data points, say n = 100. Out of these 100 points, 50 belong to the label “True”, say P = 50, and 50 belong to the label “False”, say N = 50.

Assume that we have created a model which has predicted a label for each of these 100 points. Here is the confusion matrix for it.

Confusion Matrix for this example (rows: actual label, columns: predicted label):

                           Predicted “True”    Predicted “False”
  Actual “True”  (P = 50)     TP = 40             FN = 10
  Actual “False” (N = 50)     FP = 20             TN = 30
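If you want to reproduce this matrix in code, here is a minimal sketch using scikit-learn’s confusion_matrix on synthetic labels built to match the counts above (the label arrays are assumptions purely for illustration; only the counts come from the example).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Synthetic actual labels matching the example: P = 50 "True", N = 50 "False".
y_true = np.array([True] * 50 + [False] * 50)

# Assumed predictions matching the counts above:
# 40 of the positives predicted True (TP), 10 predicted False (FN),
# 20 of the negatives predicted True (FP), 30 predicted False (TN).
y_pred = np.array([True] * 40 + [False] * 10 + [True] * 20 + [False] * 30)

# With labels=[True, False], rows are the actual labels [True, False] and
# columns are the predicted labels [True, False], matching the table above.
cm = confusion_matrix(y_true, y_pred, labels=[True, False])
print(cm)
# [[40 10]
#  [20 30]]
```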

There is a lot of information here, so let’s interpret it piece by piece:

  • True Negative (TN): True negatives are the data points where the actual label is “False” and the model has also predicted “False”. The model has correctly classified these points as belonging to the negative class.
  • True Positive (TP): True positives are the data points where the actual label is “True” and the model has also predicted “True”. The model has correctly classified these points as belonging to the positive class.
  • False Negative (FN): False negatives are the data points where the actual label is “True” but the model has predicted “False”, thus missing real positive cases.
  • False Positive (FP): False positives are the data points where the actual label is “False” but the model has predicted “True”, thus raising a false alarm.

Since the True Positive and True Negative counts are high in this case, we can consider this a good model.

A good model will generally have high TP and TN values (the diagonal values).
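Continuing the sketch above, the four cells can also be read out individually. Note that with scikit-learn’s default label ordering (sorted, i.e. [False, True]), ravel() returns the cells in the order TN, FP, FN, TP; the variable names below are just for illustration.

```python
# Default label order is [False, True], so ravel() yields TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP = {tp}, TN = {tn}, FP = {fp}, FN = {fn}")
# TP = 40, TN = 30, FP = 20, FN = 10
```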

Let’s now look at a few more concepts, TPR, TNR, FPR and FNR; a short code sketch computing them follows the list.

  • True Positive Rate (TPR): the ratio of True Positives (TP) to the total number of actual positives (P).
Here, TPR = 40/(10+40) = 40/50 = 0.8
  • True Negative Rate (TNR): the ratio of True Negatives (TN) to the total number of actual negatives (N).
Here, TNR = 30/(30+20) = 30/50 = 0.6
  • False Positive Rate (FPR): the ratio of False Positives (FP) to the total number of actual negatives (N).
Here, FPR = 20/(30+20) = 20/50 = 0.4
  • False Negative Rate (FNR): the ratio of False Negatives (FN) to the total number of actual positives (P).
Here, FNR = 10/(10+40) = 10/50 = 0.2
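These four rates follow directly from the cell counts. A small sketch, continuing with the tp, tn, fp and fn values extracted earlier:

```python
# P = TP + FN (all actual positives), N = TN + FP (all actual negatives).
tpr = tp / (tp + fn)   # 40 / 50 = 0.8
tnr = tn / (tn + fp)   # 30 / 50 = 0.6
fpr = fp / (fp + tn)   # 20 / 50 = 0.4
fnr = fn / (fn + tp)   # 10 / 50 = 0.2
print(tpr, tnr, fpr, fnr)  # 0.8 0.6 0.4 0.2
```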

Based on all the rates above, we can see that TPR and TNR are high and FPR and FNR are low, which is a sign of a good model.

Even though the data is balanced, the model is not doing as well on the negative data points as on the positive ones, since TNR (0.6) is lower than TPR (0.8).

In the case of multiclass classification, too, we should aim for a high number of correct predictions: all the principal diagonal elements of the confusion matrix should have high values and all the off-diagonal elements should have low values.
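The same check works for a multiclass confusion matrix: the diagonal holds the correctly classified counts for each class, and everything off the diagonal is a misclassification. A small sketch with three hypothetical classes (the labels and predictions here are made up purely for illustration):

```python
from sklearn.metrics import confusion_matrix

y_true = ["cat", "cat", "dog", "dog", "bird", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "bird", "cat"]

cm = confusion_matrix(y_true, y_pred, labels=["cat", "dog", "bird"])
print(cm)
# [[1 1 0]    row = actual class, column = predicted class
#  [0 2 0]    diagonal entries = correct predictions per class
#  [1 0 2]]   off-diagonal entries = misclassifications
```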

Conclusion: To judge how good our model is, we can compute all four rates, TPR, TNR, FPR and FNR, and should aim for higher TPR and TNR and lower FPR and FNR. However, which of these rates matters most also depends on the problem domain.

Thank you for taking the time to read this! :)
