Let’s clear the confusion

Vinnu Chitturi · Published in Analytics Vidhya · Mar 18, 2021

Evaluating a model’s performance plays a crucial role in finalizing a model or deciding whether it is ready to be deployed in a production environment. There are many ways to evaluate performance, but among them the confusion matrix and the ROC curve are two of the most widely used.

Whether it is because of its name or the way beginners interpret it, the confusion matrix somehow looks confusing and does not stick in the mind that easily. In this article, let us try to get hold of the confusion matrix with a simple example.

Let us consider a classification task in which we have to find the squares in a group of objects that contains both circles and squares.

In the example below, we have a total of fifteen objects, of which nine are squares and six are circles. If we identify the squares manually, we more or less end up with hundred percent accuracy. But if we want a machine to handle this task, achieving hundred percent accuracy is difficult unless it is well trained. We can think of the machine as a small child who has just started identifying objects: the more we train it, the better it identifies.

Now, when we ask the machine to pick all the squares and put them in a new container, assume that it gives the result below.

Here we can see that the machine has picked nine items, but only six of them are squares; the other three are circles, and three actual squares were left behind. If someone asks how our machine is doing, it is hard to know whether to focus on the unwanted items among those it picked or on the wanted items it left out. This is where the confusion matrix comes in.

A confusion matrix is nothing but a table that helps us describe the performance of a model.

In general, we plot the confusion matrix after predicting the target labels for the X_test data. Since we know the actual y_test, we can compare the actual labels with the predicted ones.
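
To make this concrete, here is a minimal, self-contained sketch of that step using scikit-learn; the synthetic dataset and the LogisticRegression model are only stand-ins for whatever data and classifier you are actually working with.

```python
# Minimal sketch: fit a classifier, predict on X_test, and compare with y_test.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data; replace with your own X and y.
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)  # predicted labels for X_test

# For binary labels 0/1, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
print(confusion_matrix(y_test, y_pred))
```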

Let us try to understand this terminology with above square and circle dataset.

True Positives (TP): How many of the picked items are actually squares?

False Positives (FP): How many of the picked items are not squares?

False Negatives (FN): How many actual squares were not picked?

True Negatives (TN): How many non-squares (circles) were not picked?

If we answer the above questions, our confusion matrix looks as follows.
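
Since the matrix itself appears as a figure, here are the same counts written out as a small sketch, using the [[TN, FP], [FN, TP]] layout that scikit-learn also uses (circles are the negative class, squares the positive class):

```python
import numpy as np

# 6 squares correctly picked (TP), 3 circles picked by mistake (FP),
# 3 squares left behind (FN), 3 circles correctly left behind (TN).
cm = np.array([[3, 3],   # actual circles:  TN = 3, FP = 3
               [3, 6]])  # actual squares:  FN = 3, TP = 6
print(cm)
```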

Now, let us see what Sensitivity and Precision are.

Sensitivity, also known as Recall, is the proportion of correctly classified positives among all actual positives: Sensitivity = TP / (TP + FN). In our case, it is the number of squares picked out of the total number of available squares.

So, if we plug our values into this formula, we get a sensitivity of 6/9, which is 2/3.

Precision is the proportion of true positives among all positive predictions: Precision = TP / (TP + FP). In our case, it is the number of squares picked out of all the picked items.

So, if we plug in the values, we again get 6/9, which is 2/3.
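
As a quick check, both numbers can be computed directly from the counts above:

```python
# Plugging the example counts into both formulas.
TP, FP, FN = 6, 3, 3

sensitivity = TP / (TP + FN)  # recall: 6 / 9
precision = TP / (TP + FP)    # 6 / 9

print(round(sensitivity, 3), round(precision, 3))  # 0.667 0.667
```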

We always want Sensitivity and Precision to be high, and both values lie in the range 0 to 1. If we observe the above equations, the numerator is the True Positive count in both. The only thing that differs is the second term of the denominator, which is False Negatives in the first case and False Positives in the second. The lower these counts, the better the performance of our model. Once we plot the confusion matrix, our immediate next job is to see whether there is a way to reduce the False Positive and False Negative counts, which are also known as Type 1 and Type 2 errors respectively.

Consider a classification task of detecting cancer, where a positive prediction means the patient has cancer.

Type 1 error (False Positive): predicting a non-cancerous patient as cancerous.

Type 2 error (False Negative): predicting a cancer patient as non-cancerous.

In this case, intuitively the Type 2 error feels dangerous, because a patient who actually has cancer is declared non-cancerous, which is a life risk. On the other hand, the Type 1 error is not that harmful, since the non-cancerous patient can undergo further diagnosis and has a chance of being cleared through manual examination. Hence, our agenda would be to reduce the Type 2 error (False Negatives) while working on cases related to diagnosis.

So far, we have discussed the confusion matrix; now let us jump to the ROC curve. The Receiver Operating Characteristic (ROC) curve is an evaluation tool for binary classification problems. Its core advantage is that it helps us find a suitable threshold value for a classifier.

Consider logistic regression: we know that predictions are made based on a chosen threshold value, which can be anything between 0 and 1. For every threshold we try, we would have to build a separate confusion matrix and then compare them all to pick the better threshold. This process quickly becomes cumbersome.

Thus, instead of being overwhelmed by confusion matrices, the ROC plot provides a simple way to visualize all of this information across multiple thresholds.

In the ROC plot, we put Sensitivity, i.e. the True Positive Rate (TPR), on the Y-axis and the False Positive Rate (FPR) on the X-axis. At each threshold, we calculate both values from the TP, FP, TN and FN counts:

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)
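
In practice we rarely compute this sweep by hand; scikit-learn's roc_curve does it for us, given the actual labels and the predicted probabilities. The labels and scores below are toy values, just to show the shape of the output:

```python
from sklearn.metrics import roc_curve

# Toy actual labels and predicted probabilities, stand-ins for a real model's output.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1]
y_score = [0.1, 0.3, 0.6, 0.2, 0.8, 0.4, 0.9, 0.7, 0.35]

# One (FPR, TPR) pair per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```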

Since the FPR and TPR values lie between 0 and 1, both axes of the ROC plot run from 0 to 1. Any point on the diagonal line (shown in green in the plot) indicates that the proportion of incorrectly classified items is the same as the proportion of correctly classified ones, which means our model has only a fifty percent chance of making correct predictions.

As discussed above, our agenda should be to reduce Type 1 and Type 2 errors, which means we want a low FPR relative to the TPR. Once we plot the ROC graph at multiple thresholds, say T1, T2, T3, T4, T5 and T6, it looks like the figure below.

Here we can see that for threshold T1 we have a large TPR value (close to 1) compared with the FPR. For T2, the TPR is the same as that of T1, but the FPR of T2 is smaller than the FPR of T1. So, if we have to choose between T1 and T2, we would pick T2.

Similarly, for threshold T3 the FPR drops drastically compared with T2, but the TPR also drops. Likewise, at T4 the FPR is zero, which means there are no False Positives at all; however, there may still be False Negatives, which is why the TPR is less than 1.

So, based on the problem statement, we have to decide how many False Positives a solution can accept, and choose the threshold value accordingly.
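
Continuing with the fpr, tpr and thresholds arrays from the roc_curve sketch above, one rough way to make that choice, assuming purely for illustration that we can tolerate at most 10% false positives, is:

```python
import numpy as np

max_fpr = 0.10               # example tolerance; this depends entirely on the problem
acceptable = fpr <= max_fpr  # thresholds whose FPR we can live with

# Among the acceptable thresholds, pick the one with the highest TPR.
best = np.argmax(tpr * acceptable)
print("chosen threshold:", thresholds[best])
```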

Now that we know about ROC, let us understand AUC (Area Under the Curve). AUC helps us compare one ROC curve with another.

Let us say we have trained our model using two different algorithms: Logistic Regression, indicated by the red ROC curve, and Support Vector Machine (SVM), indicated by the blue ROC curve.

If we have to decide which algorithm performs better, we choose the ROC curve whose AUC is greater. In the case above, the area under the red ROC curve is greater than the area under the blue one, so we can say Logistic Regression suits that particular case study better.
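
A sketch of that comparison with scikit-learn, where the dataset and the two models are placeholders for whatever you are actually comparing:

```python
# Compare two classifiers by AUC on the same test split.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
svm = SVC(probability=True).fit(X_train, y_train)  # probability=True enables predict_proba

auc_log_reg = roc_auc_score(y_test, log_reg.predict_proba(X_test)[:, 1])
auc_svm = roc_auc_score(y_test, svm.predict_proba(X_test)[:, 1])
print(f"Logistic Regression AUC: {auc_log_reg:.3f}")
print(f"SVM AUC: {auc_svm:.3f}")
```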

That’s the end; I hope this article helped you.

Source: https://statquest.org/roc-and-auc-in-r/ [Josh Starmer]
