In the first blog, we discussed some important metrics used in regression, their pros and cons, and their use cases. This part focuses on commonly used metrics in classification and on why, depending on the context, we should prefer some over others.
Let’s first understand the basic terminology used in classification problems before going through the pros and cons of each method. You can skip this section if you are already familiar with the terminology.
- Recall or Sensitivity or TPR (True Positive Rate): Number of items correctly identified as positive out of all actual positives, given by TP/(TP+FN)
- Specificity or TNR (True Negative Rate): Number of items correctly identified as negative out of all actual negatives, given by TN/(TN+FP)
- Precision: Number of items correctly identified as positive out of all items identified as positive, given by TP/(TP+FP)
- False Positive Rate or Type I Error rate: Number of items wrongly identified as positive out of all actual negatives, given by FP/(FP+TN)
- False Negative Rate or Type II Error rate: Number of items wrongly identified as negative out of all actual positives, given by FN/(FN+TP)
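- Confusion Matrix: A table of actual versus predicted classes whose cells hold the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN)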
- F1 Score: The harmonic mean of precision and recall, given by:
F1 = 2*Precision*Recall/(Precision + Recall)
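- Accuracy: Fraction of total items classified correctly, given by (TP+TN)/(P+N), where P = TP+FN is the number of actual positives and N = TN+FP is the number of actual negatives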
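
To make the definitions above concrete, here is a minimal Python sketch that counts TP, FP, FN, and TN for a binary problem and then applies each formula. The function names `confusion_counts` and `classification_metrics` are hypothetical helpers written for this illustration, not part of any library. It assumes labels are encoded as 1 (positive) and 0 (negative) and that no denominator is zero.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (assumed: 1 = positive, 0 = negative)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Apply the formulas from the list above (assumes non-zero denominators)."""
    p = tp + fn  # total actual positives
    n = tn + fp  # total actual negatives
    recall = tp / p
    precision = tp / (tp + fp)
    return {
        "recall": recall,                    # TP/(TP+FN)
        "specificity": tn / n,               # TN/(TN+FP)
        "precision": precision,              # TP/(TP+FP)
        "false_positive_rate": fp / n,       # FP/(FP+TN)
        "false_negative_rate": fn / p,       # FN/(FN+TP)
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (p + n),     # (TP+TN)/(P+N)
    }


# Illustrative labels, made up for this example
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts)
print(metrics)
```

For these made-up labels the counts are TP=3, FP=1, FN=1, TN=3, so recall, specificity, precision, F1, and accuracy all come out to 0.75, while the false positive and false negative rates are both 0.25. Note that FPR = 1 - specificity and FNR = 1 - recall, which follows directly from the formulas.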