Beyond Accuracy: Recall, Precision, F1-Score, ROC-AUC

Priyanka
8 min read · Nov 5, 2022

When we talk about classification in machine learning, we tend to focus on test accuracy, i.e., the fraction of test instances that were classified correctly. This can be misleading on imbalanced data. In this post, we will discuss other performance metrics such as recall, precision, F1-score, and ROC-AUC, and the additional insight they offer over accuracy.

For ease of explanation, we will use a simple multi-class classification problem on the iris dataset throughout this post. It contains three types of flowers: setosa, versicolor, and virginica, labelled here as 0, 1, and 2.
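A minimal sketch of this setup, assuming scikit-learn is used: load the iris dataset, split it, and fit a classifier. The logistic regression model and split parameters below are illustrative choices, not specified in the post.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Classes: 0 = setosa, 1 = versicolor, 2 = virginica
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Any classifier would do here; logistic regression keeps the example simple.
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```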

Drawbacks of Accuracy

Before discussing other metrics, let us understand the drawbacks of accuracy.

  • Consider a scenario where 90 samples come from the classes versicolor and setosa and only 10 samples from virginica. If the model classifies all 90 majority-class samples correctly but misclassifies every virginica sample, the accuracy is still 90%. This looks high, yet the model has failed entirely on virginica, as the sketch after this list illustrates.
  • As we will see in the subsequent sections, accuracy weighs all kinds of misclassifications equally, whereas some kinds of errors can be more harmful than others depending on the situation.
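The sketch below makes the first bullet concrete with a hypothetical imbalanced label set; the labels are made up for illustration and are not drawn from an actual iris train/test split. Accuracy reports 90%, while per-class recall reveals that class 2 (virginica) is never recovered.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 45 setosa (0), 45 versicolor (1), 10 virginica (2)
y_true = np.array([0] * 45 + [1] * 45 + [2] * 10)
# The model gets every majority-class sample right but labels all virginica as versicolor.
y_pred = np.array([0] * 45 + [1] * 45 + [1] * 10)

print(accuracy_score(y_true, y_pred))              # 0.9 -> looks high
print(recall_score(y_true, y_pred, average=None))  # [1.0, 1.0, 0.0] -> class 2 is never found
```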
