Accuracy vs. AUC-ROC

Kaveti Naveenkumar
Published in Nerd For Tech · Jun 25, 2021

In this post I will talk about accuracy and the area under the ROC curve. Both metrics are used to validate a classification model on historical data for which the target variable is known.

Accuracy:

Accuracy is the simplest validation metric to compute and understand: it is the proportion of correct classifications. When the labels are roughly balanced (~50% positive and ~50% negative), accuracy is a useful way to validate the model, but with extremely imbalanced classes, say 98% negatives and 2% positives, it can lead to wrong conclusions.
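As a quick sketch of the imbalanced case (with made-up labels), a classifier that never predicts the positive class still reaches 98% accuracy:

```python
# Hypothetical example: 98 negatives, 2 positives, and a "model"
# that always predicts the majority (negative) class.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100

# Accuracy = proportion of correct classifications
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.98 -- looks great, yet not one positive was detected
```

A 0.98 accuracy here says nothing about the model's ability to find the 2% of positives, which is usually the class we care about.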

Confusion matrix of a binary classification model:

source: Joydwip’s blog

Accuracy = (TP + TN) / (TP + TN + FP + FN)
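Plugging some made-up confusion-matrix counts into the formula:

```python
# Toy confusion-matrix counts (invented for illustration)
TP, TN, FP, FN = 40, 50, 5, 5

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.9
```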

Two major reasons why accuracy is not always useful:

  1. It is threshold-variant: the score depends heavily on the chosen classification threshold
  2. It is scale-variant: multiplying the predicted probabilities by a scalar changes the accuracy score
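The threshold dependence is easy to demonstrate with a small made-up example: the same probability scores produce different accuracies at different thresholds.

```python
# Made-up labels and probability scores
y_true = [0, 0, 1, 1]
scores = [0.2, 0.6, 0.4, 0.9]

def accuracy_at(threshold):
    """Accuracy when predicting positive for scores >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(t == p for t, p in zip(y_true, preds)) / len(y_true)

print(accuracy_at(0.5))  # 0.5
print(accuracy_at(0.3))  # 0.75
```

Moving the threshold from 0.5 to 0.3 changes accuracy from 0.5 to 0.75 without the model changing at all, which is exactly the threshold-variance problem.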

Area under ROC:

The area under the ROC curve is a very useful metric for validating a classification model because it is both threshold- and scale-invariant. The ROC curve plots TPR against FPR at different threshold values.

TPR (True Positive Rate) = TP / (TP + FN)
FPR (False Positive Rate) = FP / (FP + TN)
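Using the same made-up confusion-matrix counts as before, the two rates work out as:

```python
# Toy confusion-matrix counts (invented for illustration)
TP, TN, FP, FN = 40, 50, 5, 5

tpr = TP / (TP + FN)   # 40 / 45 ~ 0.889
fpr = FP / (FP + TN)   #  5 / 55 ~ 0.091
print(round(tpr, 3), round(fpr, 3))
```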

source: Machine Learning Mastery

The ROC curve plots FPR on the X-axis and TPR on the Y-axis, and each point on the plot corresponds to a threshold value.

  • At threshold 1, the model predicts the negative class for all data points, so FPR and TPR are both zero
  • At threshold 0, the model predicts the positive class for all data points, so FPR and TPR are both one
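The two extreme points above can be checked by sweeping the threshold over made-up scores (predicting positive when score >= threshold):

```python
# Made-up labels and probability scores
y_true = [0, 0, 1, 1]
scores = [0.2, 0.6, 0.4, 0.9]

def roc_point(threshold):
    """Return (FPR, TPR) when predicting positive for scores >= threshold."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < threshold)
    return fp / (fp + tn), tp / (tp + fn)

print(roc_point(0.0))  # (1.0, 1.0): everything predicted positive
print(roc_point(0.5))  # a point in between
print(roc_point(1.1))  # (0.0, 0.0): everything predicted negative
```

Each threshold yields one (FPR, TPR) point; connecting them traces out the ROC curve.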

The orange curve in the plot above is the ROC curve, and the area under this curve can be used to validate the classification model.

  • AUC-ROC is threshold-invariant, because no threshold value has to be chosen to compute it
  • AUC-ROC is scale-invariant, because multiplying the probability scores by a scalar does not change the metric (you can verify this yourself)
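One way to verify the scale invariance yourself: AUC equals the probability that a randomly chosen positive is scored higher than a randomly chosen negative, and multiplying all scores by a positive scalar preserves that ordering. A minimal sketch, using made-up scores:

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs where the
    positive outscores the negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.2, 0.6, 0.4, 0.9]

print(auc(y_true, scores))                     # 0.75
print(auc(y_true, [0.1 * s for s in scores]))  # 0.75 -- unchanged after scaling
```

Rescaling the scores by 0.1 leaves every positive/negative ordering intact, so the AUC is identical, unlike accuracy at a fixed threshold.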

References:

  1. https://towardsdatascience.com/confusion-matrix-for-your-multi-class-machine-learning-model-ff9aa3bf7826
  2. https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/
  3. https://towardsdatascience.com/an-understandable-guide-to-roc-curves-and-auc-and-why-and-when-to-use-them-92020bc4c5c1
