Why is Accuracy not a good measure for all classification problems in Machine Learning?

Aoishi Das
Published in Alien Brains
Apr 9, 2022 · 4 min read

Hey guys!!

So most of you have solved classification problems, right? When the model is built, which metric do you reach for most often? ACCURACY, isn't it? We often say "the accuracy is 99%, so the model must be doing great." But have you ever come across a situation where, even after attaining a very high accuracy, your model fails to perform well when it starts making predictions later on?

I have faced this problem multiple times, and guess what the culprit usually is? Imbalanced data!!

What is imbalanced data?

Let's take an example dataset: Credit Card Fraud Detection.

Shape of the dataset: Credit Card Fraud Detection

If you observe, we have 284,807 data points, but only 492 of them are fraud cases, whereas 284,315 are genuine.

Imbalanced Data

Datasets like this are called imbalanced datasets: the vast majority of the data points belong to one class.
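If you want to check this yourself, a quick value_counts makes the imbalance obvious. This is just a sketch, assuming the usual Kaggle CSV for this dataset with a Class column where 1 = fraud and 0 = genuine:

```python
import pandas as pd

# Assumes the Kaggle "Credit Card Fraud Detection" CSV with a `Class`
# column where 1 = fraud and 0 = genuine.
df = pd.read_csv("creditcard.csv")

print(df.shape)                    # (284807, 31) for the Kaggle version
print(df["Class"].value_counts())  # 0: 284315, 1: 492
```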

Why isn't accuracy a good measure for imbalanced datasets?

First of all, what is ACCURACY? It simply tells us the number of correct predictions out of all the predictions made.

Now, coming back to the original question: let's say your model fails to catch a single fraud case and simply labels every transaction as genuine. It will still have an accuracy of 99.83% (284,315/284,807). Crazy, right? You would think your model is doing extremely well, whereas in reality it will perform awfully when you actually test it out. So accuracy is definitely not a good measure in this situation. How do we solve it?
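To see how absurd this is, here's a tiny sketch (with hypothetical labels mirroring the counts above) of a "model" that simply calls every transaction genuine:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels mirroring the dataset: 284,315 genuine (0) and 492 fraud (1).
y_true = np.array([0] * 284315 + [1] * 492)

# A "model" that blindly predicts genuine for every single transaction.
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # ~0.9983, yet it catches zero frauds
```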

Well, we do have some other metrics we can use: precision, recall and the F1 score.

But before we move on, we need to know about the confusion matrix.

What is a Confusion Matrix?

A confusion matrix is a table that summarizes the performance of a classification algorithm. It looks something like this:

Confusion Matrix

It can be used for both binary and multiclass classification, but since we are dealing with a binary problem, let's stick to this example for now.

Since we are doing fraud prediction here:

Fraud: positive class, Genuine: negative class

There are a few terms we need to know:

True Positive: transactions predicted as fraud that are actually fraud (in this example, TP = 65)

True Negative: transactions predicted as genuine that are actually genuine (in this example, TN = 56828)

False Positive: transactions predicted as fraud that are actually genuine (in this example, FP = 32)

False Negative: transactions predicted as genuine that are actually fraud (in this example, FN = 37)
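If you are using scikit-learn, you don't have to count these by hand. Here's a small sketch with toy labels (swap in your own test labels and predictions) showing how the four values fall out of confusion_matrix:

```python
from sklearn.metrics import confusion_matrix

# Toy labels just to show the layout: 1 = fraud, 0 = genuine.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

# For binary 0/1 labels, sklearn lays the matrix out as
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=2, TN=4, FP=1, FN=1
```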

Accuracy:

It is simply the number of correct predictions out of all the cases. So in this case:

Accuracy

Accuracy = (56828+65)/(56828+65+37+32) = 0.9988, i.e. 99.88%
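The same arithmetic in a couple of lines of Python, using the counts from the confusion matrix above:

```python
tp, tn, fp, fn = 65, 56828, 32, 37  # counts from the confusion matrix above

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(round(accuracy, 4))  # 0.9988
```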

However, if you observe, there are clearly quite a few cases where genuine transactions have been classified as fraud and vice versa. So simply going ahead with this model because it has a high accuracy is not a great move. Let's have a look at the other metrics, which give us information about those wrong predictions too.

Precision:

Out of all the predictions that are positive (i.e. predicted as fraud), how many are actually positive?

Precision

Precision = 65/(65+32) = 0.6701, i.e. 67.01%
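Or in code, using the same counts:

```python
tp, fp = 65, 32  # true positives and false positives from above

precision = tp / (tp + fp)
print(round(precision, 4))  # 0.6701
```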

Recall:

Out of all the cases that are actually positive (i.e. fraud in reality), how many has the model classified correctly?

Recall

Recall = 65/(65+37) = 0.6373, i.e. 63.73%
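Again, the same calculation in code:

```python
tp, fn = 65, 37  # true positives and false negatives from above

recall = tp / (tp + fn)
print(round(recall, 4))  # 0.6373
```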

Now, we would definitely want to reduce the false positive and false negative cases as much as possible, thereby increasing both precision and recall. We have another metric that combines the two:

F1 Score:

It combines precision and recall as their harmonic mean.

F1 Score

F1 Score = (2*0.6701*0.6373)/(0.6701+0.6373) = 0.6533, i.e. 65.33%
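And the harmonic mean in code, using the exact precision and recall values:

```python
precision, recall = 65 / 97, 65 / 102  # exact values from above

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.6533
```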

Thus, for such imbalanced datasets, accuracy alone isn't a good measure. Instead of relying only on accuracy, we should also check the precision, recall and F1 score, and go ahead with a model only when all of these values are high.
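In practice you rarely compute these by hand; scikit-learn exposes all of them directly. Here's a sketch with the same toy labels as before (replace them with your test labels and model predictions):

```python
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

# Toy labels again (1 = fraud); swap in your own test labels and predictions.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

print(precision_score(y_true, y_pred))        # 0.666...
print(recall_score(y_true, y_pred))           # 0.666...
print(f1_score(y_true, y_pred))               # 0.666...
print(classification_report(y_true, y_pred))  # all of the above, per class
```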

Now, if the dataset is imbalanced, is there any way to fix that? Well, yes. We will talk about it in the next blog, so stay tuned!!

Feel free to comment down below if you have any doubts.

All the best and Happy Learning!
