When 99.9% Accuracy Can Be Bad for Your ML Model

And what you should do instead

Nachiketa Hebbar
Analytics Vidhya
4 min read · Jun 26, 2021


In this short blog, I am going to break down what the accuracy paradox is all about. A higher-accuracy model is not always a better model, especially in classification. To present my case, let me walk you through a dummy classification problem.

Figure: Cancer prediction based on tumor size

In the figure above, we are trying to predict whether a particular tumor is cancerous or not based on its size. Obviously, a larger tumor is more likely to be cancerous and will be classified as Yes, or 1.

The Problem Posed by Imbalanced Data

Now, do you notice any problem in the above data? Yes, it is highly imbalanced: while we have five examples of non-cancerous tumors, there is only one example of a cancerous tumor. You should now be able to spot the major trap that any machine learning model will fall into here.

Machine learning models operate on simple mathematics: they live to minimize loss and maximize accuracy. Here is what accuracy means in classification problems:

Accuracy = (number of correctly classified cases) / (total cases)

So let's say I trained a logistic regression model on the above data, and it gave me 66.6% accuracy, meaning 4 out of 6 examples were correctly classified. Now let's make this more interesting.

Imagine I bring in my 5-year-old nephew to make predictions as well, and he simply predicts every tumor as non-cancerous without even looking at its size. But guess what? Because of the imbalanced data, he is going to be right 5 out of 6 times, an accuracy of about 83.3%. He outperformed the machine learning model!
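
To make this concrete, here is a minimal sketch using scikit-learn (the labels are made up to mirror the six-tumor example above), showing the accuracy the always-non-cancerous baseline gets:

from sklearn.metrics import accuracy_score

# Made-up labels mirroring the example: five non-cancerous tumors (0)
# and a single cancerous one (1).
y_true = [0, 0, 0, 0, 0, 1]

# The "nephew" baseline: predict non-cancerous for every tumor.
y_pred = [0] * len(y_true)

# Right 5 out of 6 times, purely because of the class imbalance.
print(accuracy_score(y_true, y_pred))  # 0.8333...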

Introducing Alternative Metrics

And that is why some other metrics were introduced which are much less affected by class imbalance. They are built on four counts: true positives, false positives, true negatives and false negatives. In case you don't know what these four terms mean: a true positive is a positive sample correctly predicted as positive, a false positive is a negative sample wrongly predicted as positive, a true negative is a negative sample correctly predicted as negative, and a false negative is a positive sample wrongly predicted as negative.
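
As a quick sketch of these counts in code, scikit-learn's confusion_matrix returns all four at once (reusing the same made-up labels as before):

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 0, 1]   # five non-cancerous (0), one cancerous (1)
y_pred = [0, 0, 0, 0, 0, 0]   # the nephew predicts non-cancerous every time

# For binary labels, sklearn orders the flattened counts as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 5 0 1 0: the one cancerous tumor is a false negative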

These four counts are primarily used to calculate precision, recall and the F1-score, and you will see how using these metrics could have prevented us from relying on my nephew's predictions in the dummy classification problem.

Precision, Recall and F1-Score

Precision, in simple terms, is the proportion of correctly classified positive samples out of all the samples that were predicted to be positive. In mathematical terms:

Precision = True Positives / (True Positives + False Positives)

So if we take my nephew's predictions and consider, for now, the cancerous class, we get a precision of zero. That is because even though there was only one positive example, it wasn't classified correctly; in other words, the numerator of the formula, the true positive count, is zero.

Recall, in simple terms, is the proportion of correctly classified positive samples out of all the samples that were actually positive. In mathematical terms:

Recall = True Positives / (True Positives + False Negatives)

Again, the recall is also going to be zero for the cancerous class, as the true positive count is zero. Using precision and recall, another metric called the F1-score can be calculated as:

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)

And like precision and recall, the F1-score is also zero for the cancerous class. These metrics expose just how bad my nephew's model really was, despite its good accuracy.
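
Here is a minimal sketch checking all three metrics on the same made-up labels; zero_division=0 simply tells scikit-learn to return 0 instead of warning when no positives were predicted:

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0, 0]

# With the cancerous class (1) as the positive class, TP = 0,
# so precision, recall and F1 all collapse to zero.
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
print(recall_score(y_true, y_pred))                      # 0.0
print(f1_score(y_true, y_pred, zero_division=0))         # 0.0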

Conclusion

In conclusion, we saw in this blog how accuracy will not always be a good metric in classification. Because let's be real: in most real-world problems you will have imbalanced data; it will almost never be a 50:50 distribution of classes. In such a scenario, it is always a good idea to calculate the F1-score of every class and take the average of those scores (the macro-averaged F1-score), as sketched below.
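
With scikit-learn, for instance, this per-class averaging is available via the average='macro' option; here is a minimal sketch on the same made-up labels as before:

from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0, 0]

# Macro averaging computes the F1-score of each class separately and takes
# their unweighted mean, so the minority class counts just as much.
# Class 0 scores about 0.91 and class 1 scores 0.0, giving roughly 0.45,
# a far more honest summary than the 83% accuracy.
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.4545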

If you liked this tutorial, do give it a clap and head over to my YouTube channel for more machine learning insights!
