When 99.9% Accuracy Can Be Bad for Your ML Model

And what you should do instead

Nachiketa Hebbar
Analytics Vidhya
4 min read · Jun 26, 2021


In this short blog, I am going to break down what the accuracy paradox is all about. A higher-accuracy model is not always a better model, especially in classification. To present my case, let me walk you through a dummy classification problem.

Figure: Cancer prediction based on tumor size

In the figure above, we are trying to predict whether a particular tumor is cancerous or not based on its size. Obviously, a larger tumor is more likely to be cancerous and will be classified as Yes, or 1.

The Problem Posed by Imbalanced Data

Now, do you notice any problem in the above data? Yes, it is highly imbalanced: while we have five examples of non-cancerous tumors, there is only one example of a cancerous tumor. You should now be able to spot the major trap that any machine learning model will fall into here.

Machine learning models operate on simple mathematics: they live to minimize loss and maximize accuracy. Here is what accuracy means in classification problems:

Accuracy = (number of correctly classified cases) / (total cases)

So let's say I trained a logistic regression model on the above data, and it gave me 66.6% accuracy, meaning 4 out of 6 examples were correctly classified. Now let's make this more interesting.

Imagine I bring in my 5-year-old nephew to make predictions as well, and he simply predicts every tumor as non-cancerous without even looking at its size. But guess what? Because of the imbalanced data, he is going to be right 5 out of 6 times, an accuracy of about 83.3%. He outperformed the machine learning model!
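
To make this concrete, here is a minimal sketch using scikit-learn (the labels are made up to mirror the six-tumor example above), showing the accuracy the always-non-cancerous baseline gets:

from sklearn.metrics import accuracy_score

# Made-up labels mirroring the example: five non-cancerous tumors (0)
# and a single cancerous one (1).
y_true = [0, 0, 0, 0, 0, 1]

# The "nephew" baseline: predict non-cancerous for every tumor.
y_pred = [0] * len(y_true)

# Right 5 out of 6 times, purely because of the class imbalance.
print(accuracy_score(y_true, y_pred))  # 0.8333...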

Introducing Alternative Metrics

And that is why some other metrics were introduced which are much less affected by class imbalance. They are built on four counts: true positives, false positives, true negatives and false negatives. In case you don't know what these four terms mean: a true positive is a positive sample correctly predicted as positive, a false positive is a negative sample wrongly predicted as positive, a true negative is a negative sample correctly predicted as negative, and a false negative is a positive sample wrongly predicted as negative.
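
As a quick sketch of these counts in code, scikit-learn's confusion_matrix returns all four at once (reusing the same made-up labels as before):

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 0, 1]   # five non-cancerous (0), one cancerous (1)
y_pred = [0, 0, 0, 0, 0, 0]   # the nephew predicts non-cancerous every time

# For binary labels, sklearn orders the flattened counts as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 5 0 1 0: the one cancerous tumor is a false negative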

These four counts are primarily used to calculate precision, recall and the F1-score, and you will see how using these metrics could have prevented us from relying on my nephew's predictions in the dummy classification problem.

Precision, Recall and F1-Score

Precision, in simple terms, is the proportion of correctly classified positive samples out of all the samples that were predicted to be positive. In mathematical terms:

Precision = True Positives / (True Positives + False Positives)

So if we take my nephew's predictions and consider, for now, the cancerous class, we get a precision of zero. That is because even though there was only one positive example, it wasn't classified correctly; in other words, the numerator of the formula, the true positive count, is zero.

Recall, in simple terms, is the proportion of correctly classified positive samples out of all the samples that were actually positive. In mathematical terms:

Recall = True Positives / (True Positives + False Negatives)

Again, the recall is also going to be zero for the cancerous class, as the true positive count is zero. Using precision and recall, another metric called the F1-score can be calculated as:

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)

And like precision and recall, the F1-score is also zero for the cancerous class. These metrics expose just how bad my nephew's model really was, despite its good accuracy.
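
Here is a minimal sketch checking all three metrics on the same made-up labels; zero_division=0 simply tells scikit-learn to return 0 instead of warning when no positives were predicted:

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0, 0]

# With the cancerous class (1) as the positive class, TP = 0,
# so precision, recall and F1 all collapse to zero.
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
print(recall_score(y_true, y_pred))                      # 0.0
print(f1_score(y_true, y_pred, zero_division=0))         # 0.0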

Conclusion

In conclusion, we saw in this blog how accuracy will not always be a good metric in classification. Because let's be real: in most real-world problems you will have imbalanced data; it will almost never be a 50:50 distribution of classes. In such a scenario, it is always a good idea to calculate the F1-score of every class and take the average of those scores (the macro-averaged F1-score), as sketched below.
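
With scikit-learn, for instance, this per-class averaging is available via the average='macro' option; here is a minimal sketch on the same made-up labels as before:

from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0, 0]

# Macro averaging computes the F1-score of each class separately and takes
# their unweighted mean, so the minority class counts just as much.
# Class 0 scores about 0.91 and class 1 scores 0.0, giving roughly 0.45,
# a far more honest summary than the 83% accuracy.
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.4545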

If you liked this tutorial, do give it a clap and head over to my YouTube channel for more machine learning insights!
