Measuring just accuracy is not enough in machine learning; a better technique is required.

Aakash Bindal · Published in Techspace · Mar 24, 2019 · 4 min read

Why is accuracy not enough?

Let’s consider this situation:

This is the confusion matrix for a cancer test:

If we calculate the accuracy here, we get 90% accuracy using the formula:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

(The short sketch after the definitions below plugs example numbers into this formula.)

Here,

TP is True Positive: the predicted class and the actual class are the same and positive (1 or True). If the predicted class is cancer positive, the actual class is also cancer positive.

TN is True Negative: the predicted class and the actual class are the same but negative (0 or False). If the predicted class is cancer negative, the actual class is also cancer negative.

FP is False Positive: the predicted class is positive (1 or True) but the actual class is negative (0 or False). If the predicted class is cancer positive, the actual class is cancer negative.

FN is False Negative: the predicted class is negative (0 or False) but the actual class is positive (1 or True). If the predicted class is cancer negative, the actual class is cancer positive.
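
To make this concrete, here is a minimal sketch in Python. The counts are assumptions made for illustration (they are not taken from the article’s figure): 100 patients, 10 of whom actually have cancer, and a model that predicts cancer negative for every one of them, which reproduces the 90% accuracy above.

```python
# Assumed confusion-matrix counts for a 100-patient test set where the
# model predicts "cancer negative" for everyone.
tp = 0    # predicted positive, actually positive
tn = 90   # predicted negative, actually negative
fp = 0    # predicted positive, actually negative
fn = 10   # predicted negative, actually positive

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.9 -> 90% accuracy without a single correct positive prediction
```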

So far, accuracy seems to be a good metric. But wait a minute!

The model never correctly classifies a cancer-positive case; in fact, it never predicts any case to be cancer positive at all.

The problem is that the model hasn’t really learned anything from training: no matter what test case we throw at it, it will always predict cancer negative. But then you may ask,

Why isn’t the accuracy 50%?

This is because we have far fewer cases in which cancer is positive than cases in which cancer is negative. Since the model always predicts negative, every actual positive becomes a false negative, but those false negatives make up only a small fraction of the data. That is why the model appears to behave so well (90% accuracy). This kind of problem is called an unbalanced classification problem.

In an unbalanced classification problem, one class has a great amount of data (in this case, cancer negative) and the other class has very little (cancer positive).

Now it’s easy to understand why we need a better technique than accuracy for evaluating our machine learning model.

Imagine you built a model that always predicts cancer negative, no matter what example you throw at it. This is a dangerous scenario: a patient who actually has cancer will always be told they don’t, by a model that still reports 90% accuracy.
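
To see how easy it is to get such a model, here is a small sketch of that always-negative baseline, assuming the same made-up 100-patient split (10 positives, 90 negatives) and using scikit-learn’s DummyClassifier, which ignores the features entirely.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

X = np.zeros((100, 1))               # the features don't matter to this baseline
y = np.array([1] * 10 + [0] * 90)    # 10 cancer positive, 90 cancer negative (assumed split)

model = DummyClassifier(strategy="most_frequent")  # always predicts the majority class (negative)
model.fit(X, y)
pred = model.predict(X)

print(accuracy_score(y, pred))  # 0.9 -> looks impressive
print(recall_score(y, pred))    # 0.0 -> it never catches a single cancer case
```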

So, what’s the solution to this problem?

Answer: the F1-score.

But before we dive into the F1-score, we first need to understand recall and precision.

Recall

Recall is the ratio of true positives to the sum of true positives and false negatives: Recall = TP / (TP + FN). It tells us how many of the actual positive cases our model correctly classifies as positive. In our case, the model doesn’t predict any case as cancer positive, so recall is 0.
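
Plugging the assumed counts from the earlier sketch into this definition:

```python
# Recall = TP / (TP + FN), using the assumed counts (0 true positives, 10 false negatives).
tp, fn = 0, 10
recall = tp / (tp + fn)
print(recall)  # 0.0 -> the model finds none of the actual cancer cases
```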

Precision

Precision is the ratio of true positives to the sum of true positives and false positives: Precision = TP / (TP + FP). It tells us how many of the cases the model predicts as positive are actually positive.


In our case, precision is undefined, since the model doesn’t predict any case to be cancer positive (the denominator TP + FP is zero).
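
A small sketch of that undefined case, again with the assumed labels; scikit-learn’s precision_score lets you choose what value to report when the denominator TP + FP is zero.

```python
from sklearn.metrics import precision_score

y_true = [1] * 10 + [0] * 90   # assumed ground truth, as before
y_pred = [0] * 100             # the always-negative model never predicts positive

# TP + FP = 0, so precision is undefined; zero_division picks the value to report.
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0 instead of an error/warning
```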

Precision is very helpful when false positives are a major concern. In spam detection, for example, a false positive is an email that is not spam but that the model predicts to be spam.

Imagine a model that flags an email as spam when it was actually important. That would be disastrous.

So, now that we have covered all the prerequisites, we can dive into

F1-Score

The F1-score is the harmonic mean of precision and recall:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

The speciality of the F1-score is that it takes both false positives (through precision) and false negatives (through recall) into account.

The F1-score balances precision and recall, which is especially valuable when we are dealing with unbalanced classification problems.

The F1-score always lies between precision and recall (closer to the smaller of the two) and therefore gives us a better, more stable evaluation of the model.
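
As a final sketch, here is the harmonic-mean formula side by side with scikit-learn’s f1_score, on a made-up prediction that does find some positives. The numbers are assumptions chosen so that precision and recall differ.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1] * 10 + [0] * 90                      # assumed ground truth: 10 positives, 90 negatives
y_pred = [1] * 8 + [0] * 2 + [1] * 8 + [0] * 82   # 8 TP, 2 FN, 8 FP, 82 TN

p = precision_score(y_true, y_pred)   # 8 / (8 + 8) = 0.50
r = recall_score(y_true, y_pred)      # 8 / (8 + 2) = 0.80
f1 = 2 * p * r / (p + r)              # harmonic mean ~= 0.615, between p and r, closer to the smaller
print(f1, f1_score(y_true, y_pred))   # both print the same value
```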

