Accuracy Metrics in Binary Classification

Bemali Wickramanayake
Published in Analytics Vidhya · 3 min read · Apr 24, 2020

Binary classification problems are those where we try to predict or assess whether a particular observation or object is in a given state or not, i.e., true or false.

Classic examples of binary classification include:

  • medical diagnosis
  • prediction of employee attrition
  • or any other judgment we try to pass on someone or something based on their behaviour, where the outcome falls into one of two states, true or false :)

So, since we are trying to predict whether an observation is true or false, each prediction falls into one of the four categories below: true positive, false positive, false negative, or true negative.

Laid out as a 2×2 table, this is called a confusion matrix, which helps us easily see where each ‘prediction’ falls in the context of the ‘actual’ true or false state.

Each cell in the confusion matrix feeds into an important accuracy metric for our predictions; which metric we need to focus on most depends on the application.
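As a minimal sketch of how these four cells come about (assuming scikit-learn is available; the label vectors below are made up purely for illustration):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels (1 = true, 0 = false)
y_actual    = [1, 0, 1, 1, 0, 0, 0, 1]
y_predicted = [1, 0, 0, 1, 0, 1, 0, 1]

# With labels=[1, 0], rows are actual classes and columns are predicted classes:
# [[true positives,  false negatives],
#  [false positives, true negatives ]]
cm = confusion_matrix(y_actual, y_predicted, labels=[1, 0])
print(cm)
```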

Accuracy

This metric represents what fraction of all observations we have predicted correctly: true observations as true and false observations as false.

Accuracy = (true positives + true negatives)/total observations

While accuracy is quite intuitive and a commonly used measure of prediction performance, it is not a sufficient metric when our observation sample is highly imbalanced.

e.g., transaction fraud prediction:

Fraudulent transactions typically represent less than 2% of all transactions.

Say we develop a model to predict fraudulent transactions and test it on 100 samples, getting a confusion matrix with 1 true positive, 1 false negative, 1 false positive, and 97 true negatives.

While the model accuracy comes to 98% [(1 + 97)/100], we can clearly see that out of the 2 actual fraudulent transactions, the model misses one.
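Working the same numbers by hand (a small sketch; the variable names are only for illustration):

```python
# Counts from the fraud example above
tp, fn = 1, 1   # actual fraud: caught vs. missed
fp, tn = 1, 97  # actual genuine: falsely flagged vs. correctly passed

accuracy = (tp + tn) / (tp + fn + fp + tn)  # (1 + 97) / 100 = 0.98
recall = tp / (tp + fn)                     # only 0.5: half the frauds are missed
print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
```

Even with 98% accuracy, the model catches only half of the frauds, which is why the metrics below matter.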

This is when we need to shift our focus towards how many of the cases we actually care about are predicted correctly.

True positive rate (or model recall) — the higher the better

True positive rate measures how well the model captures the actual ‘true’ results.

True positive rate (recall) = true positives/(true positives + false negatives)

If our focus is primarily on capturing ‘True’ results, the true positive rate/ model recall is a great indicator.

However, a common phenomenon is that when we try to increase model recall, the false positive rate also increases.
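Here is a minimal sketch of that trade-off, assuming a model that outputs fraud probabilities (the scores, labels and thresholds below are made up for illustration):

```python
def rates(scores, labels, threshold):
    """Recall (TPR) and false positive rate at a given decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical predicted fraud probabilities and actual labels (1 = fraud)
scores = [0.9, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05]
labels = [1,   1,   0,   1,    0,   0,   0,   0]

for t in (0.5, 0.3):
    tpr, fpr = rates(scores, labels, t)
    print(f"threshold={t}: recall={tpr:.2f}, false positive rate={fpr:.2f}")
```

Lowering the threshold from 0.5 to 0.3 lifts recall from 0.67 to 1.00, but it also pushes the false positive rate from 0.00 to 0.40.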

False positive rate (or false alarms) — the lower the better

False positive rate = false positives/(false positives + true negatives)

If false alarms are costly, this is the metric we need to focus on.

Taking the same example of fraudulent transactions: if we are using our predictions to detect and block fraudulent transactions, and our model has a high false positive rate, it means we are also blocking a high number of genuine transactions.

That is going to damage the user experience.

In many applications, false positives are the costliest outcomes:

  • a healthy person being diagnosed with cancer and put through chemotherapy
  • an innocent person being given a guilty verdict
  • an extravagant churn-prevention offer being given to a customer who was going to stay with you anyway

False negative rate (misses) — the lower the better

This evaluates how many of the actual ‘true’ cases we missed.

False negative rate = false negatives/(false negatives + true positives)

  • If we don’t want to miss a single fraudulent transaction
  • If we don’t want a single spam e-mail in our inbox
  • If we don’t want a single COVID-19 positive person to go undetected

we need to try and make this metric closer to zero.

This can be done either by genuinely improving model recall (a good model) or by simply flagging more cases as positive, which costs us more false alarms (a bad trade-off).
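As a small follow-on sketch (with hypothetical counts), the false negative rate is simply the complement of recall, so pushing it towards zero means pushing recall towards one:

```python
# Hypothetical counts for the actual-positive class
tp, fn = 3, 1

recall = tp / (tp + fn)               # 0.75
false_negative_rate = fn / (fn + tp)  # 0.25

# For the same set of predictions, the two always sum to 1
assert abs(recall + false_negative_rate - 1.0) < 1e-12
```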

Conclusion

Which metric we need to focus on when evaluating model performance depends heavily on the business application.

While accuracy is a good headline measure, true negatives are usually the outcome no one really cares about, and in imbalanced problems accuracy can be made up mostly of true negatives.

Bemali Wickramanayake, Analytics Vidhya

A business strategist and self-taught data visualization expert. Runs a business helping other businesses make better decisions with data. And a reader.