Binary Classification

To be, or not to be: that is the question

Gaurav Chandak
Learning Machine Learning
4 min read · Aug 26, 2017


Originally published on my blog: Binary Classification

Introduction

Binary classification, as the name suggests, is the task of classifying elements into one of two classes/groups. Some applications of binary classification are:

  • Testing if a person has a particular disease or not
  • Classifying email as spam or not spam
  • Credit card fraud detection, etc.

It is a form of supervised learning in which:

  • We are given a set of labeled observations
  • A model is trained on those observations
  • The trained model can then classify new observations into one of the two categories (a minimal sketch follows this list)
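As a minimal sketch of this workflow (assuming scikit-learn and its built-in breast cancer dataset; any labeled dataset and classifier would do):

    # A minimal supervised binary classification workflow (sketch, assuming scikit-learn).
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # A set of labeled observations: feature vectors X with 0/1 labels y.
    X, y = load_breast_cancer(return_X_y=True)

    # Hold out some observations so we can evaluate on data the model has not seen.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train a model on the observed (training) data.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    # Classify new observations into one of the two categories.
    print(model.predict(X_test)[:10])  # an array of 0s and 1s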

Methods

Some of the most commonly used methods for binary classification are:

  • Logistic regression
  • k-nearest neighbors
  • Support vector machines
  • Decision trees and random forests
  • Naive Bayes
  • Neural networks

None of these is inherently better than the others; the right choice depends entirely on the problem/use case and the available data. Any two optimization algorithms are equivalent when their performance is averaged across all possible problems: there is no free lunch. That said, it is recommended to start with something simple and make it more complicated if and only if necessary, as the comparison sketch below illustrates.
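To make "start simple" concrete, here is a sketch (again assuming scikit-learn) that cross-validates a simple linear baseline against a more complex ensemble model; the extra complexity is only worth keeping if it measurably helps:

    # Compare a simple model against a more complex one via cross-validation.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)

    models = {
        "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }

    for name, clf in models.items():
        scores = cross_val_score(clf, X, y, cv=5)  # accuracy by default for classifiers
        print(f"{name}: mean accuracy = {scores.mean():.3f}")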

Evaluation

The simplest and most common evaluation metric for a binary classification problem is accuracy.

Accuracy = (# Correct Predictions)/(# Observations)

Though accuracy seems like a very good evaluation metric, it may not be desirable for every use case. Say we are trying to detect whether a person has cancer, we classify 1,000 people, and we reach 95% accuracy. That may look like a very good model, but here is the catch: if most of the samples are negative (no cancer) and the model predicts them as negative, accuracy will be high even if some of the positive samples are predicted as negative. This is undesirable, since we would not want to tell a person with cancer that they do not have cancer, whereas asking a healthy person to undergo some extra tests, if required, is a far cheaper mistake. So we would prefer a model that avoids predicting positive cases as negative, i.e., one with higher recall.
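To make the catch concrete, here is a toy sketch (plain NumPy, hypothetical numbers) in which a useless model that predicts "no cancer" for everyone still reaches 95% accuracy while missing every actual case:

    import numpy as np

    # Hypothetical imbalanced labels: 950 negatives (no cancer), 50 positives (cancer).
    y_true = np.array([0] * 950 + [1] * 50)

    # A useless "model" that predicts negative for every person.
    y_pred = np.zeros_like(y_true)

    accuracy = (y_pred == y_true).mean()  # 0.95 -- looks great
    recall = y_pred[y_true == 1].mean()   # 0.00 -- every cancer case is missed

    print(f"accuracy = {accuracy:.2%}, recall = {recall:.2%}")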

Let’s say that the observations have both positive (P) and negative (N) samples. The model gives two types of predictions: predicted positive (P’) and predicted negative (N’). Based on the predictions, we can create a confusion matrix:

Confusion matrix (source: MathWorks); in text form:

                          Predicted positive (P')   Predicted negative (N')
    Actual positive (P)   True positive (TP)        False negative (FN)
    Actual negative (N)   False positive (FP)       True negative (TN)

It can be seen that:

  • P + N = # Observations
  • TP + TN = True (Correct) Predictions
  • FP + FN = False (Incorrect) Predictions
  • TP + FN = P
  • TN + FP = N
  • TP + FP = P’
  • TN + FN = N’

Therefore, Accuracy (ACC) = (TP + TN)/(P + N)

We can derive 8 more useful metrics based on TP, FP, TN, FN. These are:

  • True positive rate (TPR), a.k.a. sensitivity or recall = TP/P = TP/(TP + FN)
  • True negative rate (TNR), a.k.a. specificity = TN/N = TN/(TN + FP)
  • Positive predictive value (PPV), a.k.a. precision = TP/(TP + FP)
  • Negative predictive value (NPV) = TN/(TN + FN)
  • False positive rate (FPR), a.k.a. fall-out = FP/N = FP/(FP + TN)
  • False negative rate (FNR), a.k.a. miss rate = FN/P = FN/(FN + TP)
  • False discovery rate (FDR) = FP/(FP + TP)
  • F1 score = 2TP/(2TP + FP + FN)

Further derivations: the F1 score is the harmonic mean of precision and recall, i.e., F1 = 2 · (PPV · TPR)/(PPV + TPR), and accuracy can equivalently be written as ACC = (TP + TN)/(TP + TN + FP + FN).
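A sketch of computing these metrics from the four confusion-matrix counts (using scikit-learn's confusion_matrix; the labels below are hypothetical):

    from sklearn.metrics import confusion_matrix

    # Hypothetical true labels and model predictions.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

    # scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    tpr = tp / (tp + fn)   # sensitivity / recall
    tnr = tn / (tn + fp)   # specificity
    ppv = tp / (tp + fp)   # precision
    npv = tn / (tn + fn)
    fpr = fp / (fp + tn)   # fall-out
    fnr = fn / (fn + tp)   # miss rate
    fdr = fp / (fp + tp)
    f1 = 2 * tp / (2 * tp + fp + fn)

    print(f"recall = {tpr:.2f}, precision = {ppv:.2f}, F1 = {f1:.2f}")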

Another commonly used evaluation metric is the AUC (area under the curve) score of the ROC (receiver operating characteristic) curve.

ROC curve (source: Wikipedia)

The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The ROC space is thus defined by FPR and TPR as the x and y axes, respectively, and it depicts the relative trade-off between true positives (benefits) and false positives (costs). AUC is the area under the ROC curve; a higher AUC score signifies a better model.
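Here is a sketch of computing the ROC curve and AUC with scikit-learn, reusing the setup from the earlier examples:

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    # Use scores (probability of the positive class), not hard 0/1 predictions.
    scores = model.predict_proba(X_test)[:, 1]

    # Each threshold yields one (FPR, TPR) point; together they trace the ROC curve.
    fpr, tpr, thresholds = roc_curve(y_test, scores)

    auc = roc_auc_score(y_test, scores)
    print(f"AUC = {auc:.3f}")  # 1.0 is a perfect ranking; 0.5 is random guessing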

If you want to see a working approach for a binary classification problem, check this out.

If you want to jump to an algorithm, a good place to start is the k-nearest neighbors algorithm.

Stay tuned as I learn and share more of my learnings on Learning Machine Learning.
