Performance Metrics: Classification

Snekhasuresh · featurepreneur · Oct 11, 2022

You've built your machine learning model; now what? It's time to evaluate it using performance metrics.

However, there are so many performance metrics available to Data Scientists (Accuracy, Precision, Recall, etc.) that choosing among them is often overwhelming.

Selecting the right metric for a specific model, however, is key to measuring its performance objectively and in the right setting.

True Positive, True Negative, False Positive and False Negative

Each prediction from the model falls into one of four categories with regard to performance: True Positive, True Negative, False Positive or False Negative (a short counting sketch follows the list).

  • True Positive (TP): A sample is predicted to be positive (e.g. the person is predicted to develop the disease) and its label is actually positive (e.g. the person will actually develop the disease).
  • True Negative (TN): A sample is predicted to be negative (e.g. the person is predicted to not develop the disease) and its label is actually negative (e.g. the person will actually not develop the disease).
  • False Positive (FP): A sample is predicted to be positive (e.g. the person is predicted to develop the disease) and its label is actually negative (e.g. the person will actually not develop the disease). In this case, the sample is “falsely” predicted as positive.
  • False Negative (FN): A sample is predicted to be negative (e.g. the person is predicted to not develop the disease) and its label is actually positive (e.g. the person will actually develop the disease). In this case, the sample is “falsely” predicted as negative.
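
Here is a minimal sketch that counts the four types by hand; the labels are made up purely for illustration (1 = positive, 0 = negative):

```python
# Made-up labels for illustration (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count each prediction type by comparing prediction to label.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(tp, tn, fp, fn)  # 3 3 1 1
```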

Confusion Matrix

True Positives, True Negatives, False Positives and False Negatives are usually presented in a tabular format in the Confusion Matrix, which is simply a table organizing the four values.
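
As a minimal sketch, scikit-learn's confusion_matrix builds this table directly; it reuses the made-up labels from the sketch above:

```python
from sklearn.metrics import confusion_matrix

# Same made-up labels as before (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# With labels=[0, 1], the table is laid out as:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```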

Performance Metrics

Accuracy

Accuracy is the fraction of predictions our model got right out of all the predictions.

Accuracy ranges between 0 and 1.

Accuracy, however, is not a great metric, especially when the data is imbalanced.
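
In terms of the four counts, Accuracy = (TP + TN) / (TP + TN + FP + FN). A minimal sketch with scikit-learn's accuracy_score, reusing the made-up labels from earlier, also shows how imbalance can inflate it:

```python
from sklearn.metrics import accuracy_score

# Same made-up labels as before.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# (TP + TN) / (TP + TN + FP + FN) = (3 + 3) / 8 = 0.75
print(accuracy_score(y_true, y_pred))  # 0.75

# On imbalanced data, always predicting the majority class still
# scores high accuracy, which is why the metric can be misleading.
imbalanced_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
always_negative = [0] * 10
print(accuracy_score(imbalanced_true, always_negative))  # 0.9
```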

Precision

To overcome the limitations of Accuracy, Data Scientists usually use Precision, Recall and Specificity. Precision tells you what proportion of positive predictions was actually correct.
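
In terms of the counts, Precision = TP / (TP + FP). A minimal sketch with scikit-learn's precision_score, on the same made-up labels:

```python
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# TP / (TP + FP) = 3 / (3 + 1) = 0.75
print(precision_score(y_true, y_pred))  # 0.75
```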

Recall or Sensitivity

Similar to Precision, Recall measures what proportion of actual positives was identified correctly.
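
In terms of the counts, Recall = TP / (TP + FN). A minimal sketch with scikit-learn's recall_score, on the same made-up labels:

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# TP / (TP + FN) = 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred))  # 0.75
```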

Specificity

Specificity measures what proportion of actual negatives was identified correctly.
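
In terms of the counts, Specificity = TN / (TN + FP). scikit-learn has no dedicated specificity function, but Specificity is simply Recall computed on the negative class, so recall_score with pos_label=0 works as a minimal sketch:

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# TN / (TN + FP) = 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred, pos_label=0))  # 0.75
```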

F1 Score

The F1 Score is the harmonic mean of Precision and Recall, combining the two into a single number. Its highest value is 1, indicating perfect Precision and Recall, and its lowest possible value is 0, reached when either Precision or Recall is zero.
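
In formula form, F1 = 2 × (Precision × Recall) / (Precision + Recall). A minimal sketch with scikit-learn's f1_score; with the made-up labels used throughout, Precision and Recall happen to be equal, so the F1 Score matches them:

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# 2 * (Precision * Recall) / (Precision + Recall)
# = 2 * (0.75 * 0.75) / (0.75 + 0.75) = 0.75
print(f1_score(y_true, y_pred))  # 0.75
```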

Conclusion

Using the right performance metric for the right task is key to correctly evaluating models.

Did you choose the right metric for your model? Think about it.
