Evaluation Metrics — Classification Models

Rajasekhar Battula · Published in School of ML · Jul 17, 2020

Model evaluation is a crucial step in the data science process. It is not enough to train a model on some data and assume it will perform well on all future data; we need to evaluate the model on unseen data. For this purpose, we set aside part of the labeled data for model evaluation. We call this the test data.
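As a minimal sketch of that split, assuming scikit-learn and placeholder feature/label arrays invented for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 4)          # hypothetical feature matrix
y = np.random.randint(0, 2, 100)    # hypothetical binary labels

# Hold out 20% of the labeled data as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```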

Classification models are one family of machine learning models. The main types of classification problems are listed below.

  1. Binary classification
  2. Multi-class single-label classification
  3. Multi-class multi-label classification
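To make the distinction concrete, here is a quick sketch (the label arrays are invented for illustration) of what the targets typically look like in each case:

```python
import numpy as np

# Binary classification: one label per sample, two possible classes.
y_binary = np.array([0, 1, 1, 0])            # e.g. cat vs. dog

# Multi-class single-label: one label per sample, more than two classes.
y_multiclass = np.array([0, 2, 1, 2])        # e.g. cat, dog, bird

# Multi-class multi-label: several labels can apply to one sample,
# often encoded as a binary indicator matrix (rows = samples, cols = classes).
y_multilabel = np.array([[1, 0, 1],
                         [0, 1, 0],
                         [1, 1, 0],
                         [0, 0, 1]])
```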

It is also important to decide which metrics to use when evaluating these models. Let’s assume we have trained a binary classification model to predict whether a picture is of a cat or a dog.

Evaluation Metrics

Confusion Matrix: All the possibilities of actual vs. predicted classes are laid out in a table, as below. It is the most common evaluation metric.

                   Actual: Cat    Actual: Dog
  Predicted: Cat   Correct        Incorrect
  Predicted: Dog   Incorrect      Correct

The columns of the above table represent the actual class and the rows represent the predicted class. If the prediction is the same as the actual class, it is a correct classification; if not, it is an incorrect classification. Every single prediction made by the model falls into one of the 4 cells. If the diagonal values are high relative to the off-diagonal values, the model is mostly classifying correctly; if they are low relative to the off-diagonal values, the model is mostly misclassifying.
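A minimal sketch, assuming scikit-learn and invented labels for the cat-vs-dog example; note that scikit-learn’s convention is the transpose of the table above (actual classes on rows, predicted on columns):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = cat (positive class), 0 = dog.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# scikit-learn puts actual classes on rows and predicted classes on
# columns, the transpose of the table shown above.
print(confusion_matrix(y_true, y_pred))
```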

The other form of representing the confusion matrix is in terms of the four outcome counts:

  • True positives (TP) are actual positive cases that are predicted as positive.
  • False positives (FP) are actual negative cases that are predicted as positive.
  • True negatives (TN) are actual negative cases that are predicted as negative.
  • False negatives (FN) are actual positive cases that are predicted as negative.
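For a binary problem, the four counts can be read straight out of the matrix. A sketch with the same invented labels as above; the unpacking order follows scikit-learn’s documented ravel() convention for labels {0, 1}:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # same hypothetical labels as above
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels {0, 1}, ravel() yields the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")   # TP=3, FP=1, TN=3, FN=1
```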

Accuracy: It is the proportion of correct predictions among all the predictions made by the model.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

i.e., the sum of the diagonal elements of the confusion matrix divided by the sum of all its elements.
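Continuing the hypothetical example, a quick sketch showing that the manual computation and scikit-learn’s accuracy_score agree:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# From the counts derived above: TP=3, TN=3, FP=1, FN=1.
tp, tn, fp, fn = 3, 3, 1, 1
print((tp + tn) / (tp + tn + fp + fn))      # 0.75
print(accuracy_score(y_true, y_pred))       # 0.75
```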

Precision: It is the fraction of predicted positive cases that are actually positive.

Precision = TP / (TP + FP)
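A sketch with the same invented labels, assuming scikit-learn:

```python
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# TP=3, FP=1 in this example, so precision = 3 / (3 + 1) = 0.75.
print(precision_score(y_true, y_pred))      # 0.75
```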

Recall: It is the fraction of actual positive cases that are correctly identified as positive.

Recall = TP / (TP + FN)
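The same hypothetical example again, assuming scikit-learn:

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# TP=3, FN=1 in this example, so recall = 3 / (3 + 1) = 0.75.
print(recall_score(y_true, y_pred))         # 0.75
```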

F1-score: It is a measure of the balance between precision and recall, also known as the F-score or F-measure.

F1 = 2 × (Precision × Recall) / (Precision + Recall)

The F1 score ranges from 0 (worst) to 1 (best). It is useful when we want a model that is good at both precision and recall.
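Continuing the hypothetical example, assuming scikit-learn:

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# With precision = recall = 0.75, the harmonic mean is also 0.75.
print(f1_score(y_true, y_pred))             # 0.75
```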

Receiver Operating Characteristic (ROC) curve: A graph that plots the true positive rate against the false positive rate. It shows the performance of the model at all classification thresholds.

Area Under the Curve (AUC): It measures the model’s ability to separate the positive and negative classes across all thresholds. The larger the area under the ROC curve, the better the model’s ability to discriminate.
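A minimal sketch, assuming scikit-learn and that the model outputs a probability score for the positive class (the scores here are invented for illustration):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical predicted probabilities for the positive (cat) class.
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]

# roc_curve returns the false positive rate and true positive rate at
# every threshold; roc_auc_score summarizes the curve as one number.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))       # 0.9375 for these made-up scores
```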

