Evaluation Metrics Part 1
For Classification Models!
When building a statistical or machine learning model, it is very important to evaluate it. Evaluation metrics measure the quality of a model and are an essential part of any project. To improve a model and reach the desired performance, we need such metrics to get feedback and to gauge the robustness and generalization capability of the model.
Let us first look at some basic quantities which are often used to express other metrics easily.
True Positive (TP)
Number of samples which are predicted positive and are also labeled as positive.
True Negative (TN)
Number of samples which are predicted negative and are also labeled as negative.
False Positive (FP)
Type-I error
Number of samples which are predicted positive but are actually labeled negative.
False Negative (FN)
Type-II error
Number of samples which are predicted negative but are actually labeled positive.
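The four quantities above can be counted directly from a list of labels and a list of predictions. Here is a minimal sketch in plain Python; the labels and predictions are made-up illustrative values.

```python
# Count TP, TN, FP, FN from 0/1 labels and predictions (illustrative data).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # predicted 1, labeled 1
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # predicted 0, labeled 0
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # Type-I error
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Type-II error

print(tp, tn, fp, fn)  # 3 3 1 1
```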
Confusion Matrix
A confusion matrix is a table used to describe the performance of a classification model. It is composed of four parts, TP, TN, FP, and FN (which have already been discussed), and it is very useful for computing the evaluation metrics that follow.
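As a sketch, the matrix can be built as a 2x2 table indexed by (actual, predicted); the layout assumed here is [[TN, FP], [FN, TP]], and the data is illustrative.

```python
# Build a 2x2 confusion matrix: rows = actual label, columns = predicted label.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

matrix = [[0, 0], [0, 0]]
for t, p in zip(y_true, y_pred):
    matrix[t][p] += 1

# matrix[0][0] = TN, matrix[0][1] = FP, matrix[1][0] = FN, matrix[1][1] = TP
print(matrix)  # [[3, 1], [1, 3]]
```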
Accuracy
Accuracy is the proportion of the samples predicted correctly. It is one of the most commonly used metrics. In terms of the four quantities above:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
However, this metric doesn’t prove to be very useful in the case of an imbalanced dataset. Suppose a dataset contains positive and negative samples in the ratio 83 : 17, and the model you build predicts positive for all samples. Then TP, TN, FP, and FN are 83, 0, 17, and 0 respectively, and the accuracy of the model works out to 83%. Yet this model does not predict a single negative sample correctly, which is not at all desirable.
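The 83 : 17 example can be reproduced in a few lines; the labels below simply encode that scenario, with a model that predicts positive for everything.

```python
# Imbalanced data: 83 positives, 17 negatives; model predicts positive always.
y_true = [1] * 83 + [0] * 17
y_pred = [1] * 100

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 83
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # 0

# Accuracy = (TP + TN) / total looks high despite a useless model.
accuracy = (tp + tn) / len(y_true)
print(accuracy)  # 0.83
```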
Two metrics that can resolve this issue with accuracy, by providing a different outlook on the problem at hand, are Precision and Recall.
So, what is this different outlook?
Precision
Positive Predictive Value
Precision is the proportion of truly positive samples among all samples predicted positive. In other words, it is the fraction of positive predictions that are correct.
Recall
Probability of Detection
Recall is the proportion of samples correctly predicted positive out of all samples actually labeled positive. That is, it gives us the fraction of the positively labeled samples which have been correctly predicted by the model.
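Both definitions translate directly into small helper functions. This is a minimal sketch; the zero-denominator guards are a common convention, not part of the definitions themselves. The numbers plugged in are from the all-positive model in the imbalanced example (TP = 83, FP = 17, FN = 0).

```python
def precision(tp, fp):
    # Fraction of positive predictions that are actually positive.
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    # Fraction of actual positives that the model found.
    return tp / (tp + fn) if tp + fn else 0.0

print(precision(83, 17))  # 0.83 - only 83% of positive predictions are right
print(recall(83, 0))      # 1.0  - every actual positive was found
```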
Sensitivity
True Positive Rate, Power
Sensitivity is the same as Recall.
Specificity
True Negative Rate
Specificity is the proportion of negatively labeled samples that are predicted correctly, that is, predicted negative.
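A sketch of specificity as a function, using the same guard convention as before. For a model that predicts positive on every sample, TN = 0, so specificity collapses to 0 even while accuracy looks high.

```python
def specificity(tn, fp):
    # Fraction of actual negatives correctly predicted negative.
    return tn / (tn + fp) if tn + fp else 0.0

# All-positive model from the imbalanced example: TN = 0, FP = 17.
print(specificity(0, 17))  # 0.0
```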
F1 Score
The F1 score is the harmonic mean of Precision and Recall (the general F-score is a weighted harmonic mean). In the case of an imbalanced dataset, the F1 score is a good evaluation metric to use, as it takes into account FP and FN along with TP; note that TN does not appear in it at all.
The general formula for the F-score is

F_beta = (1 + beta^2) * (Precision * Recall) / (beta^2 * Precision + Recall)

The F1-score, or balanced F-score (beta = 1), is expressed as

F1 = 2 * (Precision * Recall) / (Precision + Recall)
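The weighted harmonic mean can be sketched as one function, with beta = 1 recovering the balanced F1 score. The precision/recall values plugged in below (0.83 and 1.0) come from the all-positive model in the imbalanced example.

```python
def f_beta(precision, recall, beta=1.0):
    # Weighted harmonic mean: beta > 1 favours recall, beta < 1 favours precision.
    num = (1 + beta**2) * precision * recall
    den = beta**2 * precision + recall
    return num / den if den else 0.0

print(round(f_beta(0.83, 1.0), 3))          # 0.907 (F1, balanced)
print(round(f_beta(0.83, 1.0, beta=2), 3))  # 0.961 (F2, recall-weighted)
```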
Precision-Recall Curve
A precision-recall curve shows the relation between precision and recall for every possible threshold value. Recall is plotted along the x-axis and precision along the y-axis. One important note about the PR curve is that TN is never used in constructing it.
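The curve can be sketched by sweeping a threshold over predicted scores and recomputing precision and recall at each step; the scores below are hypothetical, and notice that TN never appears in the computation.

```python
# Sweep thresholds over hypothetical scores to trace a precision-recall curve.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

points = []
for thr in sorted(set(scores)):
    preds = [1 if s >= thr else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    prec = tp / (tp + fp)  # at least one sample is predicted positive here
    rec = tp / (tp + fn)
    points.append((rec, prec))  # recall on the x-axis, precision on the y-axis

for r, p in points:
    print(f"recall={r:.2f} precision={p:.2f}")
```

Plotting the (recall, precision) pairs, e.g. with matplotlib, gives the PR curve; lowering the threshold pushes recall up while precision tends to fall.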