Uncertainty in Evaluation Metrics

For Classification Models!!!!

Siladittya Manna
The Owl
2 min read · Jun 21, 2020


In Parts 1, 2, and 3 of the Evaluation Metrics series, we discussed several important evaluation metrics (19 in total, including the AUC score).

Now we need to figure out how to put all of these together in a single place, and how to extend them to problems involving more than two classes, i.e., beyond binary classification.

For a multi-class classification set-up, the metrics are calculated for each class using the One-vs-Rest method: each class in turn is treated as the positive class and all the remaining classes as the negative class.
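As a minimal sketch of this One-vs-Rest reduction (the labels here are illustrative), scikit-learn's label_binarize turns multi-class labels into one binary column per class:

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Illustrative ground-truth labels for a 3-class problem
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])

# One column per class: column k is the binary "class k vs. rest" target
y_true_ovr = label_binarize(y_true, classes=[0, 1, 2])
print(y_true_ovr)  # shape (8, 3); each column is an ordinary binary problem
```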

In the example given below, we will create a DataFrame in which the prediction probabilities for each class are examined to output evaluation metrics for every class in a multi-class classification problem.

Combine all the above in a single DataFrame
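The original code gist is not reproduced here, but as a rough sketch, assuming a fitted scikit-learn classifier clf (with a predict_proba method) and a held-out set X_test, y_test, the predictions and per-class probabilities can be combined like this:

```python
import numpy as np
import pandas as pd

# Assumed: clf is a fitted multi-class classifier; X_test, y_test a held-out set
proba = clf.predict_proba(X_test)             # shape: (n_samples, n_classes)
classes = clf.classes_

df = pd.DataFrame(proba, columns=[f"prob_class_{c}" for c in classes])
df["y_true"] = np.asarray(y_test)
df["y_pred"] = classes[proba.argmax(axis=1)]  # highest-probability class
print(df.head())
```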

To get the evaluation metrics
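Again as a sketch (the column names follow the DataFrame built above), the per-class metrics can then be computed by looping over the classes in One-vs-Rest fashion:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

rows = []
for c in classes:
    y_true_bin = (df["y_true"] == c).astype(int)   # class c vs. rest
    y_pred_bin = (df["y_pred"] == c).astype(int)
    rows.append({
        "class": c,
        "precision": precision_score(y_true_bin, y_pred_bin),
        "recall": recall_score(y_true_bin, y_pred_bin),
        "f1": f1_score(y_true_bin, y_pred_bin),
        "auc": roc_auc_score(y_true_bin, df[f"prob_class_{c}"]),
    })

metrics_df = pd.DataFrame(rows).set_index("class")
print(metrics_df)
```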

Other evaluation metrics can also be included in the DataFrame, but that has to be done manually. Only a few are shown here; the rest can be added as and when necessary by calling the appropriate functions covered in the previous parts.

Confidence Intervals

Confidence intervals are used to propose a plausible range of values for an unknown population parameter with a certain confidence level. If X denotes the observations and c is the confidence level, then a valid confidence interval has probability c of containing the true value of the underlying parameter. A 95% confidence interval means that if we take 100 different samples and compute a 95% confidence interval for each sample, then approximately 95 of the 100 intervals will contain the true value of the parameter. The confidence interval does not reflect variability in the unknown parameter; rather, it reflects the amount of random error in the sample and provides a range of values likely to contain the true value of the parameter.

How to calculate confidence intervals for some common metrics
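The original gist is omitted here; as a sketch, a metric that is a proportion over n test samples (accuracy, precision, recall) has the normal-approximation interval metric ± z·sqrt(metric·(1−metric)/n), and a percentile bootstrap gives a distribution-free alternative for arbitrary metrics. Both helpers below are illustrative and assume NumPy arrays as inputs:

```python
import numpy as np
from scipy.stats import norm

def proportion_ci(metric, n, confidence=0.95):
    """Normal-approximation CI for a proportion-type metric (e.g. accuracy)."""
    z = norm.ppf(1 - (1 - confidence) / 2)     # e.g. 1.96 for 95%
    half_width = z * np.sqrt(metric * (1 - metric) / n)
    return metric - half_width, metric + half_width

def bootstrap_ci(y_true, y_pred, metric_fn, n_boot=1000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for an arbitrary metric function."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample with replacement
        scores.append(metric_fn(y_true[idx], y_pred[idx]))
    lo = np.percentile(scores, 100 * (1 - confidence) / 2)
    hi = np.percentile(scores, 100 * (1 + confidence) / 2)
    return lo, hi
```

For instance, proportion_ci(0.92, 500) returns the 95% interval for an accuracy of 0.92 measured on 500 test samples.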

Error bars

Error bars are graphical representations of the variability of data and are used on graphs to indicate the error or uncertainty in a reported measurement. They give a general idea of how precise a measurement is or, conversely, how far from the reported value the true (error-free) value might be. Error bars can also be used to judge whether differences are statistically significant or how well a function fits the data.

How to plot error bars
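As a minimal matplotlib sketch (the metric values and interval half-widths below are illustrative), the half-width of each confidence interval is passed to plt.errorbar via yerr:

```python
import matplotlib.pyplot as plt

# Illustrative per-class F1 scores and 95% CI half-widths
class_names = ["class 0", "class 1", "class 2"]
f1_scores = [0.91, 0.84, 0.78]
ci_half_widths = [0.03, 0.05, 0.06]

plt.errorbar(class_names, f1_scores, yerr=ci_half_widths,
             fmt="o", capsize=4, ecolor="gray")
plt.ylabel("F1 score")
plt.title("Per-class F1 with 95% confidence intervals")
plt.ylim(0, 1)
plt.show()
```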

Hit the clap button if you like the story or think that it will help others!!
