Accuracy Metrics in Machine Learning

karan kumar
3 min read · Jan 19, 2019

The simplest way to evaluate a classification model is plain accuracy: the number of correctly classified points in the test data divided by the total number of points in the test data.

accuracy of the model = (# points correctly classified)/(total points)
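As a minimal sketch of the formula above, accuracy can be computed by comparing the two label lists element by element. The function name `accuracy` and the label lists are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of points whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical test labels, made up for illustration
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]

print(accuracy(y_true, y_pred))  # 6 of 8 points are correct -> 0.75
```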

The counts behind this accuracy are summarized in a table called the confusion matrix. For a two-class problem it has four cells, one for each combination of actual class and predicted class.

In the matrix, TN (True Negative) is the number of points that were of class 0 and were correctly predicted as class 0. FP (False Positive) is the number of points that were of class 0 but were predicted as class 1. TP (True Positive) is the number of points that were of class 1 and were correctly predicted as class 1. FN (False Negative) is the number of points that were of class 1 but were predicted as class 0.

In some critical machine learning applications, where the cost of a wrong prediction is very high, we want FN to be as low as possible. For example, suppose a person has cancer but the model predicts no cancer. That could be very dangerous for the person. So we cannot rely on accuracy alone.

Given the true labels and the predicted labels, we can compute and plot the confusion matrix.

[Image: Python code to print the confusion matrix]
[Image: Confusion matrix]
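The original code is not reproduced here, so the following is only a sketch of how the four cells could be counted and printed for a two-class problem. The helper `confusion_counts` and the label lists are hypothetical, and labels are assumed to be 0 or 1.

```python
def confusion_counts(y_true, y_pred):
    """Count TN, FP, FN, TP for binary labels (0 = negative, 1 = positive)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:  # t == 1 and p == 1
            tp += 1
    return tn, fp, fn, tp


# Hypothetical test labels, made up for illustration
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)

# Rows are actual classes, columns are predicted classes
print("           pred 0  pred 1")
print(f"actual 0   {tn:6d}  {fp:6d}")
print(f"actual 1   {fn:6d}  {tp:6d}")
```

In practice, scikit-learn's `sklearn.metrics.confusion_matrix(y_true, y_pred)` returns the same four counts as a 2x2 array laid out as `[[TN, FP], [FN, TP]]`.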

From the matrix we can calculate several other rates, given below:

True Positive Rate = (TP)/(TP+FN)

True Negative Rate = (TN)/(TN+FP)

False Positive Rate = (FP)/(FP+TN)

False Negative Rate = (FN)/(FN+TP)

In the above formulas, True Positive Rate is also called Recall. Precision is a separate quantity, the fraction of points predicted as positive that are actually positive:

Precision = (TP)/(TP+FP)

Precision and Recall are used in information retrieval, where useful information has to be retrieved from a huge amount of data. Neither formula involves TN, so both Precision and Recall only care about the positive class.
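A small sketch of these rates, computed from the four cells of the confusion matrix. It uses the standard definitions, in which the false positive rate divides by all actual negatives (FP+TN), the false negative rate divides by all actual positives (FN+TP), and precision is TP/(TP+FP). The function name `rates` and the counts are made up for illustration.

```python
def rates(tn, fp, fn, tp):
    """Rates derived from the confusion matrix cells of a binary classifier."""
    return {
        "tpr": tp / (tp + fn),        # True Positive Rate, also called Recall
        "tnr": tn / (tn + fp),        # True Negative Rate
        "fpr": fp / (fp + tn),        # False Positive Rate
        "fnr": fn / (fn + tp),        # False Negative Rate
        "precision": tp / (tp + fp),  # share of positive predictions that are right
    }


# Hypothetical counts, made up for illustration
r = rates(tn=50, fp=10, fn=5, tp=35)
for name, value in r.items():
    print(f"{name}: {value:.3f}")
```

Note that TPR and FNR always sum to 1, as do TNR and FPR, because each pair splits the same group of actual positives or actual negatives.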

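NOTE: If the given data set is imbalanced, the accuracy computed from the confusion matrix does not work very well, because a model that always predicts the majority class still gets a high accuracy.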
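To illustrate the note, here is a sketch with a made-up imbalanced test set: 95 points of class 0 and 5 points of class 1, scored against a hypothetical model that predicts class 0 for everything.

```python
# Hypothetical imbalanced test set: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class
y_pred = [0] * 100

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.2f}")  # high, although the model learned nothing
print(f"recall   = {recall:.2f}")    # no positive point was found
```

The accuracy looks good only because class 0 dominates the data; the recall shows that every class 1 point was missed.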

In machine learning there is another metric used to evaluate a model, called f1_score. It is the harmonic mean of Precision and Recall.

f1_score = (2*Precision*Recall)/(Precision+Recall)

Calculating the f1_score using Python
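The original code is not reproduced here, so this is only a sketch of the formula above in plain Python. The function name `f1_from_counts` and the counts are hypothetical, and precision is taken as TP/(TP+FP) and recall as TP/(TP+FN).

```python
def f1_from_counts(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined),
        # so the score is reported as 0 by convention in this sketch
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (2 * precision * recall) / (precision + recall)


# Hypothetical counts, made up for illustration
score = f1_from_counts(tp=35, fp=10, fn=5)
print(f"f1_score = {score:.4f}")
```

With scikit-learn, `sklearn.metrics.f1_score(y_true, y_pred)` computes the same quantity directly from the true and predicted labels.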

Log-loss: the log-loss function is another way to measure the error in classification problems. It uses the predicted probability scores rather than only the predicted labels.

For any classification problem the log-loss function can be given as:

log-loss = avg (- log(probability of correct class label))

Log-loss penalizes even small deviations in the probability score. A lower log-loss means a better model, with 0 being the best possible value. Unlike the other metrics, log-loss is hard to interpret, because it has no upper bound: it ranges from 0 to infinity.
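A minimal sketch of the log-loss formula for a two-class problem, assuming the model outputs the probability of class 1 for each point. The function name `binary_log_loss`, the clipping constant `eps`, and the probability lists are made up for illustration.

```python
import math


def binary_log_loss(y_true, p_pos, eps=1e-15):
    """Average of -log(probability assigned to the correct class)."""
    total = 0.0
    for t, p in zip(y_true, p_pos):
        # Clip so that log(0) never occurs for fully confident wrong answers
        p = min(max(p, eps), 1 - eps)
        p_correct = p if t == 1 else 1 - p
        total += -math.log(p_correct)
    return total / len(y_true)


# Hypothetical labels and predicted probabilities of class 1
y_true = [1, 0, 1]
confident = [0.9, 0.1, 0.9]  # correct and confident
hesitant = [0.6, 0.4, 0.6]   # correct at a 0.5 threshold, but less confident

print(binary_log_loss(y_true, confident))
print(binary_log_loss(y_true, hesitant))
```

Both sets of predictions give the same accuracy at a 0.5 threshold, but the hesitant one has the higher log-loss, which is how log-loss reflects deviations in the probability score. scikit-learn provides this metric as `sklearn.metrics.log_loss`.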
