Evaluation Metrics — Machine Learning

Nilay Chauhan · Published in Data Stash · Dec 3, 2020

After we develop a model, we need to find out how well it is performing.

There are a few metrics that can help us answer this question.

Evaluation Metrics for Classification Models

Confusion Matrix

A confusion matrix, also known as an error matrix, is a table that visualises the performance of a supervised learning algorithm. Each row of the matrix represents the instances in an actual class, while each column represents the instances in a predicted class.

from sklearn.metrics import confusion_matrix
confusion_matrix(y_true, y_pred)
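
As a minimal sketch (the labels below are invented purely for illustration), a binary problem might produce a matrix like this:

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
confusion_matrix(y_true, y_pred)
# array([[3, 2],
#        [1, 4]])
# Row 0 holds the actual negatives: 3 true negatives, 2 false positives.
# Row 1 holds the actual positives: 1 false negative, 4 true positives.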

Accuracy

Accuracy is the fraction of data points that the model predicted correctly. It can also be defined as the sum of true positives and true negatives divided by the total number of data points.

from sklearn.metrics import accuracy_score
accuracy_score(y_true, y_pred)
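
Using the toy labels from the sketch above, 7 of the 10 predictions are correct:

accuracy_score(y_true, y_pred)  # (4 TP + 3 TN) / 10 = 0.7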

Precision

Precision can be defined as the number of true positives divided by the number of true positives plus false positives. In other words, it is the fraction of truly positive examples among all the examples the model classified as positive.

Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances. — Wikipedia

Precision tries to answer the following question:

What proportion of positive identifications was actually correct?

from sklearn.metrics import precision_score
precision_score(y_true, y_pred)
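
On the same toy labels, there are 4 true positives and 2 false positives:

precision_score(y_true, y_pred)  # 4 / (4 + 2) ≈ 0.67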

Recall

Recall can be defined as the number of true positives divided by the number of true positives plus false negatives. Recall is also known as sensitivity.

Recall (also known as sensitivity) is the fraction of the total amount of relevant instances that were actually retrieved — Wikipedia

Recall tries to answer the following question:

What proportion of actual positives was identified correctly?

from sklearn.metrics import recall_score
recall_score(y_true, y_pred)
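
On the same toy labels, there are 4 true positives and 1 false negative:

recall_score(y_true, y_pred)  # 4 / (4 + 1) = 0.8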

F-Score

The F-score, also known as the F1-score, is a metric that measures a model’s accuracy on a given dataset. It combines precision and recall into a single value: the harmonic mean of the model’s precision and recall.

Example: If the precision of the model is 45% and the recall is 80%, the F1-score is 2 × (0.45 × 0.80) / (0.45 + 0.80) = 0.72 / 1.25 = 0.576, or about 57.6%.

from sklearn.metrics import f1_score
f1_score(y_true, y_pred)
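
On the toy labels above (precision ≈ 0.67, recall = 0.8), the harmonic mean works out to roughly 0.73:

f1_score(y_true, y_pred)  # 2 * (0.67 * 0.8) / (0.67 + 0.8) ≈ 0.73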

Fβ-Score

The Fβ-score lets us weight precision or recall more heavily, according to our use case. It is the weighted harmonic mean Fβ = (1 + β²) · precision · recall / (β² · precision + recall).

  • If β = 0, we get precision.
  • If β = ∞, we get recall.
  • If β = 1, we get the F1-score.

For other values of β, the closer β is to 0, the closer the score is to precision, and the larger β grows, the closer the score is to recall.
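
scikit-learn exposes this directly as fbeta_score; here is a small sketch, reusing the toy labels from the classification examples above:

from sklearn.metrics import fbeta_score
fbeta_score(y_true, y_pred, beta=2)    # beta > 1 weights recall more heavily
fbeta_score(y_true, y_pred, beta=0.5)  # beta < 1 weights precision more heavily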

ROC Curve and AUC

A Receiver Operating Characteristic (ROC) curve is a graphical representation of the performance of a classification model at different classification thresholds. It plots two parameters:

  • True Positive Rate (TPR)
  • False Positive Rate (FPR)

ROC Curve plots TPR vs. FPR at different classification thresholds.

AUC (Area Under the ROC Curve) measures the entire two-dimensional area under the ROC curve; its value ranges from 0 to 1.

from sklearn import metrics
# scores are the model's predicted probabilities or decision scores for each example;
# pos_label=2 tells roc_curve to treat the label 2 as the positive class
fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
metrics.auc(fpr, tpr)
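
If only the final AUC value is needed, scikit-learn also provides roc_auc_score, which computes it directly from the true labels and the predicted scores (assuming y and scores are the same arrays as above):

from sklearn.metrics import roc_auc_score
roc_auc_score(y, scores)  # same result as computing roc_curve and then auc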

Evaluation Metrics for Regression Models

There are a few metrics to evaluate Regression Models:

  • Mean Squared Error
  • R2 Score

Mean Squared Error

Mean Squared Error (MSE) is the average of the squared errors and is commonly used as the loss function for the least-squares algorithm. It is the mean of the squared differences between the actual values and the predicted values.

from sklearn.metrics import mean_squared_error
mean_squared_error(y_true, y_pred)
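
As a small worked example (the values are invented for illustration):

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
mean_squared_error(y_true, y_pred)  # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375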

R2 Score

The R² score is a statistical measure that represents how well a regression model fits the data. It is also known as the coefficient of determination. The ideal value is 1; the closer R² is to 1, the better the model fits.

from sklearn.metrics import r2_score
r2_score(y_true, y_pred)
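
On the same toy regression values used for MSE above:

r2_score(y_true, y_pred)  # ≈ 0.95; a value near 1 means the predictions track the actual values closely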
