
Understanding Accuracy Evaluation for a Classification Model

5 min read · Oct 22, 2024


Before evaluating a classification model, we first need to define some metrics.

Classification Metrics

Accuracy

It is a fundamental metric for model evaluation, representing the percentage of correct predictions out of the total number of predictions.

Accuracy = Number of correct predictions / Total number of predictions

Using accuracy may be the best choice for balanced datasets because it reflects the performance across all classes equally in such cases. It is not a good choice for unbalanced classes, however, since a model can reach a high accuracy simply by predicting the majority class every time.

In conclusion, accuracy demonstrates how much we can trust the model’s predictions.
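As a quick illustration, here is a minimal Python sketch (assuming scikit-learn is installed; the label arrays are made up) that computes accuracy by hand and with accuracy_score, and shows why accuracy can be misleading on an imbalanced dataset:

import numpy as np
from sklearn.metrics import accuracy_score

# Toy labels: 1 = positive class, 0 = negative class (made-up values).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 0, 1, 1, 0, 0])

# Accuracy = correct predictions / total predictions.
print(np.mean(y_true == y_pred))       # 0.8
print(accuracy_score(y_true, y_pred))  # 0.8, same result via scikit-learn

# On an imbalanced dataset, a model that always predicts the majority
# class can still score a high accuracy, which is why accuracy alone
# is not a good choice for unbalanced classes.
y_true_imbalanced = np.array([0] * 95 + [1] * 5)
y_pred_majority = np.zeros(100, dtype=int)
print(accuracy_score(y_true_imbalanced, y_pred_majority))  # 0.95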

Logarithmic Loss

Logarithmic loss, also referred to as log loss or binary cross-entropy, is used to evaluate the predictions of probabilistic classifiers, i.e., models whose output is a probability rather than a hard class label.

How Logarithmic Loss Works for Binary Classifications

For binary classification problems, the classifier outputs the probability that an instance belongs to one of the two classes (e.g., if the probability of class 1 is p, then the probability of class 2 is 1 − p). Log loss evaluates how well the predicted probability matches the actual class label.

Figure: a poor prediction (top) versus a good prediction (bottom) and the resulting log loss.

The top example depicts a poor prediction: there is a large difference between the predicted probability and the actual label, which results in a large LogLoss. This is bad in the sense that the function heavily penalizes a wrong answer the model is “confident” about.

Conversely, the bottom example shows a good prediction that is close to the actual label. This results in a low LogLoss, which is good: the metric rewards a correct answer the model is “confident” about.
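To make the penalty concrete, here is a small sketch (the probabilities are made-up values, not taken from the figure) of the per-sample log loss for a confident wrong prediction versus a confident correct one:

import math

# Binary cross-entropy for a single sample: -(y*log(p) + (1-y)*log(1-p)),
# where p is the predicted probability of class 1 and y is the actual label.
def sample_log_loss(y, p):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Poor prediction: the actual class is 1 but the model is confident it is 0.
print(sample_log_loss(1, 0.05))  # ~3.0, a large penalty

# Good prediction: the actual class is 1 and the model is confident it is 1.
print(sample_log_loss(1, 0.95))  # ~0.05, a small penalty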

LogLoss = −(1/N) · Σᵢ [ yᵢ · log(pᵢ) + (1 − yᵢ) · log(1 − pᵢ) ], where yᵢ is the actual label (0 or 1) and pᵢ is the predicted probability of class 1.

The goal is to minimize LogLoss when using it as a measure of model performance. A model that predicts every probability perfectly would have a LogLoss of 0.
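Putting it together, here is a minimal sketch (made-up labels and probabilities, assuming scikit-learn is available) that averages the per-sample penalties over a small validation set and cross-checks the result with log_loss:

import numpy as np
from sklearn.metrics import log_loss

# Made-up actual labels and predicted probabilities of class 1.
y_true = np.array([1, 0, 1, 1, 0])
p_pred = np.array([0.9, 0.2, 0.6, 0.8, 0.1])

# Average the per-sample penalties over the dataset.
manual = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
print(manual)                    # ~0.234
print(log_loss(y_true, p_pred))  # same value via scikit-learn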

Confusion Matrix

A confusion matrix is a table that demonstrates the correct and incorrect predictions made by the model by comparing its predicted labels to the true labels.

It is created after making predictions on test data and is an n × n matrix, where n is the number of classes.

Example of a confusion matrix: an email spam classification model.

It is a binary classification problem. The two possible classes are “spam” and “not spam.”

After training the model, we generated predictions for 10,000 emails in the validation dataset. Since we already know the actual labels, we can evaluate the quality of the model's predictions.

Here is how the resulting matrix can look:

                     Predicted: spam    Predicted: not spam
Actual: spam              600 (TP)            300 (FN)
Actual: not spam          100 (FP)           9000 (TN)

True Positive (TP)

  • This is the top left (green) corner of the matrix.
  • It shows the number of correctly identified positive cases. These are the cases where the actual label is positive, and the model correctly predicted it as positive.
  • In spam detection, this is the number of correctly predicted spam emails.
  • In the example, the number of true positives is 600.

True Negative (TN)

  • This is the bottom right (green) corner of the matrix.
  • It shows the number of correctly identified negative cases. These are the cases where the actual label is negative, and the model correctly predicted it as negative.
  • In spam detection, this is the number of correctly predicted non-spam emails.
  • In the example, the number of true negatives is 9000.

False Positive (FP)

  • This is the bottom left (pink) corner of the matrix.
  • It shows the number of incorrectly predicted positive cases. These are the cases where the actual label is negative, but the model predicted it as positive.
  • To put it simply, these are false alarms. They are also known as Type 1 errors.
  • In spam detection, this is the number of emails incorrectly labeled as spam. Think of regular emails sent to the spam folder in error.
  • In the example, the number of false positives is 100.

False Negative (FN)

  • This is the top right (pink) corner of the matrix.
  • It shows the number of incorrectly predicted negative cases. In other words, these are the cases where the actual label is positive, but the model predicted it as negative.
  • To put it simply, these are missed cases. They are also known as Type 2 errors.
  • In spam detection, this is the number of missed spam emails that made their way into the primary inbox.
  • In the example, the number of false negatives is 300.
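To tie this back to code, here is a minimal sketch (assuming scikit-learn; the label lists are synthesized to match the counts above, not real data) that reproduces the matrix with confusion_matrix:

from sklearn.metrics import confusion_matrix

# Label lists synthesized to match the counts in the example:
# 900 actual spam emails (600 caught, 300 missed) and
# 9,100 actual non-spam emails (100 false alarms, 9,000 correct).
y_true = ["spam"] * 900 + ["not spam"] * 9100
y_pred = (["spam"] * 600 + ["not spam"] * 300 +    # predictions for actual spam
          ["spam"] * 100 + ["not spam"] * 9000)    # predictions for actual not spam

# Rows are actual labels, columns are predicted labels.
print(confusion_matrix(y_true, y_pred, labels=["spam", "not spam"]))
# [[ 600  300]     <- TP, FN
#  [ 100 9000]]    <- FP, TN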

Confusion Matrix Metrics

Accuracy

Accuracy is the share of correctly classified objects in the total number of objects. In other words, it shows how often the model is right overall.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
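Plugging the counts from the spam example into the formula (a quick sketch, just to check the numbers):

# Counts from the spam example above.
TP, TN, FP, FN = 600, 9000, 100, 300

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.96: the model is right for 96% of all emails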

Precision

Precision is the share of true positive predictions in all positive predictions. In other words, it shows how often the model is right when it predicts the target class.

Precision = TP / (TP + FP)
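Again with the example counts (a quick sketch of the formula):

# Counts from the spam example above.
TP, FP = 600, 100

precision = TP / (TP + FP)
print(precision)  # ~0.857: when the model predicts "spam", it is right about 86% of the time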

Recall

Recall is also known as the true positive rate (TPR). It is the share of true positive predictions out of all positive samples in the dataset. In other words, recall shows how many instances of the target class the model is able to find.

Recall = TP / (TP + FN)
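And for recall, using the same example counts:

# Counts from the spam example above.
TP, FN = 600, 300

recall = TP / (TP + FN)
print(recall)  # ~0.667: the model finds about two thirds of all spam emails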

F1 Score

The F1 score is the harmonic mean of precision and recall, so a classifier only achieves a high F1 score if both precision and recall are high. This metric favors classifiers that have similar precision and recall, and a higher F1 score indicates better performance.

F1 = 2 × (Precision × Recall) / (Precision + Recall)
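Finally, a short sketch (reusing the synthesized labels from the confusion-matrix example above, so the values are illustrative) that computes the F1 score both from the formula and with scikit-learn's f1_score:

from sklearn.metrics import f1_score, precision_score, recall_score

# The same synthesized labels as in the confusion-matrix sketch above.
y_true = ["spam"] * 900 + ["not spam"] * 9100
y_pred = (["spam"] * 600 + ["not spam"] * 300 +
          ["spam"] * 100 + ["not spam"] * 9000)

precision = precision_score(y_true, y_pred, pos_label="spam")  # ~0.857
recall = recall_score(y_true, y_pred, pos_label="spam")        # ~0.667

print(2 * precision * recall / (precision + recall))  # 0.75, from the formula
print(f1_score(y_true, y_pred, pos_label="spam"))     # 0.75, via scikit-learn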

Thanks for reading
