Log Loss Magic Numbers

Eric Ness · Published in When I Work Data · 3 min read · Jan 31, 2019

TL;DR

There are specific log loss values for classification problems that indicate whether your model is doing better than chance.


Introduction to Log Loss

There are a number of metrics you can use to gauge the quality of a binary classifier. These include accuracy, precision/recall and sensitivity/specificity. The catch with all of these metrics is that they require the classifier to assign each instance a hard positive or negative label. There’s no room for shades of grey.

In addition to predicting a label, binary classifiers such as logistic regression, tree-based models and neural networks can all give class probabilities. So instead of simply labeling an instance as positive, a model can predict that the instance has a 60% chance of being positive and a 40% chance of being negative.

While this frees us up to examine the predictions at a more granular level, the model validation metrics we were using no longer apply directly, since they expect hard labels rather than probabilities. [Sound of taps being played]. There’s a forgotten old friend that can help us navigate this more refined landscape: log loss.

The definition of log loss can be found in many places around the web. Even once we know the formal definition, we still need to answer the question “How do I know if my model is any good?”. Answering this question with the accuracy metric is intuitive. For a sample balanced between the two classes, it’s clear that random guessing will lead to 50% accuracy. Any accuracy higher than that must be due to the predictive power of your model.
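For reference, the binary log loss over N instances, where y_i ∈ {0, 1} is the true label of instance i and p_i is its predicted probability of being positive, is:

\text{log loss} = -\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\,\Big]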

Binary Classification

There are equivalent magic numbers for log loss. For binary log loss, the way to randomly guess is to set the probability of both classes to 0.5 for every instance. Here is some Python code to demonstrate:
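A minimal sketch of that demonstration, assuming a small balanced dataset and scikit-learn’s log_loss:

import numpy as np
from sklearn.metrics import log_loss

# Balanced binary dataset: five instances of class 0 and five of class 1.
actual = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Random guessing: predict a probability of 0.5 for every instance.
predicted = np.full(len(actual), 0.5)

print(log_loss(actual, predicted))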

0.69314718056

Notice that the dataset is balanced since there are an equal number of class 0 and class 1 instances in actual. So this is the equivalent of an accuracy of 50% in a binary classification problem. An accuracy score above 50% indicates that your model has some predictive power. Likewise, a log loss score below 0.693 indicates that your model has predictive power.

Three Class Classification

So that’s for a binary classification problem. What if the problem has three classes? The approach is similar. We can find out the log loss magic number by altering the binary problem code a bit.
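A sketch of the altered code, again assuming a balanced dataset and scikit-learn’s log_loss:

import numpy as np
from sklearn.metrics import log_loss

# Balanced three-class dataset: three instances of each class.
actual = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

# Random guessing: a probability of 1/3 for each class on every instance.
predicted = np.full((len(actual), 3), 1 / 3)

print(log_loss(actual, predicted))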

1.09861228867

Here we see the magic number is 1.098. So for three-class problems the log loss on our test set must be below this threshold for the model to do better than random guessing.

Arbitrary Number of Classes

So now we know the log loss magic numbers for a few specific cases. It would be nice to know what it is for the general case, though. Fortunately, Firebug on Stack Overflow has provided a simple formula to calculate the log loss magic number for an arbitrary number of classes: it is simply the natural log of the number of classes. This follows because guessing a uniform probability of 1/K for each of K classes makes every instance contribute -log(1/K) = log(K) to the average.

log(2) = 0.693
log(3) = 1.098
log(4) = 1.386
log(5) = 1.609
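
A quick sketch to verify these values, assuming a balanced dataset for each class count K and scikit-learn’s log_loss:

import numpy as np
from sklearn.metrics import log_loss

# Uniform guessing on a balanced K-class dataset gives a log loss of ln(K).
for k in range(2, 6):
    actual = np.repeat(np.arange(k), 10)          # ten instances of each class
    predicted = np.full((len(actual), k), 1 / k)  # probability 1/k for every class
    print(k, log_loss(actual, predicted), np.log(k))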

Conclusion

Log loss magic numbers specify the performance threshold that a model must beat. Armed with the appropriate magic number for our specific classification problem it is easy to see if the probability predictions that our model is making have any predictive power.

References

https://www.quora.com/If-I-have-a-classification-algorithm-binary-with-a-logloss-of-0-69-does-this-mean-my-algorithm-is-no-better-than-a-random-guess


Eric Ness is a Principal Machine Learning Engineer at C.H.Robinson, a Fortune 250 supply chain company.