How to improve “Log-Loss” score. Kaggle trick.

Egor Vorobiev
3 min read · Mar 11, 2019


The Kaggle leaderboard is a special kind of place. It isn't a real-life situation, where we would normally care about, for example, the speed or size of our models; here we care only about the score. If you already have the right models, the trick in this article will (hopefully) help you improve your score a little bit more.

First, let's remind ourselves what Log-Loss is.

Logarithmic loss (related to cross-entropy) measures the performance of a classification model whose prediction input is a probability value between 0 and 1. The goal of our machine learning models is to minimize this value. A perfect model would have a log loss of 0. Log loss increases as the predicted probability diverges from the actual label. So predicting a probability of 0.012 when the actual observation label is 1 would be bad and result in a high log loss. © Wikipedia

LogLoss = −(1/N) · Σᵢ [ yᵢ · log(pᵢ) + (1 − yᵢ) · log(1 − pᵢ) ]

where yᵢ is the true label (0 or 1), pᵢ is your predicted probability in the [0, 1] range, and the sum runs over all N examples.

Here is how it looks in Python code.
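A minimal NumPy sketch of the formula above (the function name logloss and the eps guard are my own choices, not a fixed API):

```python
import numpy as np

def logloss(y_true, y_pred, eps=1e-15):
    """Binary log loss, computed exactly as in the formula above."""
    y_true = np.asarray(y_true, dtype=float)
    # guard against log(0), which would make the loss infinite
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```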

It can also be done with scikit-learn.
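For example, using the built-in sklearn.metrics.log_loss (the toy labels and probabilities are made up for illustration):

```python
from sklearn.metrics import log_loss

y_true = [1, 0, 1, 1]
y_pred = [0.9, 0.1, 0.8, 0.35]

print(log_loss(y_true, y_pred))  # ≈ 0.371
```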

The use of the logarithm provides extreme punishment for being both confident and wrong. In the worst possible case, a prediction that something is true when it is actually false adds an infinite amount to your error score. It is therefore much better to keep our probabilities between 0.05 and 0.95, so that we are never completely sure about a prediction; that way, we avoid the massive growth of the error function.
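To see the scale of the punishment, compare the per-example loss for a confident wrong answer with a hedged one (these numbers just evaluate −log(p) for a true label of 1):

```python
import numpy as np

# per-example loss when the true label is 1 is -log(p)
for p in (0.001, 0.05, 0.95):
    print(p, -np.log(p))
# 0.001 -> ≈ 6.91 (confident and wrong: huge penalty)
# 0.05  -> ≈ 3.00 (hedged and wrong: much smaller)
# 0.95  -> ≈ 0.05 (hedged and right: still almost perfect)
```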

The Trick is:

We need to change all values that are less than 0.025 to 0.025, and all values that are greater than 0.975 to 0.975.
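A minimal sketch of the trick with NumPy, where np.clip truncates values to the given bounds:

```python
import numpy as np

predictions = np.array([0.001, 0.4, 0.7, 0.999])
clipped = np.clip(predictions, 0.025, 0.975)
print(clipped)  # [0.025 0.4   0.7   0.975]
```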

Kaggle:

Let's move our solution to the Kaggle environment. In our submission file, the #id column names the two NBA teams that will compete, and #target is the probability that the first team beats the second. Our goal is to predict the outcomes of the matches.

In the #target column we see the predictions of our model.
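As an illustration, a toy version of such a submission might look like this (the match-ups and probabilities below are made up):

```python
import pandas as pd

submission = pd.DataFrame({
    "id": ["lakers_vs_bulls", "timberwolves_vs_bucks", "celtics_vs_heat"],
    "target": [0.999, 0.004, 0.6],
})
print(submission)
```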

Let's apply our "clip" code to change the predictions.
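Continuing the toy example above, the same clipping applied to the #target column (pandas' Series.clip behaves like np.clip here):

```python
submission["target"] = submission["target"].clip(0.025, 0.975)
print(submission)
# lakers_vs_bulls:        0.999 -> 0.975
# timberwolves_vs_bucks:  0.004 -> 0.025
# celtics_vs_heat:        0.600 (unchanged)
```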

As you can see, we changed the lakers_vs_bulls and timberwolves_vs_bucks results.

In practice, this trick can give you a noticeable score improvement on the Kaggle leaderboard.
