Log Loss Function: Machine Learning error measurement metric

Arjun Subedi
2 min read · Dec 4, 2018


Choosing how to evaluate a machine learning model is one of the most important decisions in the machine learning process. The decision should balance the real-world application of the algorithm, the mathematical properties of the evaluation function, and the interpretability of the measure.

Often we hear the question: how accurate is your model? Accuracy is a simple measure that tells us what percentage of the rows we got right. However, accuracy does not always tell the whole story.

Consider the case of identifying spam in email. Suppose 99% of the emails are legitimate (ham) and only 1% are spam. If I set the model to label every email as legitimate, my model is 99% accurate, but it is useless.
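As a quick illustration (a minimal sketch with made-up numbers), a classifier that labels every email as ham reaches 99% accuracy on such a dataset while never catching a single spam:

import numpy as np

# hypothetical imbalanced dataset: 99 ham emails (label 0) and 1 spam email (label 1)
actual = np.array([0] * 99 + [1])

# a "model" that simply predicts ham for every email
predicted_labels = np.zeros(100)

accuracy = np.mean(predicted_labels == actual)
print(accuracy)  # 0.99 -- looks impressive, yet the model detects no spam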

A similar case applies to cancer detection. When the occurrence of Y and N labels is drastically out of proportion, a better way to evaluate the model is the log loss metric.

Log loss is generally used as a loss function: it is a measure of error. We want the error to be as small as possible, so instead of maximizing accuracy, we are minimizing error.

# computing the log loss function
import numpy as np

def compute_log_loss(predicted, actual, eps=1e-14):
    # clip sets maximum and minimum values in the array so that we avoid log(0), which is -infinity
    predicted = np.clip(predicted, eps, 1 - eps)
    # elementwise cross-entropy, averaged over all observations
    loss = -1 * np.mean(actual * np.log(predicted) + (1 - actual) * np.log(1 - predicted))
    return loss
Note:
predicted: the predicted probabilities, as floats between 0 and 1
actual: the actual binary label, either 0 or 1
eps: (optional) log(0) is negative infinity, so we offset the predicted values slightly away from exact 0 and 1 by eps

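For example (a quick check with made-up probabilities), the function above can be called on NumPy arrays of predictions and labels:

predicted = np.array([0.9, 0.4, 0.2, 0.8])
actual = np.array([1, 0, 0, 1])

print(compute_log_loss(predicted, actual))  # about 0.27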
log loss (N = 1) = -(y log(p) + (1 - y) log(1 - p))

where p is the predicted probability, y is the true label, and log is the natural logarithm. For N observations the log loss is the mean of this quantity over all of them, which is what np.mean computes in the code above.

Consider the case when the true label is 0 but we confidently predicted the label to be 1 (p = 0.9). Because y = 0, the first term in the equation above becomes 0, so the log loss is determined entirely by the second term, -(1 - y) log(1 - p).

-log(1 - 0.9) = -log(0.1) ≈ 2.3

Case 2: consider the case when the correct label is 1 but we are only 50% sure (p = 0.5). Then log loss = -log(0.5) ≈ 0.69.
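Both numbers can be reproduced with the compute_log_loss function defined above (a quick sanity check; log here is the natural logarithm):

# case 1: true label 0, but we predicted 1 with probability 0.9 -> about 2.3
print(compute_log_loss(np.array([0.9]), np.array([0])))

# case 2: true label 1, predicted with probability 0.5 -> about 0.69
print(compute_log_loss(np.array([0.5]), np.array([1])))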

Thus, for a better prediction, we want to minimize the log loss value.

From the above observations we can conclude that it is better to be less confident than to be confident and wrong.

