Common Loss Functions in Machine Learning for Regression Models

Sushant Kumar
Published in Analytics Vidhya · 6 min read · Aug 31, 2020
Photo by Kevin Ku on Unsplash

Finding the right loss function for an algorithm is critical: an ill-suited loss function can lead to a wrong solution and become a troublemaker in the optimization of a machine learning model.

Machine learning is a pioneering subset of Artificial Intelligence in which machines learn by themselves from the available dataset. For the optimization of any machine learning model, an acceptable loss function must be selected. A loss function characterizes how well the model performs over the training dataset. Loss functions express the discrepancy between the predictions of the model being trained and the actual problem instances. If the deviation between predicted and actual results is large, the loss function takes a very high value. Gradually, with the help of an optimization function, the model learns to reduce its prediction error. In this article, we will go through several loss functions and their applications in the domain of machine/deep learning.

There is no universal loss function that is suitable for all machine learning models. Depending on the type of problem statement and model, a suitable loss function needs to be selected from the available ones. Different parameters, such as the type of machine learning algorithm, the percentage of outliers in the provided dataset, and the ease of calculating derivatives, play their role in choosing a loss function.

Loss functions are mainly divided into two major categories: Regression losses and Classification losses. In this article, only Regression losses will be discussed; Classification losses will be covered in another article. (Note: regression functions generally predict a value/quantity, whereas classification functions predict a label/class.)

Regression losses:

1. Sum of Errors (SE)

This is a basic loss function, calculated simply by adding up the error differences between the predicted and actual values over all data points. The mathematical representation is as follows:

$$\mathrm{SE} = \sum_{i=1}^{n} \left(\hat{Y}_i - Y_i\right)$$

where Ŷ represents the predicted value and Y the actual value.

The Sum of Errors (SE) is not an effective loss function to use, because the deviation of a prediction from the actual result can be in a positive or a negative direction, and opposite deviations cancel each other out. Because of this, the SE value can be much smaller than the actual total deviation from the desired results, as the sketch below demonstrates.
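
A minimal NumPy sketch of SE (the function name and toy arrays are illustrative, not taken from any library):

```python
import numpy as np

def sum_of_errors(y_true, y_pred):
    # Signed errors: positive and negative deviations cancel each other out.
    return np.sum(y_pred - y_true)

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.5, 2.5])
print(sum_of_errors(y_true, y_pred))  # 0.0, although two predictions are off by 0.5
```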

2. Sum of Absolute Errors (SAE)

The Sum of Absolute Errors (SAE) is the total sum of the absolute values of all error differences between the predicted and actual values. The mathematical representation is shown below:

$$\mathrm{SAE} = \sum_{i=1}^{n} \left|\hat{Y}_i - Y_i\right|$$

where Ŷ represents the predicted value and Y the actual value.

With the Sum of Absolute Errors (SAE), the resulting error shows the total deviation of the predicted results from the desired actual results, because deviations in the positive and negative directions no longer affect each other's value, and so their respective contributions do not get diminished.
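
Reusing the toy data from the SE sketch above, a minimal NumPy version of SAE could look like this:

```python
import numpy as np

def sum_of_absolute_errors(y_true, y_pred):
    # Absolute values stop positive and negative errors from cancelling.
    return np.sum(np.abs(y_pred - y_true))

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.5, 2.5])
print(sum_of_absolute_errors(y_true, y_pred))  # 1.0 -- the full deviation is visible
```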

3. Sum of Squared Errors (SSE)

The Sum of Squared Errors (SSE) is the summation of the squares of the errors, i.e. the deviations of the predictions from the actual desired values of the data. It is a measure of the discrepancy between the data and the estimation model. A small SSE indicates a tight fit of the model to the data. The function looks as follows:

$$\mathrm{SSE} = \sum_{i=1}^{n} \left(\hat{Y}_i - Y_i\right)^2$$

Even a small deviation of the predicted result from the actual result has a squared impact on the error value. This function gives non-negative values and can be differentiated at all points.
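
A minimal NumPy sketch of SSE, again with illustrative toy data:

```python
import numpy as np

def sum_of_squared_errors(y_true, y_pred):
    # Squaring keeps the result non-negative and penalizes larger errors more.
    return np.sum((y_pred - y_true) ** 2)

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.5, 2.5])
print(sum_of_squared_errors(y_true, y_pred))  # 0.5
```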

4. Mean Absolute Error (MAE) / L1 Loss

Mean Absolute Error (MAE) / L1 loss is measured as the average of the absolute differences between predictions and actual results. It captures the magnitude of the errors.

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|\hat{Y}_i - Y_i\right|$$

The direction/sign of the errors has no impact on the MAE calculation; it does not matter whether a difference is positive or negative. MAE is more robust to outliers, but it needs more complicated tools for gradient computation, since its derivative is undefined at zero.
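
A minimal NumPy sketch of MAE (toy data as before, for illustration only):

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average magnitude of the errors, ignoring their sign.
    return np.mean(np.abs(y_pred - y_true))

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.5, 2.5])
print(mean_absolute_error(y_true, y_pred))  # ~0.333
```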

5. Mean Squared Error (MSE) / L2 Loss

Mean Squared Error (MSE) / L2 loss is calculated by taking the average of the squared differences between predicted and actual values. MSE gives the average magnitude of the error irrespective of its direction.

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{Y}_i - Y_i\right)^2$$

The MSE value changes strongly even for a small change in the difference between prediction and actual value. Due to the squaring, it penalizes the model heavily in proportion to the difference between predicted and actual values. It is easier to calculate gradients with MSE.
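
A minimal NumPy sketch of MSE on the same illustrative toy data:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Squared differences penalize large errors disproportionately.
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.5, 2.5])
print(mean_squared_error(y_true, y_pred))  # ~0.167
```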

6. Root Mean Squared Error (RMSE)

The Root Mean Squared Error (RMSE) is one of the most widely used error functions. It measures the differences between the values predicted by a model and the actual desired values. The RMSE can be calculated by taking the square root of the above-mentioned Mean Squared Error (MSE) / L2 loss. The effect of each error on RMSE is proportional to the size of the squared error; it represents the square root of the second sample moment of the deviations between predicted and actual values.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{Y}_i - Y_i\right)^2}$$

Larger errors have a disproportionately large effect on RMSE. The RMSE serves to aggregate the magnitudes of the prediction errors over the whole dataset into a single measure. It depicts the model's accuracy and helps in comparing the forecasting errors of different models on a particular dataset.

The sign of a prediction's deviation has no effect on the RMSE value, which is always non-negative. An RMSE of zero would mean the model is 100% accurate, but in practice this never happens. When optimizing a model, the aim is to get the RMSE value as low as possible.
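
A minimal NumPy sketch of RMSE, simply the square root of the MSE sketch above:

```python
import numpy as np

def root_mean_squared_error(y_true, y_pred):
    # The square root brings the error back to the units of the target variable.
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.5, 2.5])
print(root_mean_squared_error(y_true, y_pred))  # ~0.408
```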

7. Mean Bias Error (MBE)

The Mean Bias Error (MBE) is not frequently used as a loss function for machine learning models. It is mainly used to calculate the average bias of a model. Although it is less accurate in practice, it can determine whether the model has a positive or a negative bias.

$$\mathrm{MBE} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{Y}_i - Y_i\right)$$

It helps in optimizing the model and in deciding whether any steps need to be taken to correct the model's bias. MBE's output is the average bias in the predictions. MBE is the same as MAE, with the only difference that it does not take the absolute value of the errors. A positive MBE means the values are overestimated, and a negative MBE means they are underestimated.
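
A minimal NumPy sketch of MBE (toy data chosen here so that every prediction overshoots):

```python
import numpy as np

def mean_bias_error(y_true, y_pred):
    # Signed mean: positive -> overestimation, negative -> underestimation.
    return np.mean(y_pred - y_true)

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([3.5, 5.5, 3.0])
print(mean_bias_error(y_true, y_pred))  # 0.5 -> the model overestimates on average
```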

8. Huber Loss

In statistics, the Huber loss is a loss function employed in robust regression that is less sensitive to outliers in the data than the squared error loss. A variant for classification is also sometimes used. The Huber loss combines the best properties of MSE and MAE: it is quadratic for smaller errors and linear otherwise (and similarly for its gradient). It is identified by its delta parameter. Mathematically it is defined as follows:

$$L_{\delta}\left(Y, \hat{Y}\right) = \begin{cases} \frac{1}{2}\left(Y - \hat{Y}\right)^2 & \text{for } \left|Y - \hat{Y}\right| \le \delta \\ \delta\left|Y - \hat{Y}\right| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases}$$

It can also be differentiated at 0. It behaves like an absolute error that becomes quadratic when the error is small. The hyper-parameter 𝛿 (delta) decides how small an error has to be for it to be treated quadratically, and 𝛿 can be tuned. Huber loss approaches MAE when 𝛿 is approximately zero and MSE when 𝛿 is approximately infinity (large numbers).
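
A minimal NumPy sketch of the Huber loss (function name, default 𝛿, and the toy outlier are illustrative assumptions):

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    # Quadratic for |error| <= delta, linear beyond it.
    error = y_pred - y_true
    small = np.abs(error) <= delta
    squared = 0.5 * error ** 2
    linear = delta * (np.abs(error) - 0.5 * delta)
    return np.mean(np.where(small, squared, linear))

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.5, 10.0])  # last point is an outlier
print(huber_loss(y_true, y_pred))    # the outlier is penalized only linearly
```

Note how the outlier contributes only linearly to the total, whereas under MSE it would dominate the loss quadratically.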

Note

Also, if you are a beginner in machine learning and keen to learn more, you can look up the GitHub account sushantkumar-estech or use the link https://github.com/sushantkumar-estech for interesting projects.

Pick any project you wish to practice on, and in case of any questions, you can write to me. I would be happy to help.

Enjoy reading and happy learning!!
