MSE or MAE? Which and Why? Loss Functions used in Regression and Classification.

Angela W.
2 min read · Sep 29, 2021

In neural networks (NNs), the loss function is critical to understanding the model’s performance. It must be chosen carefully when constructing and configuring a NN model, and the right choice depends on the task at hand, such as regression or classification. In this article, I’ve attempted to describe the common loss functions in layman’s terms, along with their use cases.

Regression

Three loss functions are commonly used for regression tasks:

  1. MSE stands for Mean Squared Error, also known as the L2 loss. It is a popular choice for regression tasks, but it is more sensitive to outliers.
  2. MAE stands for Mean Absolute Error, also known as the L1 loss. It is often used as an alternative to MSE when a dataset contains many outliers.
  3. Huber loss: a compromise between MSE and MAE, controlled by a parameter known as the threshold. For errors smaller than the threshold it applies the quadratic (MSE-style) loss; once the error exceeds the threshold it switches to the linear (MAE-style) loss. A minimal implementation of all three is sketched after this list.
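Here is a minimal NumPy sketch of these three losses. The function names and the default threshold value (delta=1.0) are my own choices for illustration, not a fixed convention:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error (L2 loss): average of squared differences."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean Absolute Error (L1 loss): average of absolute differences."""
    return np.mean(np.abs(y_true - y_pred))

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for |error| <= delta, linear beyond it."""
    error = y_true - y_pred
    is_small = np.abs(error) <= delta
    squared = 0.5 * error ** 2
    linear = delta * (np.abs(error) - 0.5 * delta)
    return np.mean(np.where(is_small, squared, linear))
```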

MSE has the advantage of penalizing large errors, making it more appropriate when being off by 10 is more than twice as bad as being off by 5. If being off by 10 is only twice as bad as being off by 5, however, MAE is the better option.
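To make that concrete, here is a quick illustration of how the two losses scale a single error:

```python
# Illustrative arithmetic: how each loss scales a single error of 5 vs. 10.
for err in (5, 10):
    print(f"error={err}: squared={err ** 2}, absolute={abs(err)}")
# Squared error grows from 25 to 100 (4x); absolute error grows from 5 to 10 (2x).
```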

Classification

Categorical outputs come in two flavors: binary classification and multi-class classification. The following loss functions are used for these tasks:

  1. Categorical cross-entropy: used when instances are to be classified into one of two or more classes. When there are only two classes, it reduces to binary cross-entropy.
  2. Multi-label classification: the concept is the same, except that an instance can belong to several classes at once. Instead of three mutually exclusive labels for three classes, each class gets its own present/absent indicator (class1=1 or class1=0, class2=1 or class2=0, class3=1 or class3=0), and the total loss is the sum of the cross-entropy losses over those per-class indicators. A sketch of both cases follows this list.
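A minimal NumPy sketch of these ideas, assuming softmax probabilities for the multi-class case and independent sigmoid probabilities per class for the multi-label case (the function names and the clipping constant eps are my own choices):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true: one-hot targets; y_pred: softmax probabilities, same shape."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=-1))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true: 0/1 labels; y_pred: sigmoid probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def multi_label_cross_entropy(y_true, y_pred, eps=1e-12):
    """One independent present/absent indicator per class, shape (n_samples, n_classes).
    The binary cross-entropy is computed per class and summed, matching the framing above."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_class = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return np.mean(np.sum(per_class, axis=-1))
```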

Check your understanding:

You read that a set of temperature forecasts shows an MAE of 1.5 degrees and an RMSE of 2.5 degrees. What does this mean? Choose the best answer:

  1. There is some variation in the magnitude of the errors
  2. Very large errors are unlikely to have occurred
  3. The average difference between the forecast and the observed temperature was 1.5 degrees
  4. All of the above

… Scroll down for the answer

..

..

Answer: 4. All of the above. The MAE says the average absolute difference between forecast and observation was 1.5 degrees; RMSE is always at least as large as MAE, and the gap between 2.5 and 1.5 indicates some variation in the magnitude of the errors; and because RMSE penalizes large errors heavily, an RMSE only one degree above the MAE suggests very large errors are unlikely to have occurred.
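To see the MAE/RMSE relationship in action, here is a quick check with made-up forecast errors (the numbers are purely illustrative, not the forecasts from the question):

```python
import numpy as np

# Hypothetical forecast errors in degrees, chosen only for demonstration.
errors = np.array([1.0, -2.0, 0.5, 3.0, -1.0])

mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))
print(f"MAE={mae:.2f}, RMSE={rmse:.2f}")  # -> MAE=1.50, RMSE=1.75
# RMSE is always >= MAE; the size of the gap reflects how much the error magnitudes vary.
```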
