List of deep learning losses for regression

Alvaro Durán Tovar
Deep Learning made easy
2 min read · Dec 1, 2020
Photo by Markus Spiske on Unsplash

Mean absolute error

Useful when the data contains outliers but you don't want them to have a big influence.

(y-ŷ).abs().mean()

A couple of downsides of this loss:

  • The derivative is always the same magnitude (±1), no matter how big or small the error is.
  • The function isn’t differentiable at 0.

Huber loss (see below) is a good alternative for these reasons.
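
As a quick sanity check, here's a minimal sketch (assuming PyTorch, with y as the target and ŷ as the prediction; the values are illustrative only). It also shows the constant-gradient downside in practice.

import torch

y = torch.tensor([1.0, 2.0, 3.0])
ŷ = torch.tensor([1.5, 0.0, 10.0], requires_grad=True)

mae = (y - ŷ).abs().mean()

# Matches the built-in L1 loss
assert torch.allclose(mae, torch.nn.functional.l1_loss(ŷ, y))

# Gradient w.r.t. each prediction is ±1 (scaled by 1/n), regardless of the error size
mae.backward()
print(ŷ.grad)  # tensor([ 0.3333, -0.3333,  0.3333])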

Mean squared error

Useful when you want to penalize big differences (large errors) more heavily.

(y-ŷ).pow(2).mean()
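
A minimal sketch (assuming PyTorch, illustrative values only) showing that the formula matches the built-in MSE loss and how a single large error dominates the result:

import torch

y = torch.tensor([1.0, 2.0, 3.0])
ŷ = torch.tensor([1.5, 2.5, 13.0])

mse = (y - ŷ).pow(2).mean()
assert torch.allclose(mse, torch.nn.functional.mse_loss(ŷ, y))
print(mse)  # tensor(33.5000) -- nearly all of it comes from the single 10-unit error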

Root mean squared error

The problem with MSE is that the obtained value is not in the original unit but in the squared unit: if we are measuring seconds, the value will be in seconds², which isn't really useful.

To recover the original unit we take the square root, obtaining something like an average error in which big values are strongly penalized. RMSE is useful in reports, in combination with other metrics like MAE.

(y-ŷ).pow(2).mean().sqrt()
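
For example (a minimal sketch assuming PyTorch, with made-up values measured in seconds):

import torch

y = torch.tensor([10.0, 20.0, 30.0])  # targets in seconds
ŷ = torch.tensor([12.0, 18.0, 33.0])  # predictions in seconds

mse = (y - ŷ).pow(2).mean()   # seconds² -- hard to interpret
rmse = mse.sqrt()             # back to seconds
print(mse, rmse)              # tensor(5.6667) tensor(2.3805)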

Huber loss

Similar to MAE, but differentiable everywhere, with a smooth change in trend near 0, and more robust to outliers than MSE. It's like a combination of L1 and L2 (MAE and MSE), depending on the magnitude of the difference between the target value and the predicted value. 𝛿 is a hyperparameter that indicates where to switch between the two: when the absolute error is at most 𝛿 (i.e. close to 0) the loss behaves like L2, and like L1 otherwise.

delta = 1
mask = (y - ŷ).abs() <= delta

up = (0.5 * (y - ŷ)**2) * mask                            # quadratic (L2-like) part for small errors
down = (delta * (y - ŷ).abs() - 0.5 * delta**2) * ~mask   # linear (L1-like) part for large errors
(up + down).mean()
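
As a check, recent PyTorch versions (1.9+) ship a built-in Huber loss, and the manual computation above should match it for the same 𝛿 (a minimal sketch with illustrative values):

import torch

delta = 1.0
y = torch.tensor([1.0, 2.0, 3.0])
ŷ = torch.tensor([1.2, 0.0, 7.0])

mask = (y - ŷ).abs() <= delta
up = (0.5 * (y - ŷ)**2) * mask
down = (delta * (y - ŷ).abs() - 0.5 * delta**2) * ~mask
manual = (up + down).mean()

assert torch.allclose(manual, torch.nn.functional.huber_loss(ŷ, y, delta=delta))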
