Regularization

Gary (Chang, Chih-Chun) · Published in Deep Learning · May 19, 2018 · 2 min read

Regularization is a common way to mitigate overfitting by adding an extra term to the loss function.

We often calculate the loss by computing the errors (distances) between the ground truths and the model outputs (as in the function shown below). Minimizing this can give a comparatively low training loss, which makes the model look ideal. However, when we use the model on the testing dataset, we get a much higher testing loss; this phenomenon is called overfitting.

xi denotes the input data; y the model output; y’ the ground truth; wi and b are the parameters of the model
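The equation itself appears as an image in the original post. A plausible reconstruction of the plain squared-error loss in this notation (a sketch; the exact form in the image may differ slightly) is:

```latex
% Squared-error loss over N training examples, without regularization.
% y^n is the model output for example n, y'^n the corresponding ground truth.
L(w, b) = \sum_{n=1}^{N} \left( y'^{\,n} - y^{n} \right)^{2},
\qquad y^{n} = b + \sum_{i} w_i \, x_i^{n}
```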

We add an extra regularization term at the end, so the function becomes:

Loss function with regularization
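This equation is also an image in the original post. Assuming the penalty is the usual L2 (weight-decay) term, which is what the discussion below relies on, a reconstruction is:

```latex
% Squared-error loss plus an L2 penalty on the weights, scaled by lambda.
% The bias b is typically left out of the penalty.
L(w, b) = \sum_{n=1}^{N} \left( y'^{\,n} - y^{n} \right)^{2}
        + \lambda \sum_{i} w_i^{2}
```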

The final term shows that we prefer smaller weights in the model. Smaller weights make the model function smoother, which means the output is less sensitive to small changes in the input. Consequently, when the testing data is fed into the model, the noise in the testing data has only an insignificant influence on the outputs, and we get a better result.
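As a minimal NumPy sketch of this idea (illustrative only; the variable names and shapes are assumptions, not the code from the linked project), the L2 penalty adds a 2λwᵢ term to each weight's gradient, which pushes the weights toward zero during training:

```python
import numpy as np

def predict(x, w, b):
    # Linear model: y = b + sum_i w_i * x_i, applied to every row of x
    return x @ w + b

def loss(x, y_true, w, b, lam):
    # Squared-error data term plus the L2 penalty lam * sum_i w_i^2
    err = y_true - predict(x, w, b)
    return np.sum(err ** 2) + lam * np.sum(w ** 2)

def gradients(x, y_true, w, b, lam):
    err = y_true - predict(x, w, b)
    grad_w = -2 * x.T @ err + 2 * lam * w  # the 2*lam*w term shrinks the weights
    grad_b = -2 * np.sum(err)              # the bias is not penalized here
    return grad_w, grad_b
```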

We can try different values for the parameter 𝜆 and keep the one that gives the lowest loss on held-out data.
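One common way to do this (an assumed workflow, not something spelled out in the post; train, x_val, and y_val are hypothetical names) is to train once per candidate 𝜆 and compare the resulting losses on a validation set:

```python
# Hypothetical lambda sweep: train with each candidate value and keep the
# lambda whose model has the lowest loss on a held-out validation set.
candidates = [0.0, 0.001, 0.01, 0.1, 1.0]
best_lam, best_val_loss = None, float("inf")
for lam in candidates:
    w, b = train(x_train, y_train, lam)           # assumed training routine
    val_loss = loss(x_val, y_val, w, b, lam=0.0)  # evaluate without the penalty
    if val_loss < best_val_loss:
        best_lam, best_val_loss = lam, val_loss
```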

I use a regularized loss in my handwritten digit recognition project. For more details, please see the GitHub page: https://github.com/gary30404/neural-network-from-scratch-python

If you like this article and find it useful, please support it with 👏.
