💡 Over-parameterized models

Let's start with the bias-variance trade-off:

Jaideep Ray
Better ML
2 min read · Jan 1, 2022


The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express the underlying complex structure in the data (generalization), yet simple enough to avoid merely memorizing patterns in the data (memorization). The classical approach of finding a sweet spot between generalization and memorization leads to the classic U-shaped risk vs. capacity curve.
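As a quick illustration (my own sketch, not from the article), the U-shaped test-error curve can be reproduced with plain polynomial regression, where the polynomial degree plays the role of model capacity:

```python
# Sketch: classic U-shaped test error as capacity (polynomial degree) grows.
# All numbers here are illustrative choices, not values from the article.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)                       # underlying structure
x_tr = rng.uniform(-1, 1, size=30)
x_te = rng.uniform(-1, 1, size=500)
y_tr = f(x_tr) + 0.3 * rng.normal(size=x_tr.size)
y_te = f(x_te) + 0.3 * rng.normal(size=x_te.size)

for degree in [1, 3, 5, 9, 12]:
    coefs = np.polyfit(x_tr, y_tr, degree)        # fit a degree-`degree` polynomial
    test_mse = np.mean((np.polyval(coefs, x_te) - y_te) ** 2)
    print(f"degree={degree:2d}  test MSE={test_mse:.3f}")

# Test error typically falls at first (less bias) and then rises again
# (more variance) once the degree grows past the sweet spot.
```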

Classical theory therefore suggests that a large enough model, having the capacity to fit any set of labels including pure noise, will reach near-zero training error and should typically generalize poorly.

Belkin et al. (2018) showed that this statement need not hold for modern deep neural networks.

Over-parameterization:

Over-parameterization refers to the regime in which the number of model parameters exceeds the size of the training dataset, or, more generally, in which the model has enough capacity to fit the training data exactly (beyond the interpolation threshold).
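For intuition, here is a hedged sketch (the model and dataset size are hypothetical, not from the article) of how one might check this condition for a small PyTorch MLP:

```python
# Sketch: compare parameter count to training-set size.
# The architecture and n_train_examples below are illustrative assumptions.
import torch.nn as nn

n_train_examples = 50_000                         # e.g. a CIFAR-10-sized training set

model = nn.Sequential(                            # small MLP on 32x32x3 inputs
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params:,}  training examples: {n_train_examples:,}")
print("over-parameterized:", n_params > n_train_examples)
```

Even this modest MLP has well over a million parameters, far more than the assumed 50,000 training examples.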

Many modern neural networks are trained in this over-parameterized regime. In principle they have the capacity to fit any set of labels, including pure noise, and in practice they do reach near-zero training error. Despite this, they achieve low test error.

Figure: U-shaped risk curve arising from the bias-variance trade-off. [1]
Figure: Double descent risk curve. The right-hand side shows the over-parameterized regime, with near-zero training error.

The double descent risk curve introduced in [1] reconciles the U-shaped curve predicted by the bias-variance trade-off with the observed behavior of complex deep learning models used in practice.
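The double descent shape can be reproduced in a few lines without deep learning at all. The sketch below (my own illustration, not the paper's experiment) uses random ReLU features with a minimum-norm least-squares fit; test error typically rises as the feature count approaches the number of training points (the interpolation threshold) and falls again in the over-parameterized regime:

```python
# Sketch: double descent with random ReLU features + minimum-norm least squares.
# All sizes and noise levels are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 10
w_true = rng.normal(size=d)

def make_data(n):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.5 * rng.normal(size=n)     # noisy linear target
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

def random_relu_features(X, W):
    return np.maximum(X @ W, 0.0)                 # fixed random ReLU features

for n_feat in [10, 50, 90, 100, 110, 200, 500, 2000]:
    W = rng.normal(size=(d, n_feat)) / np.sqrt(d)
    Phi_tr = random_relu_features(X_tr, W)
    Phi_te = random_relu_features(X_te, W)
    # np.linalg.lstsq returns the minimum-norm solution when the system is
    # under-determined, i.e. in the over-parameterized regime.
    beta, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)
    train_mse = np.mean((Phi_tr @ beta - y_tr) ** 2)
    test_mse = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"features={n_feat:5d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Training error drops to near zero once the feature count reaches the number of training points, while test error tends to peak there and then improve as the model becomes more over-parameterized.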
