Deep Learning Course — Lesson 9: Regularization Techniques

--

Overfitting and Underfitting in Neural Networks

Overfitting occurs when a neural network learns the detail and noise in the training data to the extent that it hurts the model's performance on new data: the noise and random fluctuations in the training data are picked up and learned as concepts by the model.

Underfitting, on the other hand, refers to a model that can neither fit the training data nor generalize to new data. An underfit model is easy to spot, because it performs poorly even on the training data.
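
In practice, the two failure modes show up when comparing training and validation loss. Here is a minimal diagnostic sketch in PyTorch (the lesson does not prescribe a framework; model, train_loader, val_loader, and loss_fn are assumed to already exist):

    import torch

    def epoch_loss(model, loader, loss_fn, device="cpu"):
        # Average loss of `model` over one full pass of `loader`, with no updates.
        model.eval()
        total, n = 0.0, 0
        with torch.no_grad():
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                total += loss_fn(model(x), y).item() * len(x)
                n += len(x)
        return total / n

    # A low training loss with a much higher validation loss suggests overfitting;
    # a high loss on both suggests underfitting.
    # train_loss = epoch_loss(model, train_loader, loss_fn)
    # val_loss   = epoch_loss(model, val_loader, loss_fn)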

Regularization Techniques: L1, L2 Regularization, Dropout

Regularization methods introduce additional information or constraints (a bias toward simpler models) that discourage extreme parameter weights.

  • L1 Regularization: Also known as Lasso Regression, this adds the sum of the absolute values of the weights as a penalty term to the loss function. It tends to drive some weights exactly to zero, producing sparse models (see the sketch after this list).
  • L2 Regularization: Also known as Ridge Regression (or weight decay), this adds the sum of the squared weights as a penalty term to the loss function. It shrinks weights toward zero without eliminating them.
  • Dropout: A regularization method in which, during training, a random subset of each layer's outputs is set to zero in every update cycle; at test time all units are kept. It is a very efficient way of performing approximate model averaging with neural networks (see the dropout sketch below).
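
A minimal sketch of the L1/L2 penalties in PyTorch (the lesson does not prescribe a framework; l1_lambda and l2_lambda are illustrative names for the penalty strengths):

    import torch

    def penalized_loss(model, base_loss, l1_lambda=0.0, l2_lambda=0.0):
        # L1 penalty: sum of |w|; L2 penalty: sum of w^2, over all parameters.
        l1 = sum(p.abs().sum() for p in model.parameters())
        l2 = sum(p.pow(2).sum() for p in model.parameters())
        return base_loss + l1_lambda * l1 + l2_lambda * l2

    # L2 regularization is also built into most optimizers as weight decay, e.g.:
    # optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)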
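
And a sketch of dropout in a small fully connected network (the layer sizes are arbitrary). torch.nn.Dropout zeroes activations only while the model is in training mode; model.eval() disables it:

    import torch.nn as nn

    # During training, a random 50% of the hidden activations are set to zero
    # on every forward pass; the survivors are rescaled by 1 / (1 - p).
    model = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(256, 10),
    )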

Additional Strategies: Data Augmentation, Early Stopping

  • Data Augmentation: Involves creating new training examples by applying transformations to the existing dataset. Common transformations include rotation, translation, zooming, and color changes (see the torchvision sketch after this list).
  • Early Stopping: In this form of regularization, we use a validation set to evaluate the model during training and stop when performance on the validation set starts to degrade. This prevents the model from continuing to overfit the training data (a minimal loop is sketched after this list).
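
A sketch of image data augmentation with torchvision transforms (the specific transforms and parameter values are illustrative, not part of the lesson):

    from torchvision import transforms

    # Each epoch, every training image is seen as a freshly transformed variant.
    train_transforms = transforms.Compose([
        transforms.RandomRotation(degrees=15),                  # small rotations
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),    # random zoom/crop
        transforms.RandomHorizontalFlip(),                      # mirroring
        transforms.ColorJitter(brightness=0.2, contrast=0.2),   # color changes
        transforms.ToTensor(),
    ])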
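
And a minimal early-stopping loop (train_one_epoch and validate are placeholder helpers assumed to train for one epoch and return the validation loss; the patience value is illustrative):

    best_val_loss = float("inf")
    patience, epochs_without_improvement = 5, 0

    for epoch in range(100):
        train_one_epoch(model, train_loader, optimizer)   # placeholder helper
        val_loss = validate(model, val_loader)            # placeholder helper

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0
            # typically also save a checkpoint of the best weights here
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                print(f"Stopping early after epoch {epoch}")
                break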

Together, these techniques improve the performance of neural network models by preventing them from overfitting the training data, helping them generalize better to unseen data.

--