Dropout & Early stopping

moodayday™
AI³ | Theory, Practice, Business
3 min read · Sep 20, 2019

I don’t know about you, but to me these two terms sound very close in meaning. Yet they actually represent two different approaches to regularization.

What is regularization?

The goal of regularization techniques is to reduce overfitting, i.e. to prevent the model from becoming overtrained on the training dataset.

There are so many regularization techniques that can be used when training our machine learning models:

  • weight sharing
  • data augmentation
  • pre-training (e.g. using weights learned on ImageNet before training a classifier on your own dataset)
  • max-norm
  • mixup
  • early stopping
  • dropout

It’s a good idea to learn about all these techniques and try out combinations of them to appreciate how effective each one is; they can really make an amazing difference!

We will try to cover most of these topics on this blog in the near future.

Personally, it seems to me that most people in the machine learning community are very happy with the combination of dropout and early stopping.

So, today, let’s say a few words about these two techniques…

What is “Dropout” all about?

Published in 2012 in this paper, this method has been widely acclaimed and is still very effective today.

Dropout is a technique that makes learning harder for your model; by doing so, it pushes the model’s parameters to act in different ways and detect different features. Even with dropout, though, you can still potentially overfit your training set.

Here is a rough idea of how this method achieves that goal:

Ensembles of neural networks with different model configurations are known to reduce overfitting, but require the additional computational expense of training and maintaining multiple models.

A single model can be used to simulate having a large number of different network architectures by randomly dropping out nodes during training. This is called dropout and offers a very computationally cheap and remarkably effective regularization method to reduce overfitting and improve generalization error in deep neural networks of all kinds.
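To make this concrete, here is a minimal sketch of how dropout is typically inserted between layers, assuming PyTorch; the layer sizes and the dropout rate are purely illustrative:

```python
import torch.nn as nn

# A small classifier with a dropout layer between the hidden and output layers.
# The sizes (784 -> 256 -> 10) and the rate p=0.5 are only illustrative choices.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # during training, each activation is zeroed with probability 0.5
    nn.Linear(256, 10),
)

model.train()  # dropout active: every forward pass uses a different "thinned" sub-network
model.eval()   # dropout off: the full network is used (PyTorch rescales activations at train time)
```

Because a different random subset of units is active on every training step, the single model behaves like an inexpensive ensemble of many thinned networks.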

Please check out this nice post to learn more about it.

These papers are also enlightening:

Dropout: A Simple Way to Prevent Neural Networks from Overfitting

https://arxiv.org/pdf/1512.05287.pdf

What is Early stopping?

When training a deep learning model, one of the crucial questions is when training should stop. Stated simply, the problem is this: it might be intuitive to most of us that the more you train on a certain task, the better you get at it. What might be a little less obvious is that too much time spent training on the same specific, narrow task leaves the trainee overtrained on that narrow task: the best at it, but unable to adapt when the task changes significantly enough.

This is a problem to take seriously when training humans, just as much as when training deep learning models:

  • Too little training will mean that the model underfits both the training and test sets (i.e. it does not learn enough features from the data to predict well).
  • Too much training will mean that the model overfits the training dataset (i.e. it is overtrained on the training data and adapts poorly when variations occur) and performs poorly on the test set.

A compromise is to train on the training dataset but to stop training at the point when performance on a validation dataset starts to degrade.

This simple, effective, and widely used approach to training neural networks is called early stopping.
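As a minimal sketch of what that looks like in practice (assuming a PyTorch-style training loop; `train_one_epoch`, `evaluate`, the data loaders, and the patience value are hypothetical placeholders), early stopping can be as simple as:

```python
import torch

best_val_loss = float("inf")
patience = 5                   # epochs without improvement we tolerate (hypothetical value)
epochs_without_improvement = 0

for epoch in range(max_epochs):
    train_one_epoch(model, train_loader, optimizer)   # hypothetical helper: one pass over training data
    val_loss = evaluate(model, val_loader)            # hypothetical helper: loss on the validation set

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
        torch.save(model.state_dict(), "best_model.pt")   # keep the best checkpoint so far
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Early stopping at epoch {epoch}: validation loss stopped improving")
            break
```

The "patience" counter simply tolerates a few noisy epochs before deciding that validation performance has genuinely started to degrade.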

https://arxiv.org/pdf/1611.03530.pdf

Oh, and… although early stopping is such a nice technique, Andrew Ng doesn’t seem to recommend using it.

IMHO, it’s worth trying to understand why…

Thanks for reading!
