What is the simplest way to prevent Overfitting?
Ways to prevent a neural network from overfitting.
The dropout technique is, in my understanding, tricky but practical. The term “dropout” refers to a technique that drops out some nodes of the network: dropping out can be seen as temporarily deactivating or ignoring neurons. It is applied during the training phase to reduce overfitting.
What is Overfitting?
Overfitting is an error that occurs when a network fits too closely to a limited set of input samples. Traditionally, it is defined as training a flexible representation until it memorizes the data, noise included, and consequently fails to predict well on future data. In a fully connected layer, which holds most of the parameters, neurons develop co-dependencies on each other during training; this curbs the individual power of each neuron and leads to overfitting of the training data.
What is Dropout?
Let us understand this technique in layman’s terms. Suppose you meet lots of people every single day. While talking to them in person, you put a face to each name. Now imagine you can only talk to a person over the phone. The visual features you relied on are gone, so you have to learn to put the voice to the name instead. By dropping out the visual features, you are forced to focus on the voice features.
In machine learning, regularization is the standard way to prevent overfitting: it reduces overfitting by adding a penalty to the loss function. Dropout is a regularization technique for neural networks. When we drop certain nodes out, those units are not considered during a particular forward or backward pass through the network.
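As a concrete sketch of the penalty idea, an L2 regularizer simply adds the sum of squared weights, scaled by a coefficient, to the data loss (the function name and the coefficient `lam` here are my own illustration, not from any specific library):

```python
import numpy as np

def l2_regularized_loss(data_loss, weights, lam=1e-3):
    """Total loss = data loss + lam * sum of squared weights.

    Large weights are penalized, which discourages the network from
    fitting noise in the training data too aggressively.
    """
    return data_loss + lam * np.sum(weights ** 2)

# Example: a data loss of 1.0 with weights [1, 2] and lam = 0.1
# adds a penalty of 0.1 * (1 + 4) = 0.5, giving a total loss of 1.5.
total = l2_regularized_loss(1.0, np.array([1.0, 2.0]), lam=0.1)
```

Dropout achieves a similar regularizing effect, but by randomly removing units rather than by penalizing weight magnitudes.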
As the figure above shows, dropout randomly mutes some neurons in the neural network. At each training stage, individual nodes are either dropped out of the net with probability 1 - p or kept with probability p, so the effective network shrinks; incoming and outgoing edges of a dropped-out node are removed as well. The keep probability p is commonly set to 0.5 for hidden units; when p is 1, no neurons are dropped.
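This masking step can be sketched in a few lines of NumPy using “inverted” dropout, where activations are rescaled by 1/p at training time so that no rescaling is needed at test time (the function name and signature are my own illustration, not from the papers cited below):

```python
import numpy as np

def dropout_forward(x, p=0.5, training=True, rng=None):
    """Inverted dropout: keep each unit with probability p, rescale by 1/p."""
    if not training:
        return x  # at test time the full, unscaled network is used
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) < p   # True means this unit is kept
    return x * mask / p              # rescaling keeps the expected activation

# With p = 0.5, roughly half the units are zeroed and the rest doubled,
# so the mean activation stays approximately unchanged.
activations = np.ones(10_000)
dropped = dropout_forward(activations, p=0.5, rng=np.random.default_rng(0))
```

Dividing by p is what lets the same weights be used unchanged at test time, since the expected value of each activation during training matches its test-time value.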
How does Dropout prevent overfitting?
Dropout forces a neural network to learn more robust features, ones that are useful in conjunction with many different random subsets of the other neurons. It spreads the weights across the input features instead of concentrating on just a few of them.
If your network is significantly overfitting, dropout will substantially reduce your error. Dropout roughly doubles the number of iterations required to converge, but each epoch takes less time to train. In the ‘denoising autoencoders’ technique developed by Pascal Vincent, Hugo Larochelle and Yoshua Bengio, dropout was applied at the input layer; similarly, the object-recognition network developed by Alex Krizhevsky also uses this technique.
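Applying dropout at the input layer, as in the denoising-autoencoder setting, amounts to randomly corrupting input features before they reach the network. A minimal sketch, assuming a keep probability of 0.8 (the cited papers experiment with various corruption levels):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.random((4, 8))            # a batch of 4 inputs with 8 features each
keep = rng.random(x.shape) < 0.8  # keep each input feature with prob 0.8
x_corrupted = x * keep            # denoising-style corruption of the inputs
```

The network is then trained on `x_corrupted`, which forces it to reconstruct or classify from incomplete inputs rather than relying on any single feature.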
- Hinton et al., “Improving neural networks by preventing co-adaptation of feature detectors”, arXiv:1207.0580, 2012 (probably the original paper on dropout)
- Lecture 10.5 — Dropout (by Prof. Hinton)
- Srivastava et al., “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, JMLR, 2014
- Warde-Farley et al., “An empirical analysis of dropout in piecewise linear networks”, arXiv:1312.6197, 2014 (analyzes dropout specifically for the case of the ReLU activation function, arguably the most popular, and checks the behavior of the geometric mean for ensemble averaging)
For more such answers to important Data Science questions, please visit Acing AI.