Understanding Dropout
One layer that is useful, yet somewhat mysterious, when training neural networks is Dropout. Dropout was introduced as a regularization technique: it reduces the model's effective capacity so that the model achieves lower generalization error. The intuition is simple: instead of using all neurons, we keep each neuron active in a given training iteration only with probability p. But how does dropout actually work, and does the common implementation match this intuition?
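As a preview, here is a minimal NumPy sketch of the "inverted dropout" variant used in most modern frameworks, assuming p denotes the keep probability as above (note that some libraries, e.g. PyTorch's nn.Dropout, instead parameterize by the drop probability):

```python
import numpy as np

def dropout_forward(x, p, training=True, rng=None):
    """Inverted dropout sketch: keep each unit with probability p,
    then rescale the survivors by 1/p so that the expected activation
    is unchanged."""
    if not training:
        # At test time all units are used; no extra scaling is needed
        # because the 1/p rescaling was already applied during training.
        return x
    rng = rng or np.random.default_rng()
    # Bernoulli mask: each entry is 1 with probability p, else 0.
    mask = (rng.random(x.shape) < p).astype(x.dtype)
    return x * mask / p

# Roughly a fraction p of activations survive, scaled up by 1/p.
x = np.ones((2, 5))
print(dropout_forward(x, p=0.8))
```

The 1/p rescaling is the detail that makes the trained network usable at test time without modification, and it is exactly the point where the implementation departs from the naive "just turn neurons off" intuition, as we will see.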