Optimal CNN development: Use Data Augmentation, not explicit regularization (dropout, weight decay)
Dropout, weight decay, and data augmentation have all become part and parcel of every CNN developer's standard toolkit. The assumption has been that each contributes, hopefully synergistically, toward producing an optimal CNN. However, recent research by Hernández-García and König (https://arxiv.org/abs/1806.03852v4) examined this assumption and found that a much better development approach is to rely entirely on data augmentation to produce self-regularized CNNs, and forgo the use of dropout and weight decay.
They find that while weight decay and dropout do provide some regularization, their average effect is a 3.06% improvement in accuracy, whereas light augmentation *alone* improves accuracy by an average of 8.46%.
Further, when comparing augmentation *alone* against augmentation plus weight decay and dropout (the standard toolkit), augmentation alone equals or betters the performance of the combined set, with improvements of 8.57% and 7.90% in different tests.
In other words, dropout and weight decay are a crutch that ultimately produces less than optimal CNNs, and the better strategy is to replace both with heavier use of data augmentation.
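In practice, this means applying augmentation on the fly during training rather than adding dropout layers or a weight-decay term. Below is a minimal pure-Python sketch of a "light" augmentation step, assuming a scheme of random horizontal flips plus small random translations in the spirit of the paper; the function and parameter names here are illustrative, not taken from the paper or any library:

```python
import random

def light_augment(image, max_shift_frac=0.1, rng=None):
    """Apply light augmentation to one image: a random horizontal
    flip (probability 0.5) and a small random translation of up to
    max_shift_frac of the image size, zero-padding vacated pixels.

    `image` is a 2D list of pixel values (rows). Parameter names and
    the exact scheme are illustrative assumptions, not the paper's code.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])

    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]

    # Pick a random shift in each axis, bounded by max_shift_frac.
    max_dx = int(w * max_shift_frac)
    max_dy = int(h * max_shift_frac)
    dx = rng.randint(-max_dx, max_dx)
    dy = rng.randint(-max_dy, max_dy)

    # Copy pixels into their shifted positions; out-of-bounds pixels
    # are dropped and vacated positions stay zero.
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out
```

A fresh augmented variant would be generated for each image every epoch, so the network never sees exactly the same input twice; that implicit regularization is what the paper argues can replace dropout and weight decay.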
Below is a chart showing the performance comparisons: