Optimal CNN development: Use Data Augmentation, not explicit regularization (dropout, weight decay)
Dropout, weight decay, and data augmentation have all become part and parcel of every CNN developer's standard toolkit. The assumption has been that each contributes, hopefully synergistically, toward producing an optimal CNN. However, recent research by Hernández-García and König (https://arxiv.org/abs/1806.03852v4) examined this assumption and found that a much better development approach is to rely entirely on data augmentation to produce self-regularized CNNs, and forgo the use of dropout and weight decay.
They find that while weight decay and dropout do provide some regularization, their average effect is a 3.06% improvement in accuracy, whereas light augmentation *alone* improves accuracy by an average of 8.46%.
Further, when comparing augmentation *alone* against augmentation plus weight decay and dropout (the standard toolkit), augmentation alone equals or betters the performance of the combined set, with improvements of 8.57% and 7.90% in different tests.
In other words, dropout and weight decay are a crutch that ultimately produces less than optimal CNNs, and the better strategy is to replace both with heavier use of data augmentation.
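In practice, this means applying augmentation on the fly during training rather than adding dropout layers or a weight-decay term. Below is a minimal pure-Python sketch of a "light" augmentation step, assuming a scheme of random horizontal flips plus small random translations in the spirit of the paper; the function and parameter names here are illustrative, not taken from the paper or any library:

```python
import random

def light_augment(image, max_shift_frac=0.1, rng=None):
    """Apply light augmentation to one image: a random horizontal
    flip (probability 0.5) and a small random translation of up to
    max_shift_frac of the image size, zero-padding vacated pixels.

    `image` is a 2D list of pixel values (rows). Parameter names and
    the exact scheme are illustrative assumptions, not the paper's code.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])

    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]

    # Pick a random shift in each axis, bounded by max_shift_frac.
    max_dx = int(w * max_shift_frac)
    max_dy = int(h * max_shift_frac)
    dx = rng.randint(-max_dx, max_dx)
    dy = rng.randint(-max_dy, max_dy)

    # Copy pixels into their shifted positions; out-of-bounds pixels
    # are dropped and vacated positions stay zero.
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out
```

A fresh augmented variant would be generated for each image every epoch, so the network never sees exactly the same input twice; that implicit regularization is what the paper argues can replace dropout and weight decay.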
Below is a chart showing the performance comparisons: