Optimal CNN development: Use Data Augmentation, not explicit regularization (dropout, weight decay)

Less Wright · The Startup · Jun 15, 2019

Dropout, weight decay, and data augmentation have all become part and parcel of every CNN developer's standard toolkit. The assumption has been that each contributes, hopefully synergistically, toward producing an optimal CNN. However, recent research by Hernández-García and König (https://arxiv.org/abs/1806.03852v4) tested this assumption and found that a much better development approach is to rely entirely on data augmentation to produce self-regularized CNNs, and forgo the use of dropout and weight decay.

They find that while weight decay and dropout do provide some regularization, their average effect is a 3.06% improvement in accuracy, whereas light augmentation *alone* improves accuracy by an average of 8.46%.
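To make "light augmentation" concrete, here is a minimal PyTorch/torchvision sketch of such a scheme, assuming random horizontal flips plus small translations along the lines the paper describes. The exact transform set and the 10% shift are my illustration, not taken from the paper's code:

```python
import torchvision.transforms as T
from torchvision.datasets import CIFAR10

# A sketch of a "light" augmentation pipeline: random horizontal flips plus
# small random translations. The 10% shift is an illustrative choice.
light_augmentation = T.Compose([
    T.RandomHorizontalFlip(),                        # flip left-right with p=0.5
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)), # shift up to 10% in x and y
    T.ToTensor(),
])

# Applied to CIFAR-10, one of the datasets used in the paper:
train_set = CIFAR10(root="./data", train=True,
                    transform=light_augmentation, download=True)
```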

Further, when comparing augmentation *alone* against augmentation plus weight decay and dropout (the standard toolkit), augmentation alone equals or betters the combination: an average accuracy improvement of 8.57% versus 7.90% across the different tests.

In other words, dropout and weight decay are a crutch that ultimately produces less-than-optimal CNNs, and the better strategy is to replace both with heavier use of data augmentation.
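In PyTorch terms, forgoing dropout and weight decay just means building the network without dropout layers and setting the optimizer's weight decay to zero, so the augmented data stream is the only regularizer. A minimal sketch (the architecture here is illustrative, not the paper's networks):

```python
import torch
import torch.nn as nn

class PlainCNN(nn.Module):
    """Small CNN with no dropout layers; augmentation is the only regularizer."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # 32x32 -> 16x16
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(128 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = PlainCNN()
# weight_decay=0.0 disables the L2 penalty entirely.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=0.0)
```

The design point is what is *absent*: no nn.Dropout modules and no weight_decay term, with all regularization pressure coming from the transformed training images.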

Below is a chart from the paper showing the performance comparisons:

[Figure from the Hernández-García and König paper — purple is augmentation alone, red is augmentation plus weight decay and dropout. For the All-CNN network, augmentation alone consistently outperforms. Tested on ImageNet, CIFAR-10, and CIFAR-100.]
