AlexNet [2012, paper by Krizhevsky et al.]
Main ideas
ReLU nonlinearity, training on multiple GPUs, local response normalization, overlapping pooling, data augmentation, dropout
Why it is important
AlexNet won the ImageNet competition (ILSVRC-2012) by a large margin, with a top-5 test error of 15.3% against 26.2% for the second-best entry. With about 60 million parameters, it was the biggest network at the time. The network demonstrated the potential of training large neural networks quickly on massive datasets using widely available…
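To make three of the main ideas concrete (ReLU, local response normalization, and overlapping pooling), here is a minimal pure-Python sketch that works on plain lists instead of tensors. The hyperparameters are the values reported in the AlexNet paper: k = 2, n = 5, α = 10⁻⁴, β = 0.75 for local response normalization, and a 3×3 pooling window with stride 2. The function names are hypothetical, and this is an illustration of the formulas, not the original multi-GPU implementation.

```python
from typing import List


def relu(xs: List[float]) -> List[float]:
    """ReLU nonlinearity f(x) = max(0, x), applied elementwise."""
    return [max(0.0, x) for x in xs]


def local_response_norm(a: List[float], k: float = 2.0, n: int = 5,
                        alpha: float = 1e-4, beta: float = 0.75) -> List[float]:
    """Cross-channel local response normalization at one spatial position.

    a[i] is the (post-ReLU) activation of channel i at that position.
    Each value is divided by (k + alpha * sum of squares over up to n
    adjacent channels, clipped at the channel boundaries) ** beta.
    """
    num_channels = len(a)
    out = []
    for i in range(num_channels):
        lo = max(0, i - n // 2)
        hi = min(num_channels - 1, i + n // 2)
        sq_sum = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (k + alpha * sq_sum) ** beta)
    return out


def overlapping_max_pool_2d(x: List[List[float]], z: int = 3,
                            s: int = 2) -> List[List[float]]:
    """Max pooling with a z x z window and stride s.

    Because s < z, neighboring pooling windows overlap (AlexNet uses z=3, s=2).
    """
    rows = (len(x) - z) // s + 1
    cols = (len(x[0]) - z) // s + 1
    return [[max(x[r * s + i][c * s + j] for i in range(z) for j in range(z))
             for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    print(relu([-1.0, 0.0, 2.5]))          # [0.0, 0.0, 2.5]
    grid = [[5 * i + j for j in range(5)] for i in range(5)]
    print(overlapping_max_pool_2d(grid))   # [[12, 14], [22, 24]]
```

On the 5×5 grid, each 3×3 window shares a row or column with its neighbor, which is what "overlapping" means here. A conventional non-overlapping pool would use s = z.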