Train your deep model faster and sharper — two novel techniques
Harshvardhan Gupta

Actually, from my own experience, pre-computing the activations doesn’t help: while the original image has 3 channels, the hidden representation may have 64, which is roughly a 21-fold increase in data volume, and that becomes a bandwidth bottleneck for the GPU. And, of course, you cannot do data augmentation efficiently, because the augmentations would have to be applied before the frozen layers, so the cached activations no longer correspond to the augmented inputs.
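A quick back-of-the-envelope check of that ratio (the 224×224 spatial size and float32 storage are my assumptions, not from the comment; the point is only that channels dominate the cost when the spatial resolution is unchanged):

```python
# Hypothetical sizes: a 224x224 RGB input versus a cached early-layer
# activation with 64 channels at the same spatial resolution, both float32.
H, W = 224, 224                      # assumed spatial size
BYTES_PER_FLOAT32 = 4

image_bytes = 3 * H * W * BYTES_PER_FLOAT32        # raw input: 3 channels
activation_bytes = 64 * H * W * BYTES_PER_FLOAT32  # hidden representation: 64 channels

ratio = activation_bytes / image_bytes
print(f"image: {image_bytes / 1e6:.1f} MB")
print(f"activation: {activation_bytes / 1e6:.1f} MB")
print(f"storage/bandwidth ratio: {ratio:.1f}x")    # 64 / 3, about 21x
```

So every cached activation you move from disk to GPU costs about 21 times the bandwidth of simply reading the original image and recomputing the first layers on the fly.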

— David Menéndez Hurtado