Influence of large batches
My deep learning notes
Hi again, it’s Wednesday morning and I’ll be sharing my personal notes on large batches in deep learning.
Have a read of my two previous articles on this topic:
https://medium.com/@mastafa.foufa/influence-of-large-batches-5f1d8a00891c
https://medium.com/@mastafa.foufa/influence-of-large-batches-ba0ad9894f11
My main resource here:
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (Keskar et al., 2017)
In the last article, we talked about flat and sharp minima and saw that large batches might lead to sharp minima. This is problematic because, at a sharp minimum, even a slight shift between the training loss and the loss we face at testing time leads to a huge increase in loss. In other words, the model learns to minimize the loss locally but has a hard time doing so at testing time, on unseen data points.
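Here is a minimal toy sketch of this idea (my own illustration, not from the paper): two 1-D losses with minima of very different sharpness, evaluated after the same small shift of the parameter, mimicking the mismatch between the training and testing loss surfaces. The quadratic losses and the shift value are assumptions chosen purely for illustration.

```python
def sharp_loss(w):
    # Narrow parabola: a sharp minimum at w = 0 (high curvature).
    return 50.0 * w ** 2

def flat_loss(w):
    # Wide parabola: a flat minimum at w = 0 (low curvature).
    return 0.5 * w ** 2

w_train = 0.0  # minimizer found on the training loss
shift = 0.3    # small displacement, standing in for the train/test mismatch

for name, loss in [("sharp", sharp_loss), ("flat", flat_loss)]:
    print(f"{name}: loss at minimum = {loss(w_train):.3f}, "
          f"loss after shift = {loss(w_train + shift):.3f}")
```

Running this prints a loss of 4.5 after the shift for the sharp minimum but only 0.045 for the flat one: the same small perturbation hurts the sharp minimum a hundred times more, which is exactly why sharp minima are associated with a generalization gap.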
Here is where I left off last time; let’s pick it up from there:
It would be great to understand how we can ultimately end up in a sharp minimum vs. a flat minimum and get some intuition ourselves.
We have twenty minutes to go through that and try to get at least some intuition about the underlying phenomenon.