Batch Normalization, and the zoo of related normalization strategies that have grown up around it, have played an interesting array of roles in recent deep learning research: as a wunderkind optimization trick, as a focal point for discussions about theoretical rigor, and, importantly but somewhat more on the sidelines, as a…