This was gold for me - I'd had terrible results on a regression problem (with a relatively simple NN), until I realized it was due to the dropout/batchnorm (even with large minibatches). Do you know any fixes for this? BN/Dropout/LayerNorm are pretty good tricks for stabilizing nets; it's a shame to have to lose them.
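
For concreteness, here's a hypothetical sketch (PyTorch, not my actual code) of the kind of small regression MLP I mean, with dropout and batchnorm between the layers:

```python
import torch
import torch.nn as nn

class RegressionMLP(nn.Module):
    """Small regression MLP with BatchNorm + Dropout between layers (illustrative only)."""
    def __init__(self, in_dim=16, hidden=64, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.BatchNorm1d(hidden),   # batchnorm after the linear layer
            nn.ReLU(),
            nn.Dropout(p_drop),       # dropout before the next linear layer
            nn.Linear(hidden, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden, 1),     # single scalar regression output
        )

    def forward(self, x):
        return self.net(x)

model = RegressionMLP()
model.eval()  # eval() switches BN to running stats and disables dropout at prediction time
with torch.no_grad():
    y_hat = model(torch.randn(8, 16))
```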