Although I also think ML is more than stats, this sentence shows that something is extremely wrong with the current state of things.

First of all, claiming that you developed a deep understanding of “Wasserstein GAN” with virtually no background in stats is just absurd. How can you “deeply” understand why using the Wasserstein distance improves things compared to KL-divergence? Implementing a W-GAN does not give you a deep understanding of the Wasserstein distance, KL-divergence or IPMs in general. Nor does it give you a deep understanding of generative models. For example, if I make cocktails by following recipes online, do I have a “deep understanding” of mixology?
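
To make the Wasserstein versus KL question concrete, here is a minimal sketch of the standard toy case: two point masses that do not overlap. It is an illustration under stated assumptions, not anything from the post being discussed. The helper names (`kl_divergence`, `wasserstein_1`, `point_mass`) are hypothetical, and the distributions are assumed to live on a unit-spaced integer grid.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same grid."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

def wasserstein_1(p, q):
    """W1 between discrete distributions on the grid 0, 1, 2, ...

    In one dimension, W1 is the summed absolute difference of the CDFs
    (grid spacing assumed to be 1).
    """
    dist, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        dist += abs(cdf_p - cdf_q)
    return dist

def point_mass(position, size=6):
    """All probability mass on a single grid cell (hypothetical helper)."""
    return [1.0 if i == position else 0.0 for i in range(size)]

target = point_mass(0)
for shift in range(4):
    model = point_mass(shift)
    print(shift, kl_divergence(target, model), wasserstein_1(target, model))
```

In this toy case the KL-divergence is infinite for every nonzero shift, so it says nothing about how far off the model is, while the Wasserstein distance grows with the shift. Seeing that is only the first step; knowing why it holds, and when it matters, is the part that requires the statistics.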

Second of all, you need a statistics background to have deep knowledge of cutting-edge machine learning. How can you even understand experimental results if you have no statistics background? If you do not know what bias and variance are, how can you develop intuitions about the bias-variance trade-off? If you have no idea about probability metrics and variational methods, how can you have deep knowledge of VAEs? If you have no understanding of Fisher information, how can you have deep knowledge of learning without forgetting…?
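
As a small worked example of the bias-variance trade-off mentioned above, consider shrinking a sample mean toward zero. This is a sketch under assumptions: the function name `shrunk_mean_error` is hypothetical, and the setup is n i.i.d. samples with mean mu and standard deviation sigma, with the estimator taken to be c times the sample mean.

```python
def shrunk_mean_error(c, mu, sigma, n):
    """Bias, variance and MSE of the estimator c * (sample mean).

    Assumes n i.i.d. samples with true mean mu and standard deviation sigma.
    """
    bias = (c - 1.0) * mu            # E[c * mean] - mu
    variance = c**2 * sigma**2 / n   # Var(c * mean)
    mse = bias**2 + variance         # MSE = bias^2 + variance
    return bias, variance, mse

# Illustrative values only: mu = 1, sigma = 2, n = 4.
for c in (1.0, 0.5, 0.0):
    print(c, shrunk_mean_error(c, mu=1.0, sigma=2.0, n=4))
```

With these illustrative numbers, the unbiased estimator (c = 1) and the zero-variance estimator (c = 0) both have a mean squared error of 1, while the biased, lower-variance choice c = 0.5 reaches 0.5.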

Finally, Dropout follows almost naturally from ensemble methods, which have been studied in statistics for ages. Similarly, statistics is full of models with 100 million parameters, and even models with infinitely many parameters. Science is cumulative, and all the shiny algorithms we have today have roots in statistics, since it is the field that has studied data for decades.

I am not a statistician; I am an ML researcher, and I have studied enough statistics to understand that statistics goes way beyond data. It is the theory of anything empirical. Understanding statistics helps you design better experiments, better algorithms and better data collection procedures. I also understand that there is more to ML than stats. I find both the original claim and this post silly, since naming is the least important thing in science. We (statisticians, ML researchers, logicians, etc.) are in this together, trying to develop algorithms that can learn from data. Both the original claim and this post are a distraction at best.