On the alchemical side of deep learning

Mr. Ali Rahimi recently gave a talk at NIPS in which he described current deep learning as alchemy, for its lack of theoretical explanation and its pragmatic orientation. If you haven’t seen it and are interested, here is the link:

I feel that what Rahimi described is something that I, as a deep learning student, can relate to. A few years back, when we first studied deep learning, we called ourselves “old Chinese medicine practitioners” (老中医). We grew up in an Eastern culture, so alchemy isn’t a term that’s familiar to us, but an old Chinese medicine practitioner is more or less the Eastern counterpart of an alchemist. By calling ourselves that, we were referring to the fact that when designing a deep learning model, we had no idea how many layers it should have, how wide it should be, or how many convolutional layers to use versus fully connected layers. There was no methodology to guide us. We, like many deep learning practitioners, had to spawn many models at once and pick the best-performing one. And normally there is no explanation as to why this one model is better than the rest. It could even be due to a better random starting point.
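To make the “better random starting point” worry concrete, here is a minimal sketch. It trains nothing: `simulated_validation_accuracy` is a hypothetical stand-in for one training run, and the true accuracy of 0.90 and noise level of 0.01 are made-up numbers. Every “model” is the same configuration and only the random draw differs.

```python
import random


def simulated_validation_accuracy(rng, true_accuracy=0.90, noise_std=0.01):
    # Hypothetical stand-in for "train one model and evaluate it".
    # Every run has the same underlying quality; only random noise
    # (think: initialization, data shuffling) differs between runs.
    return rng.gauss(true_accuracy, noise_std)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
scores = [simulated_validation_accuracy(rng) for _ in range(20)]

best = max(scores)
mean = sum(scores) / len(scores)
print(f"mean accuracy over 20 runs: {mean:.4f}")
print(f"best of 20 runs:            {best:.4f}")
```

Under these assumptions the “winner” looks better than the average run even though no run is actually better than another, which is why a single best-of-many result is hard to explain after the fact.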

Blind experimentation is the methodology traditional Chinese medicine has used for thousands of years, whereas less effort has been spent on developing concrete theories or on explaining the mechanisms of action of herbs and the principles behind treatments. It’s unfair to say that traditional Chinese medicine has not developed any theory at all, but its theories have always been elevated to the level of philosophy, which is very nebulous in my opinion (sometimes this hides its lack of understanding). Although it won China a Nobel Prize, from time to time some traditional treatments have been found to be harmful. And when an unexplainable deep learning model is applied to autonomous driving, we will have some fun!

I consider doing experiments a scientific approach. After all, it is used actively and successfully in chemistry and physics, though not so much in computer science and mathematics. There is nothing wrong with it. Also, I think Mr. Rahimi’s talk wasn’t really criticizing the status quo of deep learning, but reminding us that we shouldn’t throw away our curiosity and stop pursuing the truth and the rules underneath the surface. Deep learning is currently a very pragmatic field. You see more professors in this field having close ties to industry. They are the CTOs, VPs, VCs and CEOs of companies. That may not be a bad thing; after all, industry can provide a huge amount of resources and guide research toward areas of need. But hopefully being money-driven won’t make people short-sighted.

A while back, I was curious about distributed deep learning training. As you know, deep learning models have densely interconnected parts, which by nature are hard to separate. There was a paper by a top AI company claiming that if you let two models (started from the same random point) train separately for a few steps and then average them, the training procedure can still converge. I found this very mysterious, because if gradient descent is meant to find a good local minimum, averaging two local minima shouldn’t be guaranteed to give an even better local minimum. Say two people are standing at the bottoms of two abysses: will their averaged location be at the bottom of an even deeper abyss? All I wanted to know, before reading the paper, was the mathematical theory backing the above claim. And this, in my opinion, is the core sentence of the paper, hiding behind a thick introduction and related work section:

My translation: “We tried it, it worked (on MNIST), deal with it!” Again, I consider experimenting a scientific approach. But is the result conclusive given only a limited number of experiments?
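The two-abyss intuition from earlier can be checked on a toy problem. This is only a sketch of my own worry, not the paper’s setup: the one-dimensional double-well function `f`, the two different starting points, the learning rate and the step count are all my own assumptions, and here the two runs deliberately start from different points, unlike in the paper.

```python
def f(x):
    # Toy double-well loss with two minima, at x = -1 and x = +1.
    return (x * x - 1.0) ** 2


def grad_f(x):
    # Derivative of f: d/dx (x^2 - 1)^2 = 4x(x^2 - 1).
    return 4.0 * x * (x * x - 1.0)


def gradient_descent(x0, lr=0.05, steps=500):
    # Plain gradient descent from starting point x0.
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


left = gradient_descent(-0.5)   # settles in the left well
right = gradient_descent(0.5)   # settles in the right well
average = 0.5 * (left + right)  # "average the two models"

print(f"left:    x = {left:+.4f}, loss = {f(left):.4f}")
print(f"right:   x = {right:+.4f}, loss = {f(right):.4f}")
print(f"average: x = {average:+.4f}, loss = {f(average):.4f}")
```

In this toy case the averaged point lands on the ridge between the two wells, with a higher loss than either minimum. It says nothing about whether averaging works when both models start from the same point and take only a few steps, which is exactly the part I was hoping the paper would explain.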

In my opinion, it’s OK for a work to be based on unexplained blind experiments, but I hope its paper can be utterly honest about it. These days, you see too much decoration, marketing effort and pretentiousness in academic papers. That’s what frustrates me more. I get confused about what the purpose of academic papers is. Is it for spreading knowledge and promoting understanding? Or is it for scholars to declare victory? A lot of the time, it feels like the latter.