11 Arguments Experts get Wrong about Deep Learning
I spend most of my waking time (and likely my subconscious works overtime while I sleep) studying Deep Learning. Peter Thiel has a phrase, “The Last Company Advantage” [THI]. Basically, you don’t necessarily need the “First Mover Advantage”; however, you absolutely want to be the last company standing in your kind of business. So Google may be the last Search company, Amazon may be the last E-Commerce company and Facebook hopefully will not be the last Social Networking company. What keeps me awake at night though is that Deep Learning could in fact be the “Last Invention of Man”!
However, let’s ratchet it down a little bit here. After all, Kurzweil’s Singularity (estimate is 2045) is still 3 decades away. That’s still plenty of time for us humans to scheme on our little monopolies. Your objective in the next 30 years of humankind is to figure out if you are going to be living in Elysium or in some unnamed decaying backwater:
Credit: Elysium the movie, not the life-extension supplement.
To aid you in your decision making, here are 11 reasons why your “experts” will lead you to miss the all important Deep Learning revolution:
1. It’s just Machine Learning
Well Traveled Road Effect Bias.
A practitioner’s introduction to neural networks almost always goes through linear regression and then logistic regression. That’s because the equation for a single artificial neuron with a sigmoid activation is identical to the equation for logistic regression. So there is an immediate bias that the characteristics of these classical ML methods will also carry over into the world of DL. After all, DL in its most naive explanation is nothing more than an artificial neural network (ANN) with multiple layers.
There are also other kinds of ML methods whose equations are different from those of DL. The basic objective of all ML methods, however, is a general notion of curve fitting: if a model fits the data well, then it is perhaps a good solution. Unfortunately, because the number of parameters in a DL model is so large, these systems will by default over-fit any data. This is enough of a tell that DL is an entirely different kind of animal from classical ML.
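To make the equivalence concrete, here is a minimal sketch in plain Python. The function names (`logistic_regression`, `neuron`, `layer`) and all the numbers are made up for illustration; they do not come from any particular library. It shows that one sigmoid neuron computes exactly what logistic regression computes, and that a small "deep" network is just such neurons stacked in layers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, w, b):
    """Classical logistic regression: P(y=1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def neuron(x, w, b, activation):
    """One artificial neuron: a weighted sum of the inputs, then an activation."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)

def layer(x, weights, biases, activation):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(x, w, b, activation) for w, b in zip(weights, biases)]

# Illustrative inputs and parameters (arbitrary values).
x = [0.5, -1.2, 3.0]
w = [0.8, 0.1, -0.4]
b = 0.25

p_lr = logistic_regression(x, w, b)
p_nn = neuron(x, w, b, sigmoid)
print("logistic regression:", p_lr)
print("single neuron:      ", p_nn)

# Stacking: a hidden layer of two neurons feeding one output neuron.
hidden = layer(x, [[0.8, 0.1, -0.4], [-0.3, 0.7, 0.2]], [0.25, -0.1], sigmoid)
output = neuron(hidden, [1.5, -2.0], 0.05, sigmoid)
print("two-layer network:  ", output)
```

The first two printed values are the same number, which is where the "it's just ML" intuition comes from. The stacked version is where that intuition starts to mislead.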
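The over-fitting point can be sketched without any neural network at all. The example below is a stand-in, not a DL system: it uses polynomial interpolation (a hypothetical helper called `lagrange_fit`) as a model with as many free parameters as there are data points, and it trains on labels that are pure random noise. The sizes and the seed are arbitrary choices.

```python
import random

def lagrange_fit(xs, ys):
    """Return the degree-(n-1) polynomial that passes through all n points."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model

random.seed(0)
n = 8
train_x = [float(i) for i in range(n)]
# The labels are pure noise: there is no pattern to learn.
train_y = [random.uniform(-1.0, 1.0) for _ in train_x]

model = lagrange_fit(train_x, train_y)

# Worst-case error on the training points.
train_error = max(abs(model(x) - y) for x, y in zip(train_x, train_y))
print("max training error:", train_error)

# Predictions halfway between the training points, where nothing was fitted.
held_out_x = [i + 0.5 for i in range(n - 1)]
held_out_predictions = [model(x) for x in held_out_x]
print("predictions between training points:", held_out_predictions)
```

With enough parameters the fit to the training data is perfect even though there was nothing to learn, so a good fit on its own says nothing about whether the model is a good solution.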
2. It’s just Optimization
DL systems have a loss function that measures how well their predictions match the training data. Classic optimization problems also have loss functions (also known as objective functions). In both systems, different kinds of heuristics are used to discover an optimal point in a large configuration space. It was once thought that the solution surface of a DL system was so complex that it would be impossible to arrive at a solution. However, curiously enough, one of the simplest methods of optimization, the Stochastic Gradient Descent (SGD) algorithm, is all that is needed to arrive at surprising results.
What this tells you is that there is something else going on here that is actually very different from what optimization folks are used to.
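For readers who have not seen it, here is roughly how little there is to Stochastic Gradient Descent. This is a minimal sketch on a toy problem, fitting a straight line with a squared-error loss. The data, the learning rate and the number of epochs are all assumptions picked for illustration, and the loss here is a simple convex bowl, nothing like the surface of a real DL system.

```python
import random

random.seed(0)

# Toy data: y = 2x - 1 plus a little noise (values chosen for illustration).
true_w, true_b = 2.0, -1.0
data = [(x, true_w * x + true_b + random.gauss(0.0, 0.1))
        for x in [random.uniform(-1.0, 1.0) for _ in range(200)]]

def loss(w, b):
    """Mean squared error of the line w*x + b over the whole data set."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

w, b = 0.0, 0.0          # start from an arbitrary point
learning_rate = 0.05
initial_loss = loss(w, b)

for epoch in range(50):
    random.shuffle(data)             # the "stochastic" part: random sample order
    for x, y in data:
        error = w * x + b - y
        # Gradient of (w*x + b - y)**2 for this single sample.
        w -= learning_rate * 2 * error * x
        b -= learning_rate * 2 * error

final_loss = loss(w, b)
print("initial loss:", initial_loss)
print("final loss:  ", final_loss)
print("learned w, b:", w, b)
```

The whole method is "look at one example, nudge the parameters downhill, repeat". That something this simple also works on the enormous, non-convex loss surfaces of DL systems is the surprising part.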
3. It’s a black box
A lot of Data Scientists have an aversion to DL because of the lack of interpretability of its predictions. This is a characteristic of not only DL methods but classical ML methods as well. Data Scientists would rather use probabilistic methods where they have better control of the models or priors. As a result, they end up with systems that make predictions with the fewest parameters possible, all driven by the belief that parsimony, or Occam’s razor, is the optimal explanation for everything.
Unfortunately, probabilistic methods are not competitive in classifying images, speech or even text. That’s because DL methods are better than human beings at discovering models. Brute force just happens to trump wetware. No Data Scientist has ever been able to find the ‘principal components’ that will do image classification well. Furthermore, there’s no experimental evidence in the DL space that parsimonious models work any better than entangled models. For those cases where it is an absolute requirement to have some kind of explanation, there are now newer methods in DL that aid interpretability as well as the estimation of uncertainty. If a DL system can generate captions for an image, then there is a good chance that it can be trained to generate an explanation of a prediction.
4. It’s too early and too soon
Illusion of Validity Bias
It is a natural bias to think that something around 5 years old and rapidly evolving is too new and volatile a technology to trust. I think we all said the same thing when the microprocessor, the internet, the web and mobile technologies came along. Wait and see was the safe approach for most everyone. This is certainly a reasonable approach for anyone who has not really spent the time investigating the details. However, it is a very risky strategy: ignorance may be bliss, but another company eating your lunch can mean extinction.
5. There is too much hype.
There are a lot of things that DL can do that were deemed inconceivable just a couple of years ago. Nobody expected a computer to beat the best human player in Go. Nobody expected self-driving cars to exist today. Nobody expected to see Star Trek universal translator-like capabilities. It is so unbelievable that it seems more likely to be an exaggeration than something real. I hate to burst your bubble of ignorance, however: DL is in fact very real and you experience it yourself with every smartphone.
6. AI winter will likely come again.
Frequency Illusion Bias
We’ve had so many times where the promise of AI has led to disappointing results. The argument goes that because it has happened so often before, it is bound to happen again. The problem with this argument is that despite the disappointments, AI research has led to many software capabilities that we take for granted today and thus never notice. Good old fashioned AI (GOFAI) is embedded in many systems today.
The current pace of DL development is accelerating, and there are certainly some big problems that still need to be solved. The need for a lot of training data and the lack of unsupervised training are two of them. This however doesn’t mean that what we have today has no value. DL can already drive cars; that in itself tells you that even if another AI winter arrives, we will have achieved a state of development that is still quite useful. Andrew Ng has more about this [NG].
7. There’s not enough theory of how it works.
System Justification Bias
The research community does not have a solid theoretical understanding as to why DL works so effectively. We have some idea as to why a multi-layer neural network is more efficient in fitting functions than one with fewer layers. We, however, don’t have an understanding as to why convergence even occurs or why good generalization happens. DL at this time is very experimental and we are just learning to characterize these kinds of systems. Meanwhile, despite not having a good theoretical understanding, the engineering barrels forward. Researchers, using their intuition and educated guesses, are able to build increasingly better models. In other words, nobody is stopping their work to wait for a better theory. It is almost analogous to what happens in biotechnology research. People are experimenting with many different combinations and arriving at new discoveries that they have yet to explain. Scientific and technological progress is very messy and one shouldn’t shy away from the benefits because of the chaos.
8. It is not biologically inspired.
DL systems are very unlike the neurons in our brain. The mechanism by which DL learns (i.e. SGD) is not something we can explain as happening in our brain. The argument here is that if it doesn’t resemble the brain, then it is unlikely to be able to perform the kind of inference and learning that a brain does. This, of course, is an extremely weak argument. After all, planes don’t look like birds, but they certainly can fly.
9. I’m not an expert in it.
Not Invented Here Bias
Not having expertise in-house shouldn’t be an excuse for not seeking expertise outside. Furthermore, it shouldn’t prevent you from having your experts learn this new technology. However, if these experts are of the dogmatic persuasion, then that should be a tell for you to get a second and unbiased opinion.
10. It does not apply to my problems
Businesses are composed of many business processes. Unless you have gone through the exercise of examining which processes can be automated with current DL technologies, you are not in a position to say that DL does not apply to you. Furthermore, you may discover new processes and business opportunities that do not exist today but are possible with the exploitation of DL technology. You cannot really answer this question until you have invested in some due diligence work.
11. I don’t have the resources
Status Quo Bias
The large internet companies like Google and Facebook have gobbled up a lot of the Deep Learning talent out there. These companies have very little interest in working with a small business to identify their specific needs and opportunities. However, fortunately, these big companies have been gracious enough to allow their researchers to publish their work. We, therefore, do have a view into their latest developments and thus are able to take what they’ve learned and apply it to your context. There are companies like Intuition Machine that do have an on-boarding process for you to get a competitive head start in DL technologies.
If you want a first mover advantage, then urgently reach out to Intuition Machine. They’ve got a Deep Learning guide to turn your business into one that can be potentially disruptive.
Originally published at blog.alluviate.com.