My Deep Learning predictions will not be at the same conceptual level as my previous predictions. I’m not going to predict enterprise adoption; rather, I am going to focus on research trends. Without a doubt, Deep Learning will drive AI adoption into the enterprise. For those still living under a rock, Deep Learning is the primary driver of AI and its most important approach. What is not so obvious, however, is which new capabilities will arise in 2017 and lead to exponential adoption.
So here come my fearless predictions for 2017.
1. Hardware will accelerate at double the pace of Moore’s law (i.e. 2x in 2017)
This, of course, is entirely obvious if you track developments at Nvidia and Intel. Nvidia will dominate the space throughout 2017 simply because it has the richest Deep Learning ecosystem. Nobody in their right mind will jump to another platform until enough of a DL ecosystem has developed around it. Intel Xeon Phi solutions are dead on arrival with respect to DL. At best, Intel may catch up with Nvidia in performance by mid-2017, when the Nervana-derived chips come to market.
Intel’s FPGA solutions may see adoption by cloud providers simply because of economics. Power consumption is the number one variable that needs to be reduced. Intel’s Nervana-based chip will likely clock in at 30 teraflops by mid-2017. That’s my guesstimate, but given that Nvidia is already at 20 teraflops today, I wouldn’t bet on Intel having a major impact until 2018. The only big ace that Intel may hold is its 3D XPoint technology. This will help improve the entire hardware stack, but not necessarily the core accelerator capabilities, considering that GPUs use HBM2 stacked on top of the chip for performance reasons.
Amazon has announced its FPGA-based cloud instance. It is based on Xilinx UltraScale+ technology and offers 6,800 DSP slices and 64 GB of memory on a single instance. That is an impressive capability; however, the offering may be I/O bound because it does not include the HBM version of UltraScale+. The lower memory bandwidth compared with Nvidia, Intel, and even AMD may give developers pause as to whether to invest in a more complicated development process (e.g. VHDL or Verilog).
In late-breaking news, AMD has revealed its new AMD Instinct line of Deep Learning accelerators. Their specifications are extremely competitive with Nvidia hardware. The offering is scheduled to be available in early 2017. That should probably be enough time for AMD’s ROCm software to mature.
2. Convolutional Networks (CNNs) will Dominate
CNNs will be the prevalent bread-and-butter model for DL systems. RNNs and LSTMs, with their recurrent configurations and embedded memory nodes, are going to be used less simply because they will not be competitive with a CNN-based solution. Just as GOTO disappeared from the world of programming, I expect the same for RNNs and LSTMs. Parallel architectures simply trump sequential architectures in performance.
Differentiable memory networks will be more common. This is just a natural consequence of architecture: memory will be refactored out of the core nodes and will reside as a component separate from the computational components. I don’t see the need for the forget, input and output gates of an LSTM when they can be replaced by auxiliary differentiable memory. We already see conversation about refactoring the LSTM to decouple memory (see Augmented Memory RNN).
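To make the idea of memory as a separate, differentiable component a little more concrete, here is a minimal sketch of a content-based soft read. It is only an illustration under my own assumptions: the names `read_memory`, `keys`, `values` and `sharpness` are hypothetical and are not taken from any particular paper or library.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def read_memory(query, keys, values, sharpness=1.0):
    """Soft read: a weighted average over ALL memory slots.

    Because every slot contributes with a smooth weight, the read is
    differentiable with respect to the query, the keys and the values.
    """
    weights = softmax([sharpness * dot(query, k) for k in keys])
    dim = len(values[0])
    read = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return read, weights

# Hypothetical memory with three slots (toy numbers, not real data).
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]

# A query that matches the first key reads back mostly the first value.
read, weights = read_memory([1.0, 0.0], keys, values, sharpness=5.0)
print(read, weights)
```

The point of the sketch is only that the memory lives outside the computational unit: the controller produces a query, and the memory answers with a blend of its contents rather than a hard lookup.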
3. Designers will rely more on Meta-Learning
When I began my Deep Learning journey, I thought that optimization algorithms, particularly second-order ones, would lead to massive improvements. Today, the writing is on the wall: DL can now learn the optimization algorithm for you. It is the end of the line for anybody contemplating a better version of SGD. The better version of SGD is the one that is learned by a machine and is specific to the problem at hand. Meta-learning is able to adapt its optimization to its domain. A related question is whether alternatives to backpropagation will begin to emerge in practice. There is a real possibility that hand-tweaked SGD algorithms may be on their last legs in 2017.
4. Reinforcement Learning will only become more creative
Observations about reality will always remain imperfect. There are plenty of problems where SGD is not applicable. This makes some form of RL essential to any practical deployment of DL systems. In addition, we will see RL used in many places in DL training. Meta-learning, for example, is greatly enabled by RL. In fact, we’ve seen RL used to find different kinds of neural network architectures. This is like hyperparameter optimization on steroids. If you happen to be in the Gaussian Process business, then your lunch has just been eaten.
5. Adversarial and Cooperative Learning will be King
In the old days, we had monolithic DL systems with single analytic objective functions. In the new world, I expect to see systems with two or more networks cooperating or competing to arrive at an optimal solution that likely will not be in analytic form. See “Game Theory reveals the future of Deep Learning”. There will be a lot of research in 2017 on managing non-equilibrium contexts. We already see this now, with researchers trying to find ways to handle the non-equilibrium situation in GANs.
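A toy example helps show why two competing objectives are harder to manage than a single one. The sketch below is not a GAN; it is a two-player bilinear game, f(x, y) = x * y, that I am using as a stand-in, where one player minimizes f and the other maximizes it. The function name `simultaneous_gda` and the step size are my own assumptions.

```python
import math

def simultaneous_gda(x, y, lr=0.1, steps=100):
    """Simultaneous gradient descent (on x) and ascent (on y) for f = x * y.

    The only equilibrium is (0, 0). Returns the final point and the
    distance from that equilibrium after every step.
    """
    distances = [math.hypot(x, y)]
    for _ in range(steps):
        grad_x, grad_y = y, x  # df/dx = y, df/dy = x
        # Both players update at once, each using the other's OLD value.
        x, y = x - lr * grad_x, y + lr * grad_y
        distances.append(math.hypot(x, y))
    return x, y, distances

x, y, distances = simultaneous_gda(1.0, 1.0)
print(distances[0], distances[-1])
```

Each step multiplies the squared distance from the equilibrium by (1 + lr²), so instead of settling down the players spiral outward. With a single objective, plain gradient descent on a well-behaved function would simply converge; with two opposing objectives, the same update rule never reaches equilibrium.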
6. Predictive Learning or Unsupervised Learning will not progress much
“Predictive Learning” is the new buzzword that Yann LeCun is pitching as a replacement for the more common term “Unsupervised Learning”. It is unclear whether this new terminology will gain adoption. The question, though, is whether Unsupervised or Predictive Learning will make great strides in 2017. My current sense is that it simply will not, because there seems to be a massive conceptual disconnect as to how exactly it should work.
If you read my previous post about “5 Capabilities of Deep Learning Intelligence”, you get the feeling that Predictive Learning is some completely unknown capability that needs to be shoehorned into the model that I propose. Predictive Learning is like the cosmologist’s Dark Matter. We know it is there, but we just don’t know how to see it. My hunch is that it has something to do with high entropy, or otherwise with randomness.
7. Transfer Learning leads to Industrialization
Andrew Ng thinks this is important, and I think so too!
8. More Applications will use Deep Learning as a component
We already saw this in 2016, when Deep Learning was used as a function evaluation component inside a much larger search algorithm. AlphaGo employed Deep Learning in its value and policy evaluations. Google’s Gmail auto-reply system used DL in combination with beam search. I expect to see a lot more of these hybrid algorithms rather than new end-to-end trained DL systems. End-to-end Deep Learning is a fascinating area of research, but for now hybrid systems are going to be more effective in application domains.
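The hybrid pattern can be sketched in a few lines: a classical search algorithm does the exploring, and a learned model is only called to score candidates. In the sketch below the “model” is a hypothetical lookup table (`TABLE` and `toy_model` are made-up stand-ins, not any real system’s API); in a real hybrid system that call would go to a trained network.

```python
import math

def beam_search(score_next, start, steps, beam_width):
    """Keep the beam_width best partial sequences at every step.

    score_next(seq) returns {token: log_probability}. This is the single
    place where a learned model plugs into the search.
    """
    beam = [(0.0, [start])]
    for _ in range(steps):
        candidates = []
        for total, seq in beam:
            for token, logp in score_next(seq).items():
                candidates.append((total + logp, seq + [token]))
        # Highest total log-probability first, then prune to the beam.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return beam

# Hypothetical stand-in for a trained model: next-token probabilities
# that depend only on the last token (toy numbers, not real data).
TABLE = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"x": math.log(0.5), "y": math.log(0.5)},
    "b": {"x": math.log(0.9), "y": math.log(0.1)},
}

def toy_model(seq):
    return TABLE[seq[-1]]

greedy = beam_search(toy_model, "<s>", steps=2, beam_width=1)
wide = beam_search(toy_model, "<s>", steps=2, beam_width=2)
print(greedy[0][1], wide[0][1])
```

With a beam of one, the search commits to “a” because it looks best locally; with a beam of two, it keeps “b” alive and finds the sequence with the higher overall probability. The search logic knows nothing about how the scores are produced, which is exactly what makes the scoring component swappable.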
9. Design Patterns will be increasingly Adopted
Deep Learning is just one of those complex fields that need a conceptual structure. Despite all the advanced mathematics involved, there is a lot of hand waving and many fuzzy concepts that are best captured not by formal rigor but by a method that has proven effective in other complex domains, such as software development. I predict practitioners will finally “get it” with regard to Deep Learning and Design Patterns. This will be further motivated by the fact that Deep Learning architectures are becoming more modular rather than monolithic.
10. Engineering will outpace Theory
The background of researchers and the mathematical tools that they employ are a breeding ground for a kind of bias in their research approach. Deep Learning systems and Unsupervised Learning systems are likely new kinds of things that we have never encountered before. Therefore, there is no evidence that our traditional analytic tools are going to be any help in unraveling the mystery of how DL actually works. There are plenty of dynamical systems in physics that we have remained perplexed about for decades, and I see the same situation with regard to dynamical learning systems.
This situation, however, will not prevent the engineering of even more advanced applications, despite our lack of understanding of the fundamentals. Deep Learning is almost like biotechnology or genetic engineering. We have created simulated learning machines and we don’t know precisely how they work; however, that is not preventing anyone from innovating.
I’ll come back to these predictions a year from now. Wish me luck!