The year 2018 is ending and it’s now time for a new set of Deep Learning predictions for 2019. Here are my previous predictions and retrospectives for 2017 and 2018.
2017 Predictions and Retrospective. These articles cover hardware acceleration, the dominance of Convolutional Neural Networks (CNN), meta-learning, reinforcement learning, adversarial learning, unsupervised learning, transfer learning, Deep Learning (DL) as components, design patterns, and experiments outpacing theory.
2018 Predictions and Retrospective. The predictions for 2018 cover hardware startups, meta-learning replacing SGD, generative models, self-play, the semantic gap, explainability, research deluge, teaching environments, conversational cognition, and AI ethics.
Retrospectives of my predictions reveal that I was too optimistic and tended towards overshooting eventual reality. With an improved understanding of the DL field, I can now see the real obstacles to progress. The community, in general, has been in a state of inflated expectations. This, in hindsight, is due to ignorance of the underlying complexity of general cognition. We now have to dial down our expectations and focus exclusively on areas that are known to show promise. These promising areas will make incremental progress and not “moon shots”.
Now that I have formulated a more accurate capability maturity model, I can make better predictions about where DL is heading. Here is my proposed maturity model. I cannot emphasize enough the importance of this model. AGI is very difficult to predict, but you should become concerned if the higher levels are achieved in unexpected ways. Here’s a corresponding graphic:
Revolutionary progress happens in stages, and what we are encountering today is a major obstacle to achieving the Interventional level (I am also unsure whether there is another layer prior to the Interventional level). This does not mean that we cannot make any progress. Rather, there is plenty of low-hanging fruit at the current maturity level (i.e. Dual Process) that is primed for exploitation. The progress of DL in 2019 will primarily be around the pragmatic realization that hand-engineering and DL can be a fruitful combination.
Here are my predictions. As in previous years, they serve as a guide to tracking DL progress.
1. Deceleration in DL Hardware Acceleration
The low-hanging fruit of hardware for DL has been picked. Systolic arrays gave the world massive speedups in 2017. We cannot expect major increases in computational power in 2019. NVIDIA’s Turing cores are only fractionally faster than Volta cores. Google’s TPUv3 systems are now liquid cooled to allow for higher density than their predecessors. I don’t expect any major architectural improvements in 2019, so don’t expect the kind of massive gains seen in previous years.
However, we shall see newer architectures from Graphcore and Gyrfalcon that circumvent the power costs of memory transfers and support sparse operations. Changes in DL formulation will be needed to accommodate these new architectures. New hardware research is also needed that is inspired by the nano-intentionality found in biology.
2. Unsupervised Learning has been solved, but it’s not what was expected
The mindset for Unsupervised Learning is all wrong. LeCun’s layered cake is all wrong, and the relationships of different kinds of learning should look like this:
Why is UL the least valued and the least difficult? That’s because there’s no goal and you can just cook up any clustering that may or may not work. Ultimately, it boils down to how higher layers perform based on the UL embeddings. UL embeddings are essentially data that contains a rich set of priors. How these priors are exploited depends on upstream processes that do have objectives. What ELMo and BERT have shown is that we can train a UL model to predict (or generate) its own data, and that this serves as a good base for upstream tasks. UL is essentially Supervised Learning with the label already existing in the data. In short, UL has been solved but not in the way most practitioners were expecting. If a network can make good predictions or can generate good facsimiles of the original data, then that’s all there is to UL.
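To make the claim that the label already exists in the data concrete, here is a minimal sketch of how unlabeled text can be turned into supervised training pairs by hiding one token at a time. It is only an illustration of the idea, not how ELMo or BERT are actually trained. The function name make_masked_examples and the "[MASK]" placeholder string are assumptions made for this example.

```python
MASK = "[MASK]"  # placeholder token, chosen only for this illustration


def make_masked_examples(sentence):
    """Turn one unlabeled sentence into (input, label) training pairs.

    Each pair hides a single token. The hidden token is the label,
    so no human annotation is required.
    """
    tokens = sentence.split()
    examples = []
    for pos, target in enumerate(tokens):
        masked = tokens[:pos] + [MASK] + tokens[pos + 1:]
        examples.append((" ".join(masked), target))
    return examples


pairs = make_masked_examples("the cat sat on the mat")
for text, label in pairs:
    print(f"{text!r} -> {label!r}")
```

Any model that learns to fill in the hidden token from these pairs is doing ordinary supervised prediction, with targets that came for free from the raw text.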
Everyone thought solving UL would be a major advance because it would use data free of human labeling. Unfortunately, it was a red herring. It has been solved because something that comes for free is very easy to extract. My prediction for UL in 2019 is that researchers accept this new viewpoint and instead focus on more valuable research (e.g. continual or interventional learning).
3. Meta-Learning will be for Research Only
Our understanding of meta-learning (i.e. Learning to Learn) seems to be as nebulous as our understanding of Unsupervised Learning. Meta-learning, as it is practiced today, is more like transfer learning (i.e. interpolative learning). A more advanced kind of meta-learning is one that can build and improve its own models. Meta-learning should be able to build extrapolative and inventive learning models. We are nowhere close to achieving this capability.
Any learning method that is applicable to many domains is technically a meta-learning algorithm. As an example, Gradient Descent, Genetic Algorithms, Self-Play and Evolution are all meta-learning algorithms. The goal of meta-learning approaches is to develop algorithms that learn well across many domains.
Very few meta-learning algorithms are known, yet there is one that we know exists but don’t understand: the meta-learning algorithm used by humans. Furthermore, meta-learning is too general a problem to solve in a universal manner. As with unsupervised learning, there is likely no free lunch.
I suspect that the specific methods described below (i.e. generative models, hybrid models, and curriculum training) have a much better chance of achieving valuable results. This means that the meta-learning algorithms we discover will be useful only for specific kinds of learning tasks. Just as learning to learn by gradient descent speeds up gradient descent only for a specific task, meta-learning can only improve learning in tasks it has seen. In short, meta-learning is at best interpolative and can’t generalize. There likely exists no universal meta-learning method. Rather, there is a suite of meta-learning methods that can be pieced together to yield an effective curriculum.
In summary, Meta-Learning research (with the exception of neural architecture search) will remain a research curiosity.
4. Use of Generative Computational Modeling in Science
We are going to develop better control of our generative models. Three classes of generative models have been shown to be effective: Variational Autoencoders (VAEs), GANs, and Flow-based models. I expect to see the majority of progress in GANs and Flow-based models and minimal progress in VAEs. I also expect to see applications of these models in scientific exploration that deals with complex adaptive systems (e.g. weather, fluid simulations, chemistry, and biology).
Progress in this area will have a profound influence on the progress of science.
5. Use of Hybrid Models in Prediction
Deep Learning has continued to show its strength in providing predictions of high-dimensional systems. DL, however, is still unable to formulate its own abstract models, and this will remain a fundamental obstacle to explainability and extrapolative prediction. To compensate for these limitations, we shall see hybrid dual process solutions that incorporate existing models in combination with model-free learning.
I expect to see more work in model-based RL than in model-free RL. I suspect the inefficiency of model-free RL can be mitigated using hand-crafted models. I expect progress in Relational Graph Networks, and I expect to see impressive results when these graphs are biased with prior hand-crafted models. I also expect to see advances in prediction capabilities from fusing existing symbolic algorithms with DL inference.
Industrialization of DL will come not because we’ve made progress in transfer learning (as I incorrectly predicted in 2017) but rather through the fusion of human-crafted models and DL trained models.
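As a rough illustration of fusing a human-crafted model with a trained one, the sketch below keeps a hand-written physics formula as the prior and fits only a small correction to whatever that formula fails to explain. Everything here is an assumption made for the example: the free-fall formula, the cubic correction term, and the synthetic observations are invented, and a one-parameter least-squares fit stands in for what would be a DL model in practice.

```python
def physics_model(t):
    """Hand-crafted prior: distance fallen under gravity, ignoring drag."""
    g = 9.81
    return 0.5 * g * t * t


def fit_residual(ts, observed):
    """Least-squares fit of a one-parameter correction c * t**3.

    The correction is fitted only to the residual, i.e. the part of the
    observations that the hand-crafted model does not explain.
    """
    residuals = [y - physics_model(t) for t, y in zip(ts, observed)]
    feats = [t ** 3 for t in ts]
    num = sum(f * r for f, r in zip(feats, residuals))
    den = sum(f * f for f in feats)
    return num / den


def hybrid_predict(t, c):
    """Hybrid prediction: hand-crafted model plus learned correction."""
    return physics_model(t) + c * t ** 3


# Synthetic observations, made up for illustration: the "true" process
# deviates from the hand-crafted model by a drag-like cubic term.
ts = [0.5, 1.0, 1.5, 2.0, 2.5]
observed = [physics_model(t) - 0.3 * t ** 3 for t in ts]

c = fit_residual(ts, observed)
print(f"learned correction coefficient: {c:.3f}")
print(f"hybrid prediction at t=3.0: {hybrid_predict(3.0, c):.3f}")
```

The design choice is that the learned part only has to capture the gap left by the hand-crafted model, which is usually a much smaller problem than learning the whole behavior from scratch.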
6. More Methods for Imitation Learning
Imitation does not require extrapolative reasoning, and therefore we shall continue to see considerable progress in imitating all kinds of existing systems. To be able to imitate behavior, a machine only needs to create a descriptive model that mirrors the behavior. This is an easier problem than generative modeling, where unknown generative constraints have to be discovered. Generative models work so well because all they do is imitate data, without inferring the underlying causal model that generates the data.
7. More Integration of DL for Design Exploration
We shall see a lot of research in generative models migrating into existing design tools. This will occur first in visual domains and move progressively towards other modalities.
In fact, we might even consider the progress made by AlphaGo and AlphaZero as design exploration. Competitive Go and Chess players have begun to study the explorative strategies introduced by DeepMind’s game-playing AI to develop new strategies and tactics that were previously unexplored.
The brute-force capability and scalability available to DL methods will turn them into brainstorming machines that improve the designs done by humans. Many DL methods are now being integrated into products from Adobe and Autodesk. Style2Paints is an excellent example of DL methods integrated with a standard desktop application.
DL will continue to be introduced as components in human workflows. DL networks reduce the cognitive load a person needs to carry to fulfill tasks in a workflow. DL allows the creation of tools that are more adept at handling the fuzzier and messier details of cognition. These tools address the need to reduce information overload, improve recall, extract meaning, and make faster decisions.
8. Decline of End-to-end training, more emphasis on Developmental Learning
End-to-end training will have diminishing returns. We will see networks trained in different environments to learn specialized skills. We shall see new methods to stitch together these skills as building blocks for more complex skills. I expect to see advances in Curriculum Training in 2019. I expect to see more research inspired by human infant development. Training networks to perform complex tasks will involve complex reward shaping, and we therefore need improved methods for tackling this problem.
9. Richer Embeddings for Natural Language Processing
NLP advanced in 2018 primarily due to advances in Unsupervised Learning approaches that create word embeddings. This is a continuation of the Word2Vec and GloVe approaches. The 2018 advances in NLP can be attributed to more advanced neural embeddings (ELMo, BERT). Surprisingly, simply substituting these richer embeddings has improved many upstream NLP tasks across the board. Work in Relational Graph Networks can further enhance DL NLP capabilities.
The Transformer network has also proven to be extremely valuable in NLP, and I expect its continued adoption in other areas. I suspect the dominance of ConvNets will be challenged by the Transformer network. My intuition behind this is that attention is a more universal mechanism for enforcing invariance or covariance than the fixed mechanism available to ConvNets.
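The contrast between a fixed mechanism and attention can be sketched in a few lines. In the toy code below, a 1-D convolution mixes neighboring inputs with the same weights no matter what the inputs are, while self-attention computes its mixing weights from the inputs themselves. This is a simplified sketch under stated assumptions: inputs are plain scalars, queries, keys and values are all the input itself (no learned projections), and there is no positional encoding. The function names are chosen for this example and do not come from any library.

```python
import math


def conv1d_fixed(xs, kernel):
    """Fixed mixing: the same kernel weights are applied at every
    position (valid padding), regardless of what the inputs contain."""
    k = len(kernel)
    return [sum(kernel[j] * xs[i + j] for j in range(k))
            for i in range(len(xs) - k + 1)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(xs):
    """Data-dependent mixing weights: row i is the softmax of the
    products between input i (the query) and every input (the keys)."""
    return [softmax([q * k for k in xs]) for q in xs]


def self_attention(xs):
    """Each output is a weighted average of all inputs (the values)."""
    weights = self_attention_weights(xs)
    return [sum(w * v for w, v in zip(row, xs)) for row in weights]


xs = [1.0, 2.0, 3.0]
print(conv1d_fixed(xs, [0.5, 0.5]))   # kernel never changes with xs
print(self_attention_weights(xs))     # weights are recomputed from xs
print(self_attention(xs))
```

Because the attention weights are a function of the data, reordering the inputs simply reorders the outputs in this simplified setting, whereas the convolution's behavior is tied to a fixed local window.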
10. Adoption of Cybernetics and System Thinking approaches
A major shortcoming of Deep Learning practice is a lack of understanding of the big picture. We are at a juncture where the field needs to derive inspiration from less traditional sources. I believe these sources will come from older research in Cybernetics and its related discipline of Systems Thinking. We need to begin thinking about how to build robust Intelligent Infrastructure and Intelligence Augmentation. This requires going beyond the existing machine learning mindset that many researchers have grown up with.
Michael Jordan in his essay “Artificial Intelligence — The Revolution Hasn’t Happened Yet” remarks that Norbert Wiener’s Cybernetics has “come to dominate the current era”. Cybernetics and Systems Thinking will help us develop more holistic approaches to designing AI systems. Successful AI deployments will ultimately be tied to how well they align with the needs of their human users. This will require exploring and formulating a holistic approach that integrates the variety of interacting parts.
Many novel approaches in DL can be traced back to older ideas in Cybernetics. There will be an increase in understanding that autonomous AI requires the inclusion of a subjective perspective in its models of the world. Predictive coding, inside-out architecture, embodied learning, just-in-time inference, intrinsic motivation, curiosity, self-models, and actionable representations are all related in this paradigm.
Deep Learning continues to make progress at breakneck speed, and I expect research to begin its inevitable transition into industrial applications. The general weakness in the understanding of DL in today’s marketplace is an inability to formulate holistic solutions to existing problems. DL cannot be a hammer for which every problem is a nail. Rather, the ability to craft solutions that integrate DL as a component into a holistic whole will be a sought-after skill set. A machine learning mindset is entirely the wrong perspective, and a more suitable one can be found in Cybernetics. We might not achieve AGI in the short term, but the tools and methods available to Deep Learning serve as a solid foundation for surprisingly valuable applications in both science and commerce.