We are quickly nearing the end of 2018. As I have done in previous years, this is a retrospective of my predictions for Deep Learning in 2018. The purpose of this exercise is to get a measure of the rapid progress of Deep Learning. By comparing my own expectations against reality at the end of the year, I can calibrate my intuition for how rapidly this field is developing.
My DL predictions for 2018 are reviewed below, with commentary on their progress during the year:
1. A Majority of Deep Learning Hardware Startups will Fail
Few DL hardware companies have *publicly* failed in 2018 (here’s one), yet all of them have yet to deliver a product. What’s fascinating is that despite the lack of product, these startups have been able to raise even more money!
My argument last year was that DL startups would fail to appropriately estimate the cost of delivering usable software to their potential customer bases. The DL stack keeps getting richer and more complex, and it will be problematic for startups to catch up.
The biggest failure to launch goes to Intel. Didn’t they announce with a lot of fanfare at NIPS 2017 that they were finally delivering silicon? Well, their Nervana-derived product is MIA in 2018. Let’s see if they can deliver “Spring Crest” by Spring 2019.
There are a whole bunch of startups claiming to have working silicon. Deep Learning AI is sexy and everyone wants to “invent” their own chip. A short list includes: GraphCore, Wave Computing, Groq, Habana, Bitmain, Cambricon, Esperanto, NovuMind, Gyrfalcon, PFN, Hailo and Horizon. This doesn’t include traditional semiconductor manufacturers like Samsung, ARM, Xilinx, Qualcomm, and Huawei. In addition, the traditional cloud vendors aren’t buying but inventing their own AI chips: Amazon’s Inferentia, Microsoft’s Brainwave, Facebook, Alibaba’s AliNPU and Baidu’s Kunlun. This space is going to get crowded very fast!
Then there is the current situation: DL hardware is dominated by two extremely large and capable companies, Nvidia and Google. Although you can’t get your hands on a physical Google TPU2, you can get the next best thing, which is virtual access to it via cloud services. Nvidia, by contrast, provides availability both in the cloud (via third-party vendors) and through physical hardware.
The next closest vendor that can compete against these two behemoths is AMD. AMD supports the latest Tensorflow version and has a slowly maturing DL stack. Unlike Nvidia and Google, they lack a Tensor Core equivalent (i.e. a systolic array), but for more conventional training and inference workloads, AMD GPU hardware is comparable in performance to Nvidia’s. This might not sound like a big deal, but it is miles ahead of any other startup competitor. AMD has similar economies of scale to Nvidia in that a majority of its silicon is purchased for other uses (i.e. gaming, rendering, HPC).
Honestly, there are too many wannabes in this space. A mature DL marketplace can probably support no more than three competitors. Nvidia and Google have already solidified their positions, so there’s just one spot left! To survive, companies must make it as easy and frictionless as possible to deploy solutions. Additionally, each company must be able to differentiate its offering. Not everyone should be doing image processing! To offer an experience that is both seamless and specialized for a niche, these companies must invest in software (BTW, I’m looking for a hardware partner to explore niches).
2. Meta-Learning will be the new SGD
Meta-learning has yet to replace SGD; however, meta-learning in the form of neural architecture search has made gigantic progress in the field. The key developments relate to Hypernetwork search.
Deep Learning Architecture Search and the Adjacent Possible
What is the biggest problem with deep learning networks? I propose that it is: “learning how to forget”. Deep learning…
The other area where we find progress in meta-learning is in algorithms inspired by the few-shot learning method MAML. That has led to a variety of algorithms: Reptile, ProMP and CAML. Meta-learning approaches require extreme amounts of training data, because they learn over a variety of tasks rather than just a variety of samples. They use two iterative loops for learning: an outer loop iterating over tasks and an inner loop that iterates over each task’s training data. MAML-based approaches only learn an initialization shared across tasks, and therefore don’t require the same amounts of data.
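To make the two-loop structure concrete, here is a minimal Reptile-style sketch. The 1-D regression task family, model and hyperparameters are hypothetical, chosen only to illustrate the outer task loop and the inner adaptation loop, not taken from any of the papers above:

```python
import numpy as np

# Minimal Reptile-style sketch of the two-loop structure (hypothetical setup).
rng = np.random.default_rng(0)

def sample_task():
    """A task is a 1-D linear regression y = a*x with a random slope a."""
    return rng.uniform(-2.0, 2.0)

def task_batch(a, n=10):
    """Sample training data for one task."""
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, a * x

def inner_loop(theta, a, steps=5, lr=0.1):
    """Inner loop: adapt the parameter to one task with a few SGD steps."""
    for _ in range(steps):
        x, y = task_batch(a)
        grad = np.mean(2.0 * (theta * x - y) * x)  # d/dtheta of the MSE loss
        theta = theta - lr * grad
    return theta

# Outer loop: iterate over tasks, nudging the shared initialization
# toward each task's adapted parameters (the Reptile meta-update).
theta, meta_lr = 0.0, 0.1
for _ in range(1000):
    a = sample_task()
    adapted = inner_loop(theta, a)
    theta = theta + meta_lr * (adapted - theta)
```

Note that the outer loop never optimizes for any single task; it only moves the initialization toward parameters from which a few inner-loop steps succeed on a fresh task.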
I am beginning to think that unsupervised learning and meta-learning are actually the same problem. The way evolution solves this problem is through the development of stepping-stone skills. What this implies is that the solution depends entirely on the kind of problem being solved. Is the unsupervised or meta-learning task for prediction, autonomous control or generative design? Each kind of problem requires different base skills, and thus there is some kind of basecamp that can be targeted, rather than this fanciful notion of being able to bootstrap from nothing. Bootstrap-magic methods are a fiction that appeals to researchers who believe in salvation through mathematical magic.
To conclude, the only two kinds of meta-learning methods that seem promising in 2018 are evolution-inspired architecture search and few-shot learning MAML-style methods.
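As a rough illustration of the first of these, evolution-inspired architecture search can be reduced to a simple (1+1) evolution loop over a discrete search space. The search space and the stand-in fitness function below are entirely made up; a real system would train and evaluate each candidate network:

```python
import random

# Toy (1+1) evolution loop over a tiny, hypothetical architecture search space.
random.seed(0)

SEARCH_SPACE = {"depth": [2, 4, 8], "width": [16, 32, 64], "act": ["relu", "tanh"]}

def sample_arch():
    """Draw a random architecture from the search space."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(arch):
    """Resample one randomly chosen attribute of the architecture."""
    child = dict(arch)
    key = random.choice(list(SEARCH_SPACE))
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

def fitness(arch):
    """Stand-in for 'train and evaluate': prefers depth 4 and wide layers."""
    return arch["width"] - 4 * abs(arch["depth"] - 4)

# (1+1) evolution: keep the better of parent and mutant at each step.
best = sample_arch()
for _ in range(200):
    child = mutate(best)
    if fitness(child) > fitness(best):
        best = child
```

The point of the sketch is that the search needs no gradient through the architecture choices at all, which is why evolutionary loops like this one are a natural fit for discrete design spaces.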
3. Generative Models drive a New Kind of Modeling
Generative models remain mostly confined to entertaining applications. Sure, there are a lot of nightmarish images created by BigGAN search, but unfortunately the employment of high-fidelity generative models to replace standard computational methods is still a work in progress.
DeepMind has been working for two years on protein folding, and they announced their results in December:
AlphaFold: Using AI for scientific discovery | DeepMind
Our system, AlphaFold, which we have been working on for the past two years, builds on years of prior research in using…
DeepMind trained a generative neural network to invent new protein fragments, which were then used to continually improve the score of proposed protein structures. This is one of the more impressive uses of generative networks, going beyond the merely aesthetic generation of images, 3D structures or sound.
For further explorations in this space, read:
The Delusion of Infinite Precision Numbers
Real numbers are not real. The argument is simple, real numbers cannot reflect reality (i.e. not real) because they…
4. Self-Play is Automated Knowledge Creation
Self-play methods as introduced by AlphaGo have not penetrated very far into application use. However, in research, OpenAI Five has shown it can tackle a very hard AI problem (i.e. the real-time strategy game Dota 2). The main stumbling block preventing wider use appears to be the difficulty of framing problems in a self-play setting and the many kinds of uncertainty that exist in real-world problems.
Deep Reinforcement Learning (DRL) has taken a lot of flak in 2018; however, Ilya Sutskever of OpenAI was so enamored with Dota Five’s seemingly infinite DRL scalability that he’s predicting AGI earlier than most:
Why AGI is Achievable in Five Years
Ilya Sutskever of OpenAI is one of the most prominent advocates of the idea that “Brute force computation” is all you…
Despite this apparent validation of RL scalability, newer papers are questioning DRL’s robustness: https://arxiv.org/abs/1811.02553v2. I personally am inclined to favor intrinsic-motivation methods over DRL methods, because most hard problems have sparse and deceptive rewards.
5. Intuition Machines will Bridge the Semantic Gap
When Yoshua Bengio starts explaining the limitations of Deep Learning using Dual Process Theory, you know that you are on the right track:
So this idea of Deep Learning machines as artificial intuition has gone a bit mainstream in 2018. Although Dual Process Theory is a good model of human thought, we need to understand more fully the richness of intuitive thinking (System 1). Intuitive thinking is more than just thinking fast, and more than just amortized inference. There’s a lot going on in the human brain that we have yet to replicate in DL:
Where is the Artificial Ingenuity in Deep Learning?
What is ingenuity and where does it originate from?
6. Explainability is Unachievable. We will just have to Fake It
Here’s the problem: collectively, we aren’t getting any better at understanding the nature of DL networks. Research continues to throw monkey wrenches that break our theories of how DL networks are supposed to work. DeepMind threw one major monkey wrench:
Deep Learning’s Uncertainty Principle
DeepMind has a new paper where researchers have uncovered two “surprising findings”. The paper is described in…
This was followed by Adversarial Reprogramming of Neural Networks, which demonstrated that you can use adversarial inputs to reprogram a neural network. The authors demonstrate hijacking a network trained on ImageNet to perform MNIST classification:
Our understanding of what goes on inside a neural network is woefully inadequate.
With regards to crafting intuitive explanations, I believe this is still a fertile field that requires more research. 2018 was a year that exposed a lot of Deep Learning’s limitations, but it wasn’t a year in which we progressed much in developing better human interfaces:
Fake Intuitive Explanations in AI
Cassie Kozyrkov has just written a good take on why “Explainable AI won’t deliver”. Her take is the best survey of…
7. Deep Learning Research Information Deluge
The number of paper submissions in DL has doubled this year. To make matters worse, the quality of reviewing has plunged drastically. The signal-to-noise ratio has thus dropped to rock bottom, and it’s now every man for himself with regards to sifting through the mountains of new research.
There are some tools like Semantic Scholar, Arxiv Sanity and Brundage Bot that help you become aware of what’s out there. However, it’s still too easy for the really novel discoveries to slip through the cracks.
8. Industrialization via Teaching Environments
No progress here in terms of industrialization.
2018 is the year when the problems with AI alignment came into clearer focus.
AI alignment is extremely important for AI teaching environments. Several teams have released frameworks that allow better reproducibility in Reinforcement Learning experiments (see: Facebook’s Horizon, OpenAI Baselines, DeepMind’s TRFL, Google’s Dopamine).
We’ve seen progress in transfer learning from virtual to real environments (see: OpenAI Learning Dexterity). We’ve seen progress in teaching complex skills with sparse rewards:
How to Bootstrap Complex Skills with Unknown Rewards
If you’ve ever wondered why many of the well publicized accomplishments in AI are difficult to translate into real…
9. Conversational Cognition
I’ve finally expressed a clearer roadmap for how to get to conversational cognition. Unfortunately, surmounting level three requires that item #5 in this list be accomplished. That is unlikely in the next few years, so conversational cognition is likely still several years away. The only plus is that this kind of cognition has been identified; that is, it has become a known unknown.
To understand the necessary developmental stepping stones:
A New Capability Maturity Model for Deep Learning
How can we understand progress in Deep Learning without a map? I created one such map a couple years ago, but this map…
10. Ethical Use of Artificial Intelligence
Finally, massive awareness of the problem! The most notable development is the California legislature requiring chatbots to disclose that they aren’t human. Microsoft president Brad Smith has recently written about “Facial Recognition: It’s Time for Action.” I’ve been banging the table about this since late 2017:
High Time to Regulate Face Recognition A.I.
We’ve reached a tipping point where it is now high time that we start the conversation of regulating Face Recognition…
Let’s hope policy makers take this more seriously.
The French have panicked about AI and crafted their “AI for Humanity” plan:
Six Months Later, France has Formulated their Deep Learning Strategy
Six months ago, I wrote that “The West is Unaware of the Deep Learning Sputnik moment”. It turns out mathematician…
The implicit hope is that AI will align with democratic needs.
Many companies have drawn the line with regards to weaponized AI:
Drawing the Ethical Line on Weaponized Deep Learning Research
Good AI and Deep Learning researchers place a lot of passion into their work. Although they may rarely reflect on the…
We’ve seen Google step away from military work. However, Microsoft and Amazon seem to have no qualms about this.
The problem, of course, is that our economic system favors artificial personhood over real humans:
Artificial Personhood is the Root Cause Why A.I. is Dangerous to Society
When I began writing my book “The Deep Learning A.I. Playbook”, I had given very little thought about the dangers of…
Ultimately, every AI researcher has to decide whether they want to spend their life selling sugar water to kids or doing something truly meaningful:
Is the Purpose of Artificial Intelligence to Sell Sugar Water?
Steve Jobs had a famous quote that he used to convince the then-CEO of Pepsi to join Apple: “Do you want to sell sugar…
In summary, I overshot on most of my 2018 predictions, and will therefore have to dial down my expectations for 2019.
We are beginning to realize that there are major complexity problems with the entire Machine Learning paradigm of specifying reward functions and optimizing against them. This only gets you systems that learn to game the reward function: quite often, despite apparent progress, the underlying system has simply learned to cheat the tests. This is a meta-level problem, and it’s not going to get fixed overnight. At best, we can make incremental progress by improving curriculum learning.
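As a toy illustration of reward gaming (the “room cleaning” environment and both policies below are entirely made up for this sketch): the designer intends “clean the room”, but the reward pays out per pickup event, so the reward-optimal policy dumps the dirt back out and re-collects it forever:

```python
# Toy illustration of reward gaming; hypothetical environment and policies.

def proxy_return(policy, horizon=10):
    """The designed reward: +1 for every pickup event."""
    room_dirt, dirt_held, total = 1, 0, 0
    for action in policy(horizon):
        if action == "pickup" and room_dirt > 0:
            room_dirt -= 1
            dirt_held += 1
            total += 1  # reward fires on the event, not on the clean state
        elif action == "dump" and dirt_held > 0:
            dirt_held -= 1
            room_dirt += 1
    return total

def intended(horizon):
    """What the designer hoped for: clean once, then stop."""
    return ["pickup"] + ["wait"] * (horizon - 1)

def gamer(horizon):
    """What the optimizer finds: dump the dirt and re-collect it forever."""
    return (["pickup", "dump"] * horizon)[:horizon]

print(proxy_return(intended))  # 1
print(proxy_return(gamer))     # 5
```

The gaming policy scores five times higher under the specified reward while leaving the room no cleaner, which is exactly the gap between the proxy and the intended objective.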
Fooled by the Ungameable Objective
“Ungameable” isn’t a word. I just made it up. It’s an adjective that describes a set of rules that doesn’t have any…
Stay tuned for 2019 Deep Learning predictions.