2018 Retrospective on 10 Deep Learning Predictions

Carlos E. Perez
Published in Intuition Machine
Dec 1, 2018

We are very quickly nearing the end of 2018. As in previous years, this is a retrospective of my predictions for Deep Learning in 2018. The purpose of this exercise is to get a measure of the rapid progress of Deep Learning. By comparing my own expectations against reality (as of the end of the year), I help calibrate my intuition on how rapidly this field is developing.

My DL predictions for 2018 are reviewed below, with commentary on their progress during the year:

1. A Majority of Deep Learning Hardware Startups will Fail

Few DL hardware companies have *publicly* failed in 2018 (here’s one), yet all of them have yet to deliver a product. What’s fascinating is that despite the lack of product, these startups have been able to raise even more money!

My argument last year was that DL startups would fail to appropriately estimate the cost of delivering usable software for their potential customer bases. The DL stack keeps getting richer and more complex, and it will be problematic for startups to catch up.

The biggest failure to launch goes to Intel. Did they not make a lot of fanfare at NIPS 2017 about finally delivering silicon? Well, their Nervana-derived product is MIA in 2018. Let’s see if they can deliver “Spring Crest” by Spring 2019.

There are a whole bunch of startups claiming to have working silicon. Deep Learning AI is sexy and everyone wants to “invent” their own chip. A short list includes: GraphCore, Wave Computing, Groq, Habana, Bitmain, Cambricon, Esperanto, Novumind, Gyrfalcon, PFN, Hailo and Horizon. This doesn’t include traditional semiconductor manufacturers like Samsung, ARM, Xilinx, Qualcomm, and Huawei. In addition, the traditional cloud vendors aren’t buying but inventing their own AI chips: Amazon’s Inferentia, Microsoft’s Brainwave, Facebook, Alibaba’s AliNPU and Baidu’s Kunlun. This space is going to get crowded very fast!

Then there is the current situation: DL hardware is dominated by extremely large and capable companies, namely Nvidia and Google. Although you can’t get your hands on a physical Google TPU2, you get the next best thing, which is virtual access to it via cloud services. Nvidia, by contrast, provides availability both in the cloud (via 3rd party vendors) and through hardware.

The next closest vendor that can compete against these two behemoths is AMD. AMD supports the latest TensorFlow version and has a slowly maturing DL stack. Unlike Nvidia and Google, they lack a Tensor core component (i.e. Systolic Array), but for more conventional training and inference workloads, AMD GPU hardware is comparable in performance to Nvidia. This might not sound like a big deal, but it is miles ahead of any startup competitor. AMD has similar economies of scale to Nvidia in that a majority of their silicon is purchased for alternative uses (i.e. gaming, rendering, HPC).

Honestly, there are too many wannabes in this space. A mature DL marketplace can possibly support no more than 3 competitors. Nvidia and Google have already solidified their positions, so there’s just one spot left! To survive, companies must make it as easy and frictionless as possible to deploy solutions. Additionally, each company must be able to differentiate its offering. Not everyone should be doing image processing! To effectively offer a seamless experience that is also specialized for a niche, these companies must invest in software (BTW, I’m looking for a hardware partner to explore niches).

2. Meta-Learning will be the new SGD

Meta-learning has yet to replace SGD; however, meta-learning in the form of neural architecture search has made gigantic progress. The key developments relate to hypernetwork search.

The other area where we find progress in meta-learning is in algorithms inspired by the few-shot learning method MAML. That has led to a variety of algorithms: Reptile, ProMP and CAML. Meta-learning approaches generally require extreme amounts of training data. That’s because they learn over a variety of tasks rather than just a variety of samples. They have two iterative loops for learning: an outer loop iterating over tasks and an inner loop iterating over training data. MAML-based approaches only learn the initialization for tasks, and therefore don’t require data on a comparable scale.
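The two-loop structure described above can be illustrated with a toy sketch of the Reptile update, which moves a shared initialization toward the weights that a few inner SGD steps reach on each sampled task. Everything specific here is an assumption made for illustration and is not taken from the papers: the tasks are one-dimensional regressions y = slope * x, the model is a single weight, and the names `inner_loop`, `reptile`, and `task_slopes` are hypothetical.

```python
import random


def inner_loop(w, slope, steps=5, lr=0.1, rng=random):
    """Inner loop: a few SGD steps on samples from one task (y = slope * x)."""
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        y = slope * x
        grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
        w -= lr * grad
    return w


def reptile(task_slopes, meta_steps=2000, meta_lr=0.1, seed=0):
    """Outer loop: iterate over tasks and nudge the shared initialization
    toward the task-adapted weight (a Reptile-style update)."""
    rng = random.Random(seed)
    w_meta = 0.0  # the learned initialization (toy assumption: one scalar weight)
    for _ in range(meta_steps):
        slope = rng.choice(task_slopes)              # sample a task
        w_task = inner_loop(w_meta, slope, rng=rng)  # adapt to that task
        w_meta += meta_lr * (w_task - w_meta)        # move the initialization
    return w_meta


# Toy task family (assumed): three regression tasks with different slopes.
w0 = reptile([1.0, 2.0, 3.0])
print("learned initialization:", w0)
```

In this toy setting the learned initialization settles somewhere inside the range of task slopes, so a handful of inner steps is enough to adapt to any one of them. Real few-shot methods operate on neural network weights and far richer task distributions.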

I am beginning to think that unsupervised learning and meta-learning are actually the same problem. The way evolution solves this problem is through the development of stepping stone skills. What this implies is that it depends entirely on the kind of problem that is being solved. Is the unsupervised or meta-learning task for prediction, autonomous control or generative design? Each kind of problem requires different base skills and thus there’s some kind of basecamp that can be targeted rather than this fanciful notion of being able to bootstrap from nothing. Bootstrap magic methods are a fiction that appeals to researchers who believe in salvation through mathematical magic.

To conclude, the only two kinds of meta-learning methods that seem promising in 2018 are evolution-inspired architecture search and few-shot learning methods based on MAML.

3. Generative Models Drive a New Kind of Modeling

Generative models remain mostly confined to entertaining applications. Sure, there are a lot of nightmarish images created by BigGAN search, but unfortunately the use of high-fidelity generative models to replace standard computational methods is still a work in progress.

DeepMind has been working for two years on protein folding, and they’ve just announced their results in December:

DeepMind trained a generative neural network to invent new protein fragments, which were then used to continually improve the score of proposed protein structures. This is one of the more impressive uses of generative networks, one that goes beyond the aesthetic generation of images, 3D structures or sound.

For further explorations in this space, read:

4. Self-Play is Automated Knowledge Creation

Self-play methods introduced by AlphaGo have not penetrated very far into applications. However, in research, OpenAI Five (OpenAI’s Dota 2 system) has been shown to tackle a very hard AI problem (i.e. a realtime strategy game). The main stumbling blocks preventing wider use appear to be the difficulty of framing problems as self-play scenarios and the many kinds of uncertainty that exist in real world problems.

Deep Reinforcement Learning (DRL) has taken a lot of flak in 2018. However, Ilya Sutskever of OpenAI was so enamored with the seemingly infinite DRL scalability of the Dota-playing OpenAI Five that he’s predicting AGI earlier than most:

Despite this apparent validation of RL scalability, there are newer papers questioning DRL robustness: https://arxiv.org/abs/1811.02553v2. I personally am inclined to favor intrinsic motivation methods over DRL methods, the reason being that most hard problems have sparse and deceptive rewards.

5. Intuition Machines will Bridge the Semantic Gap

When Yoshua Bengio starts explaining the limitations of Deep Learning using Dual Process theory, you know that you are on the right track:

Lecture in TechAIDE 2018 Montreal

So this idea of Deep Learning machines being artificial intuition has gone a bit mainstream in 2018. Although Dual Process Theory is a good model for human thought, we need to understand more fully the richness of intuitive thinking (System 1). Intuitive thinking is more than just thinking fast, and more than just amortized inference. There’s a lot going on in the human brain that we have yet to replicate in DL:

6. Explainability is Unachievable. We will just have to Fake It

Here’s the problem: collectively, we aren’t getting any better at understanding the nature of DL networks. Research continues to throw in more monkey wrenches that break our theory of how DL networks are supposed to work. DeepMind threw one major monkey wrench:

This was followed by Adversarial Reprogramming of Neural Networks, which demonstrated that you could use adversarial features to reprogram a neural network. Here they demonstrate hijacking a network trained on ImageNet to perform MNIST classification:

Our understanding of what goes on inside a neural network is woefully inadequate.

With regard to crafting intuitive explanations, I believe this is still a fertile field that requires more research. 2018 was a year that exposed a lot of Deep Learning limitations, but it wasn’t a year where we progressed much in developing better human interfaces:

7. Deep Learning Research Information Deluge

The number of paper submissions in DL has doubled this year. To make matters worse, the quality of reviewers has plunged drastically. The signal-to-noise ratio has thus hit rock bottom, and it’s now every man for himself when it comes to sifting through the mountains of new research.

There are some tools like SemanticScholar, Arxiv Sanity and Brundage Bot that help you become aware of what’s out there. However, it’s just too easy for the really novel discoveries to just slip through the cracks.

8. Industrialization via Teaching Environments

No progress here in terms of industrialization.

2018 is the year when the problems with AI alignment came into clearer focus.

AI alignment is extremely important to AI teaching environments. Several teams have released their frameworks that will allow better reproducibility in Reinforcement Learning experiments (see: Facebook’s Horizon, OpenAI Baselines, DeepMind TRFL, Google Dopamine).

We’ve seen progress in transfer learning from virtual to real environments (see: OpenAI Learning Dexterity). We’ve seen progress in teaching complex skills with sparse rewards:

9. Conversational Cognition

I’ve finally expressed a clearer roadmap of how to get to conversational cognition. Unfortunately, surmounting level three requires that item #5 in this list be accomplished. That is unlikely in the next few years, so conversational cognition is probably still several years away. The only plus is that this idea of cognition has been identified. That is, it has become a known unknown.

To understand the necessary development stepping stones:

10. Ethical Use of Artificial Intelligence

Finally, massive awareness of the problem! The most notable development is the California legislature requiring chatbots to disclose that they aren’t human. Microsoft president Brad Smith has recently written about “Facial Recognition: It’s Time for Action.” I’ve been banging the table about this since late 2017:

Let’s hope policy makers take this more seriously.

The French have panicked about AI and crafted their “AI for Humanity” plans:

The implicit hope is that AI will align with democratic needs.

Many companies have drawn the line with regard to weaponized AI:

We’ve seen Google step away from military work. However, Microsoft and Amazon seem to have no qualms about this.

The problem, of course, is that our economic system favors artificial personhood over real humans:

Ultimately, for AI researchers, one has to decide if they want to spend their life selling sugar water to kids or to do something truly meaningful:

Summary

In summary, I’ve overshot in most of my 2018 predictions.

I will therefore have to dial down my expectations for 2019.

We are beginning to realize that there are major complexity problems with the entire Machine Learning paradigm of specifying reward functions and optimizing based on these rewards. This only gets you systems that learn to game the reward function. Quite often, despite apparent progress, the underlying system has learned to cheat the tests. This is a meta-level problem and it’s not going to get fixed overnight. At best, we can make incremental progress in improving curriculum learning.

Stay tuned for 2019 Deep Learning predictions.

Further Reading

Explore Deep Learning: Artificial Intuition: The Improbable Deep Learning Revolution
Exploit Deep Learning: The Deep Learning AI Playbook
