“Predictive Learning” is the New Buzzword in Deep Learning
Yann LeCun in his many talks this year has repeatedly hammered away at this analogy:
If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don’t know how to make the cake.
At NIPS 2016, LeCun started using the phrase “predictive learning” in place of “unsupervised learning”. LeCun says:
A key element we are missing is predictive (or unsupervised) learning: the ability of a machine to model the environment, predict possible futures and understand how the world works by observing it and acting in it.
This is an interesting shift, and it indicates a subtle change in his perspective on what is required to build the “cake”. In LeCun’s view, the foundation needs to be built before we can make accelerated progress in AI. In other words, building on current supervised learning by adding more capabilities like memory, knowledge bases and cooperating agents will be a slog until we are able to build that “predictive foundational layer” (see: “Five Capability Levels of Deep Learning”). At the conference he posted this slide:
The slide emphasizes the formidable task ahead of us. Predictive learning clearly requires machines not just to learn without human supervision but to learn a predictive model of the world. This point is important to emphasize, and it is why LeCun is attempting to change our perspective on the canonical taxonomy of AI (i.e. unsupervised, supervised and reinforcement learning).
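To make "learning a predictive model without human supervision" a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the "world" is a hypothetical decaying signal, the helper names are invented, and the model is a single coefficient rather than a deep network. The point is only that the training targets come from the observations themselves (the sequence shifted by one step), not from human labels.

```python
# Toy illustration only: the "world" is a hypothetical decaying signal,
# and the model is a single coefficient, not a deep network.

def observe_world(steps, decay=0.9, start=1.0):
    """Return observations x_0, x_1, ... where x_{t+1} = decay * x_t."""
    xs = [start]
    for _ in range(steps):
        xs.append(decay * xs[-1])
    return xs


def fit_next_step_predictor(xs):
    """Least-squares fit of a in x_{t+1} ~ a * x_t.

    The targets are just the inputs shifted by one time step,
    so no human-provided labels are involved.
    """
    inputs, targets = xs[:-1], xs[1:]
    num = sum(x * y for x, y in zip(inputs, targets))
    den = sum(x * x for x in inputs)
    return num / den


observations = observe_world(steps=20)
a = fit_next_step_predictor(observations)

# The learned model can now be asked about a future it has not observed.
predicted_next = a * observations[-1]
print(f"learned coefficient: {a:.3f}, predicted next observation: {predicted_next:.4f}")
```

A real predictive learner would face a far messier world than this one, which is exactly the formidable part.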
Ruslan Salakhutdinov, recently hired by Apple to lead their AI research, has a good survey talk on Unsupervised Learning (now to be rechristened as Predictive Learning) in which he provides this taxonomy:
At the right corner of the slide he mentions Generative Adversarial Networks (GANs). GANs consist of two competing neural networks, a generator and a discriminator: the former tries to generate fake images, while the latter tries to tell real images from generated ones. The interesting feature of these systems is that a closed-form loss function is not required. In fact, some systems have the surprising capability of discovering their own loss function! A disadvantage of adversarial networks is that they are difficult to train. Adversarial learning consists of finding a Nash equilibrium of a two-player non-cooperative game. Yann LeCun, in a recent lecture on unsupervised learning, calls adversarial networks “the coolest idea in machine learning in the last twenty years”.
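The two-player game can be sketched in a few lines of Python. This is not a working GAN. It assumes the value function from the original GAN formulation, V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the discriminator tries to maximize and the generator tries to minimize. The discriminator outputs below are made-up numbers. The second half uses a toy game f(x, y) = x * y (also an assumption, chosen for simplicity) to hint at why finding the equilibrium is hard: naive simultaneous gradient steps drift away from it.

```python
import math


def gan_value(d_real, d_fake):
    """V(D, G) = mean(log D(x)) + mean(log(1 - D(G(z)))).

    d_real: discriminator outputs (probabilities in (0, 1)) on real samples.
    d_fake: discriminator outputs on generated samples.
    The discriminator wants V large; the generator wants V small.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
confused = gan_value([0.5, 0.5], [0.5, 0.5])

# A discriminator that separates the two well (made-up outputs) scores higher.
confident = gan_value([0.9, 0.95], [0.1, 0.05])


def simultaneous_steps(x, y, lr=0.1, steps=100):
    """Toy game f(x, y) = x * y: x descends and y ascends at the same time.

    The only equilibrium is (0, 0), yet these naive updates move away from it.
    """
    for _ in range(steps):
        grad_x, grad_y = y, x  # df/dx = y, df/dy = x
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y


x_end, y_end = simultaneous_steps(1.0, 1.0)
start_dist = math.hypot(1.0, 1.0)
end_dist = math.hypot(x_end, y_end)
print(f"V (confused D): {confused:.4f}, V (confident D): {confident:.4f}")
print(f"distance from equilibrium: start {start_dist:.3f}, end {end_dist:.3f}")
```

In the toy game each simultaneous step multiplies the distance from (0, 0) by sqrt(1 + lr^2), so the players spiral outward instead of settling. Real GAN training is more complicated than this, but the sketch gives some intuition for why a game is harder to optimize than a single loss.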
Elon Musk’s OpenAI research has a big focus on Generative Models. Their motivation can be summarized by Richard Feynman’s quote “What I cannot create, I do not understand.” Feynman is here alluding to his “First Principles” method of thought, where he needs to be able to build up understanding by composing proven concepts. The basic idea here is that if a machine is able to generate models with high realism then perhaps (a big leap here) it develops an understanding of the predictive model. Here are some images of the state-of-the-art of this technique:
These are images generated by the DL system given the word provided. This is indeed quite impressive. I wouldn’t expect many humans to be able to draw this well! Now, this system is not perfect, as evidenced by this failure set:
But, hey, I’ve seen many people do much worse while playing Pictionary!
The current consensus, though, is that these generative models aren’t able to capture the semantics of the task. They don’t understand the meaning of an ant, volcano or redshank. They are, however, very good at mimicry and in fact prediction. These images are not recreations of images that the machine was previously trained on. Rather, the machine has come up with some generalized model that allows it to extrapolate a very realistic result.
This approach of using adversarial networks is different from the more classical approach of machine learning. Here we have two competing neural networks (i.e. discriminator and generator) that seem to work synergistically to accomplish a kind of generalization (see: “Rethinking Generalization”). In the classical ML world, one would define an objective function and then fire up one’s favorite optimization algorithm to optimize it. However, in this research area, the correct objective function is quite unclear. Even more surprising is that these systems are able to learn their own objective function!
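For contrast, the classical recipe (write down a fixed objective, then hand it to an optimizer) looks roughly like this in Python. The data, the one-parameter model and the step size are all hypothetical choices for illustration. What matters is that the objective is written by hand and never changes during training, which is exactly what the adversarial setup does away with.

```python
# Hypothetical toy data: y = 2 * x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def objective(w):
    """Fixed, hand-written objective: mean squared error of y ~ w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w):
    """Derivative of the objective with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(w=0.0, lr=0.01, steps=500):
    """Plain gradient descent: repeatedly step against the gradient."""
    for _ in range(steps):
        w -= lr * gradient(w)
    return w


w_fit = gradient_descent()
print(f"fitted w: {w_fit:.4f}, objective: {objective(w_fit):.2e}")
```

Here there is one player and one loss, so "training" is just walking downhill. With a generator and a discriminator, each network's effective objective depends on what the other one is currently doing.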
The fascinating realization here is that DL systems are extremely malleable. The classic ML notions that the objective function and constraints are fixed, or that the optimization algorithm is fixed, do not apply in DL. Even more surprising is that a meta-level approach can be used. That is, DL systems can learn how to learn (see: “Meta-learning”).