Fast, careful adaptation with Bayesian MAML

Ousmane Dia
Element AI Lab
Dec 4, 2018

Effective AI agents meant to work and learn in the real world should be quick to adapt to changes in their operational conditions. Where standard machine learning algorithms are designed to train once and then do one thing over and over, AI agents like self-driving cars, service robots, and virtual assistants face complex, dynamic environments. Our Bayesian Model-Agnostic Meta-Learning (BMAML) method integrates the latest in fast adaptation techniques with a Bayesian approach to uncertainty, an important step towards developing reliable AI systems that can master new challenges and adapt to ever-changing real-world situations.

Interacting with the real world, for an AI system as well as for humans, is in part the work of managing anomalies, responding to new needs, and fitting special circumstances. Instead of trying to learn everything ahead of time, real-world AI systems have to learn how to do their learning on the go, adapting to new situations as they come.

Recent research successes in the area of fast adaptation, also known as few-shot learning, go a long way towards making practical adaptive agents plausible. While research in fast adaptation is officially concerned with any kind of learning on small, few-shot datasets, an ideal fast adaptation method should allow an agent to hit the ground running in an unfamiliar situation, leveraging incoming data quickly to make useful, safe decisions while continuing to learn.

To make this possible, fast adaptation methods need to be data-efficient, computationally fast, and precise in handling uncertainty. Cutting-edge gradient-based meta-training methods like Model-Agnostic Meta-Learning (MAML) recently achieved part of this goal: MAML is an efficient method for pre-training neural networks to do very fast, very data-efficient learning in a domain of our choice. Unfortunately, even though methods like MAML are a promising foundation for real-world adaptive learning, fundamental problems with how they address uncertainty can make them unreliable for serious applications. With our new BMAML method, we begin to solve this problem while preserving the speed and efficiency of MAML.
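
To make the mechanics concrete, here is a minimal sketch of a MAML-style inner/outer loop on a toy sine-regression task (a standard MAML benchmark). The tiny network, the task sampler, and the step sizes are illustrative assumptions of ours, not the authors' code:

    import torch
    import torch.nn.functional as F

    def net(params, x):
        # A tiny 2-layer MLP, applied functionally so we can differentiate
        # through the inner-loop update (MAML's second-order gradient).
        w1, b1, w2, b2 = params
        return torch.tanh(x @ w1 + b1) @ w2 + b2

    def sample_task(k=5):
        # A hypothetical few-shot task: k support and k query points
        # drawn from a randomly scaled and shifted sine wave.
        amp, phase = 1 + 4 * torch.rand(1), 3.14 * torch.rand(1)
        x = 10 * torch.rand(2 * k, 1) - 5
        y = amp * torch.sin(x + phase)
        return (x[:k], y[:k]), (x[k:], y[k:])

    params = [torch.randn(1, 40, requires_grad=True),
              torch.zeros(40, requires_grad=True),
              torch.randn(40, 1, requires_grad=True),
              torch.zeros(1, requires_grad=True)]
    meta_opt = torch.optim.Adam(params, lr=1e-3)

    for _ in range(10000):
        meta_loss = 0.0
        for _ in range(4):  # a meta-batch of tasks
            (xs, ys), (xq, yq) = sample_task()
            # Inner loop: one SGD step on the support set; create_graph=True
            # lets the meta-gradient flow through this adaptation step.
            grads = torch.autograd.grad(F.mse_loss(net(params, xs), ys),
                                        params, create_graph=True)
            fast = [p - 0.01 * g for p, g in zip(params, grads)]
            # Outer objective: post-adaptation loss on the held-out query set.
            meta_loss = meta_loss + F.mse_loss(net(fast, xq), yq)
        meta_opt.zero_grad()
        meta_loss.backward()
        meta_opt.step()

The key move is create_graph=True: the meta-gradient flows through the inner adaptation step, so the initial parameters are trained to be one gradient step away from solving each new task.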

The central problem with methods like MAML is that they produce a single educated guess about the model underlying a few-shot dataset. Even with the full benefit of an agent’s experience, or meta-training, in a field, a rational agent first engaging a new problem will be very uncertain between different models of the data. When uncertainty between models is high and the models are complex, critical parts of useful, safe decision making can depend on the exact details of this uncertainty.

Our BMAML method, which combines fast adaptation and a Bayesian Neural Network, is the first fast adaptation method that can capture the complex uncertainty of few-shot datasets. In our experiments, we show that by accounting for uncertainty, BMAML improves over MAML on all standard few-shot learning benchmarks, with strong gains in active learning and in exploration. Still, the full importance of uncertainty goes beyond what these benchmarks can express: correct management of uncertainty is also critical in order for fast adaptation to be safe.

Figure: Chaser loss using Stein Variational Gradient Descent.

One reason that fast adaptation with precise uncertainty is hard is that precise uncertainty is hard in general, especially for neural networks. Methods like MAML are based on the standard form of neural network learning — gradient descent on an empirical loss — and inherit all its problems with uncertainty. As Bayesians like to remind us, gradient descent on the empirical loss of a neural network is at best equivalent to choosing the best model in a family of models given by the neural network’s architecture.

Instead of distributing our credence over models in the family in proportion to their strength, we’re putting all of our confidence in one top model. While committing to a single model in this way can be acceptable within traditional Big Data settings, the complex uncertainty in few-shot learning is much less forgiving: In those settings, the large number of observations can force all plausible models to be practically similar. In few-shot learning, by contrast, the posterior over neural network models may well be a complex, multimodal distribution whose fine structure strongly influences the marginal likelihood of unseen data.
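
In standard Bayesian notation (ours, not the paper's), gradient descent delivers a point estimate and plugs it in for prediction,

\hat{\theta} = \arg\max_{\theta}\, p(\mathcal{D} \mid \theta)\, p(\theta), \qquad p(y^{*} \mid x^{*}, \mathcal{D}) \approx p(y^{*} \mid x^{*}, \hat{\theta}),

whereas a Bayesian treatment keeps the whole posterior and averages predictions over it:

p(\theta \mid \mathcal{D}) \propto p(\mathcal{D} \mid \theta)\, p(\theta), \qquad p(y^{*} \mid x^{*}, \mathcal{D}) = \int p(y^{*} \mid x^{*}, \theta)\, p(\theta \mid \mathcal{D})\, d\theta.

When the posterior is multimodal, as we expect in few-shot learning, the plug-in approximation can miss entire regions of plausible models.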

The general solution for deep learning under high uncertainty is to learn a Bayesian distribution over neural network models, known as a Bayesian Neural Network. Unfortunately, robust Bayesian Neural Networks can be very slow to train, which makes them a bad fit for agents on the go. This is especially unfortunate since the unique ability of BNNs to capture complex structures of uncertainty is just what agents learning on the go need most.

The central trick of our BMAML method is to use a variation on the MAML algorithm to accelerate a robust Bayesian Neural Network. We begin by noticing that an important recent algorithm for training a BNN, Stein Variational Gradient Descent, is theoretically compatible with gradient-based meta-training. By using a new meta-loss we call the chaser loss, BMAML applies gradient-based meta-training to directly optimize a BNN for fast descent towards the true posterior of a dataset.
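
The following is a rough sketch of those two ingredients: one SVGD update over a set of particles (each particle a flattened network), and the chaser meta-loss. The RBF kernel, bandwidth, step sizes, and the log_post_* interfaces are our own illustrative assumptions, not the paper's exact procedure:

    import torch

    def rbf_kernel(X, h=1.0):
        # X: (n_particles, dim), one flattened network per row.
        K = torch.exp(-torch.cdist(X, X) ** 2 / (2 * h ** 2))
        # sum_j grad_{x_j} k(x_j, x_i), in closed form for the RBF kernel.
        grad_K = (K.sum(1, keepdim=True) * X - K @ X) / h ** 2
        return K, grad_K

    def svgd_step(particles, log_post, step=1e-2):
        # One SVGD update: the score term pulls particles toward the
        # posterior; the kernel term pushes them apart to cover its modes.
        score = torch.autograd.grad(log_post(particles).sum(), particles,
                                    create_graph=True)[0]
        K, grad_K = rbf_kernel(particles)
        return particles + step * (K @ score + grad_K) / particles.shape[0]

    def chaser_loss(particles, log_post_support, log_post_full, n=1, s=1):
        # Chaser: n SVGD steps on the support-set posterior.
        chaser = particles
        for _ in range(n):
            chaser = svgd_step(chaser, log_post_support)
        # Leader: s further steps that also see the query set, so it lies
        # closer to the true posterior; held fixed as a target (stop-gradient).
        leader = chaser
        for _ in range(s):
            leader = svgd_step(leader, log_post_full)
        leader = leader.detach()
        # Minimizing the chaser-leader gap meta-trains the initial particles
        # for fast descent toward the true posterior.
        return ((chaser - leader) ** 2).sum()

Meta-training sums this loss over a batch of tasks and backpropagates it through the SVGD steps into the shared initial particles, in the same spirit as MAML backpropagating through its inner gradient steps.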

We hope BMAML’s fast, precise few-shot inference can be an important step toward the goal of practical fast adaptation. At the moment, BMAML is still computationally demanding and requires ad-hoc tricks to reduce resource usage when working with datasets like ImageNet. We’re also still looking to fully understand how Stein Variational Gradient Descent scales, and whether we could meta-train additional hyperparameters to optimize it further. In our future work, we’d like to properly streamline BMAML for the kind of rich, demanding datasets adaptive agents face out in the real world.

For now, BMAML is a proof of concept that fast adaptation really can have it all: it can be computationally fast, data efficient, and precise in handling uncertainty, all at the same time.

This article is part of a series on Element AI papers presented at NeurIPS 2018. Click here for a full list of papers and our NeurIPS schedule.

Written by Peli Greitzer, with technical oversight by Sungjin Ahn and Ousmane Dia, and edited by Peter Henderson. Visual design by Manon Gruaz.
