Bayesian Meta-Learning Is All You Need

James Le
Data Notes

--

Update: This post is part of a blog series on Meta-Learning that I’m working on. Check out part 1, part 2, and part 3.

In my previous post, “Meta-Learning Is All You Need,” I discussed the motivation for the meta-learning paradigm, explained the mathematical underpinning, and reviewed the three approaches to design a meta-learning algorithm (namely, black-box, optimization-based, and non-parametric).

I also mentioned in the post that there are two views of the meta-learning problem: a deterministic view and a probabilistic view, according to Chelsea Finn.

  • The deterministic view is straightforward: the model takes as input a training dataset Dᵢᵗʳ, a test data point, and the meta-parameters θ, and produces the label corresponding to that test input.
  • The probabilistic view incorporates Bayesian inference: we perform maximum likelihood inference over the task-specific parameters ϕᵢ, assuming that we have the training dataset Dᵢᵗʳ and a set of meta-parameters θ (see the sketch after this list).
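To make the contrast concrete, here is a minimal sketch of the two formulations in the notation above. The exact functional forms are my own shorthand following the CS330 framing, not a verbatim quote from the lecture.

```latex
% Deterministic view: a single function, parameterized by the
% meta-parameters theta, maps the training set and a test input
% directly to a prediction.
y^{ts} = f_{\theta}\!\left(\mathcal{D}_i^{tr}, x^{ts}\right)

% Probabilistic view: adaptation is inference over the task-specific
% parameters phi_i given the training set and theta; a point estimate
% corresponds to maximizing this conditional.
\phi_i^{*} = \arg\max_{\phi_i} \, \log p\!\left(\phi_i \mid \mathcal{D}_i^{tr}, \theta\right)

% The prediction for the test input then uses the inferred
% task-specific parameters.
p\!\left(y^{ts} \mid x^{ts}, \phi_i^{*}\right)
```

The rest of this post builds on the second formulation, replacing the point estimate of ϕᵢ with a full distribution.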

This blog post is my attempt to demystify the probabilistic view and answer these key questions:

  1. Why is the deterministic view of meta-learning not sufficient?
  2. What is variational inference?
  3. How can we design neural-based Bayesian meta-learning algorithms?

Note: The content of this post is primarily based on CS330’s lecture 5 on Bayesian meta-learning, which is publicly accessible.

--