Bayesian Meta-Learning Is All You Need
Update: This post is part of a blog series on Meta-Learning that I’m working on. Check out part 1, part 2, and part 3.
In my previous post, “Meta-Learning Is All You Need,” I discussed the motivation for the meta-learning paradigm, explained the mathematical underpinning, and reviewed the three approaches to design a meta-learning algorithm (namely, black-box, optimization-based, and non-parametric).
I also mentioned in the post that there are two views of the meta-learning problem: a deterministic view and a probabilistic view, according to Chelsea Finn.
- The deterministic view is straightforward: given a training dataset Dᵗʳ, a test data point, and the meta-parameters θ, we produce the label for that test input.
- The probabilistic view incorporates Bayesian inference: we infer a distribution over the task-specific parameters ϕᵢ, conditioned on the training dataset Dᵢᵗʳ and the meta-parameters θ, rather than committing to a single point estimate.
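The two views above can be written compactly. The notation here is my own shorthand (xᵗˢ and yᵗˢ denote a test input and its label), but it follows the symbols already used in the post:

```latex
% Deterministic view: the meta-learner is a function that maps a
% training set and a test input directly to a prediction.
y^{ts} = f_{\theta}\left(\mathcal{D}_i^{tr}, \, x^{ts}\right)

% Probabilistic view: the task-specific parameters \phi_i are a latent
% variable; we reason about their distribution given the training set
% and the meta-parameters, then predict by conditioning on \phi_i.
p\left(\phi_i \mid \mathcal{D}_i^{tr}, \theta\right), \qquad
y^{ts} \sim p\left(y^{ts} \mid x^{ts}, \phi_i\right)
```

In the deterministic view, uncertainty about ϕᵢ is collapsed into a single function evaluation; in the probabilistic view, it is kept explicit, which is what makes the Bayesian machinery in the rest of this post possible.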
This blog post is my attempt to demystify the probabilistic view and answer these key questions:
- Why is the deterministic view of meta-learning not sufficient?
- What is variational inference?
- How can we design neural-based Bayesian meta-learning algorithms?
Note: The content of this post is primarily based on CS330’s lecture 5 on Bayesian meta-learning, which is publicly available.