CS-330 — Meta-Learning Lecture 4 Notes

Non-Parametric Meta Learners
  • Why can the initialization of the parameters be considered a prior over the parameters?
    ANS :- [Santos '96] shows that gradient descent with early stopping == MAP inference (maximizing the likelihood of the training data) under a Gaussian prior over the parameter space whose mean is at the initialization point.
  • So, MAML training == finding the MAP estimate of ϕ given the current θ and the task's training data, followed by computing the loss on the task's test data. [Like a VAE, there is inference (encoding), followed by evaluation (how good is the reconstruction?).] Even the equations are similar :-
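Concretely, writing the Gaussian prior as p(ϕ) = N(θ, σ²I) and a task's training data as D^tr, the MAP estimate becomes (a sketch of the claim above; the σ and D^tr notation are my additions):

```latex
\hat{\phi} = \arg\max_{\phi}\; \log p(\mathcal{D}^{\text{tr}} \mid \phi) + \log p(\phi)
           = \arg\min_{\phi}\; -\log p(\mathcal{D}^{\text{tr}} \mid \phi)
             + \tfrac{1}{2\sigma^{2}}\, \lVert \phi - \theta \rVert^{2}
```

Early-stopped gradient descent starting from θ approximately solves this regularized problem, which is exactly why the initialization behaves like the mean of a Gaussian prior.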
[Figure: the probabilistic graphical model of the meta-learning problem, along with the maximum-likelihood objective]
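That maximum-likelihood objective can be written as follows (my reconstruction using standard notation, where D_i^tr / D_i^ts are task i's train/test splits):

```latex
\max_{\theta}\; \sum_{i} \log p\big(\mathcal{D}_i^{\text{ts}} \mid \theta, \mathcal{D}_i^{\text{tr}}\big)
  = \sum_{i} \log \int p\big(\mathcal{D}_i^{\text{ts}} \mid \phi_i\big)\,
    p\big(\phi_i \mid \theta, \mathcal{D}_i^{\text{tr}}\big)\, d\phi_i
```

MAML approximates the intractable integral with a point (MAP) estimate of ϕ_i, obtained via a few gradient steps from θ.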
There are other ways to establish a prior on the task-specific parameters ϕ; learning such a prior is then equivalent to the above formulation of the meta-learning problem. For example :-
  • [Kim et al., Auto-Meta] uses neural architecture search to find a network architecture that is well suited to the inner-loop optimization.
Ideas to solve the problems that come with the nested optimization procedure of optimization-based meta-learning:
Bypassing back-propagation through the inner-loop gradients entirely (e.g., first-order approximations that treat the inner updates as constants with respect to θ).
iMAML :- approximates the exact meta-gradient without differentiating through the inner loop's optimization path, by differentiating implicitly at the inner-loop solution.
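The key identity behind iMAML (Rajeswaran et al., 2019): if the inner loop solves a proximally regularized problem, the meta-gradient depends only on the inner solution ϕ*, not on the optimization path that produced it:

```latex
\phi^{*}(\theta) = \arg\min_{\phi}\; \mathcal{L}_{\text{tr}}(\phi)
                   + \tfrac{\lambda}{2}\, \lVert \phi - \theta \rVert^{2}
\quad\Longrightarrow\quad
\frac{d\phi^{*}}{d\theta} = \Big( I + \tfrac{1}{\lambda}\,
    \nabla^{2}_{\phi} \mathcal{L}_{\text{tr}}(\phi^{*}) \Big)^{-1}
```

The meta-gradient is then (dϕ*/dθ)ᵀ ∇_ϕ L_ts(ϕ*); in practice the matrix inverse is approximated with conjugate gradient rather than formed explicitly.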
  • The basic concept of non-parametric approaches :- use a non-parametric classifier (comparisons in a learned embedding space) in the inner loop, instead of training with SGD there. An additional benefit :- non-parametric methods perform well in low-data regimes, so they are naturally suited to the few-shot setting.
  • We use parametric meta-learners that produce good non-parametric learners.
  • Non-Parametric methods :-
    1.) Siamese Networks :- take two images as input and predict a binary label indicating whether they belong to the same class (a minimal sketch follows).
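A minimal PyTorch sketch of the idea; the encoder architecture and embedding size here are my own assumptions, not the original paper's:

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    """Shared encoder embeds both images; a head scores 'same class?'."""
    def __init__(self, emb_dim: int = 64):
        super().__init__()
        # One encoder, reused for both inputs (this weight sharing is what
        # makes the network "Siamese").
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )
        # Maps the absolute difference of the two embeddings to a logit.
        self.head = nn.Linear(emb_dim, 1)

    def forward(self, x1, x2):
        z1, z2 = self.encoder(x1), self.encoder(x2)
        return self.head(torch.abs(z1 - z2)).squeeze(-1)  # "same class" logit

# Training: binary cross-entropy on image pairs. At test time, a query image is
# compared against one example per candidate class and gets the best-scoring class.
net = SiameseNet()
x1, x2 = torch.randn(8, 1, 28, 28), torch.randn(8, 1, 28, 28)
loss = nn.functional.binary_cross_entropy_with_logits(net(x1, x2), torch.ones(8))
```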
    2.) Matching Networks :- introduced to resolve the discrepancy between the training and testing procedures of Siamese networks (trained on binary same/different pairs, but tested on N-way classification). Their structure is as follows :-
[Figure: matching networks compare the embeddings of the dog images on the left with the embedding of the dog image at the bottom, to assign a probability of the bottom dog belonging to each of the species on the left.]
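A simplified PyTorch sketch of that comparison step, omitting the full-context (LSTM) embeddings of the original paper; the function name and tensor shapes are my assumptions:

```python
import torch
import torch.nn.functional as F

def matching_net_predict(f_query, g_support, y_support, n_classes):
    """f_query: (Q, D) query embeddings from f_theta;
    g_support: (S, D) support embeddings from g_theta;
    y_support: (S,) integer class labels of the support set."""
    # Attention over support examples via cosine similarity.
    sims = F.normalize(f_query, dim=-1) @ F.normalize(g_support, dim=-1).T  # (Q, S)
    attn = sims.softmax(dim=-1)                                             # (Q, S)
    # Predicted class distribution = attention-weighted sum of support labels.
    y_onehot = F.one_hot(y_support, n_classes).float()                      # (S, C)
    return attn @ y_onehot                                                  # (Q, C)

# 5-way 1-shot example with random embeddings:
probs = matching_net_predict(torch.randn(3, 64), torch.randn(5, 64),
                             torch.arange(5), n_classes=5)
```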

3.) Prototypical Networks :- when more than one shot per class is available, all support images are passed through g_{θ}, and the embeddings belonging to each class are averaged to obtain a "prototypical embedding" (prototype) of that class. Queries are then classified by their distance to the prototypes, as in the sketch below.
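A PyTorch sketch of the prototype computation and the distance-based classification; the softmax over negative squared Euclidean distances follows the paper, while the names and shapes are mine:

```python
import torch

def proto_net_predict(z_query, z_support, y_support, n_classes):
    """z_query: (Q, D) query embeddings; z_support: (S, D) support
    embeddings from g_theta; y_support: (S,) integer labels."""
    # Prototype of class k = mean of the support embeddings labeled k.  (C, D)
    protos = torch.stack([z_support[y_support == k].mean(dim=0)
                          for k in range(n_classes)])
    # Classify by softmax over negative squared Euclidean distances.
    dists = torch.cdist(z_query, protos) ** 2   # (Q, C)
    return (-dists).log_softmax(dim=-1)         # log p(y = k | x_query)

# 5-way 2-shot example with random embeddings:
log_probs = proto_net_predict(torch.randn(3, 64), torch.randn(10, 64),
                              torch.arange(5).repeat_interleave(2), n_classes=5)
```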

4.) Other ideas for non-parametric learning :- e.g., learning a non-linear relation module that compares embeddings instead of using a fixed distance metric (Relation Networks), or performing message passing over the embeddings with a graph neural network.

Comparison Of The Three Approaches
  • Black-box :- highly expressive and applicable to any learning problem, but data-hungry and challenging to optimize.
  • Optimization-based :- builds the structure of gradient descent into the model and tends to extrapolate better to out-of-distribution tasks, but the nested, second-order optimization is compute- and memory-intensive.
  • Non-parametric :- entirely feed-forward, hence fast and easy to optimize, but harder to extend beyond classification.

Come join DA Labs on our quest to understand the workings of machines and how they learn! Wonder and wander in the beautiful field that is Deep Learning! Any kind of feedback/questions is always welcome & appreciated 😇 To join us in our projects/research, interact with us, or write with us, just chime in on our Discord server https://discord.gg/UwFdGVN ! :)
