Starting off with One-Shot Learning

Understanding one-shot learning basics

Mehul Gupta

Published in

Data Science in your pocket

4 min read, Nov 16, 2022

https://nintendosoup.com/oneshot-is-heading-to-nintendo-switch-in-2022/

The real world is a tough place for a Data Scientist.

You just don’t have enough data.

The problem gets worse when you want to try a Neural Network based approach: Neural Networks need an ample amount of data to give decent results, otherwise they overfit very quickly.

So, what should we do?

One-Shot Learning is the answer.

Many of you must have come across this term recently.

But what actually is One-Shot Learning?

N-Shot Learning is a family of methods/algorithms/networks where the model learns from just N examples per class, where N is a very small number. The different variants are:

Zero-Shot Learning: The model is trained in such a way that it can classify classes it never saw during training (building on the concept of transfer learning, where the model has some prior information about the environment). For example, the training dataset might have 5 classes while the test dataset also contains a 6th class. There are zero training samples for that class, hence the name Zero-Shot Learning.
One-Shot Learning: The model is trained on 1 sample from each class.
Two-Shot Learning: The model is trained on 2 samples from each class.
Few-Shot Learning: The model is trained on N (a very low number) samples from each class.
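To make the "N samples from each class" idea concrete, here is a minimal sketch of how one could cut a labeled dataset down to an N-shot training set. The function name `make_n_shot_support_set` and the toy samples are assumptions made up for illustration, not part of any library.

```python
import random
from collections import defaultdict


def make_n_shot_support_set(samples, labels, n_shot, seed=0):
    """Keep only n_shot randomly chosen samples per class.

    Hypothetical helper for illustration: returns a dict that maps
    each label to a list of exactly n_shot samples.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    support = {}
    for label, items in by_class.items():
        if len(items) < n_shot:
            raise ValueError(f"class {label!r} has fewer than {n_shot} samples")
        # Draw n_shot distinct samples for this class.
        support[label] = rng.sample(items, n_shot)
    return support


# Toy data (made up): 3 samples each for 2 classes.
samples = ["cat_1", "cat_2", "cat_3", "dog_1", "dog_2", "dog_3"]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

one_shot = make_n_shot_support_set(samples, labels, n_shot=1)  # One-Shot
two_shot = make_n_shot_support_set(samples, labels, n_shot=2)  # Two-Shot
```

With `n_shot=1` every class keeps a single example, which is exactly the One-Shot setting; raising `n_shot` to a small number gives the Few-Shot setting.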

Note: The terms One-Shot and Few-Shot learning are used interchangeably across the internet, as One-Shot is nothing but Few-Shot with N=1. Hence, we might be using a few samples and still call the approach One-Shot, or vice versa. So, treat One-Shot and Few-Shot as the same for now.

But why the name One-Shot?

This has something to do with how a human learns. A human is able to remember the important features of an object after seeing it once (say you saw a Giraffe once in a zoo or in your school textbook) and can recognize it the next time. Hence, just one shot of the object is enough for human memory to identify it as a Giraffe. Similarly, we wish to develop architectures/models that can replicate human memory, i.e. recognize an object after just one or a few examples.

Moving on, what if I tell you that you already know a few One-Shot Learning algorithms?

In machine learning, we basically have 2 types of algorithms, based on their parameters:

Parametric: These models need to learn some sort of parameters/weights while training, like Neural Networks or Logistic Regression.
Non-Parametric: These models don’t need to learn any parameters/weights while training. Examples: KNN (takes a hyperparameter K, the number of neighbors) or Decision Trees (the thresholds set at different nodes in the tree are estimated using statistical methods and not learnt!).

Models that need to learn parameters/weights require more training data than non-parametric ones, because the weights have to be estimated iteratively from the data rather than calculated directly using some statistical method. Hence, Non-Parametric methods/algorithms like KNN or Decision Trees can be considered for One-Shot/Few-Shot learning.

And mark my words, they can perform better than many complex models if you have really scanty data!
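As a rough illustration of KNN in the One-Shot setting, here is a minimal sketch of a 1-nearest-neighbor classifier that stores exactly one example per class. The feature values (height and neck length in meters) and the function names are made up for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_shot_predict(support, query):
    """Return the label whose single stored example is closest to query.

    support: dict mapping label -> one feature vector (one shot per class).
    """
    return min(support, key=lambda label: euclidean(support[label], query))


# One example per class. Hypothetical features: [height_m, neck_length_m].
support = {
    "giraffe": [5.5, 1.8],
    "horse": [1.6, 0.6],
    "dog": [0.6, 0.2],
}

tall_animal = one_shot_predict(support, [4.9, 1.5])
small_animal = one_shot_predict(support, [0.7, 0.25])
print(tall_animal)   # giraffe
print(small_animal)  # dog
```

Nothing is trained here: the "model" is just the stored examples plus a distance function, which is why this kind of method can work with a single sample per class.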

But what if I wish to use Neural Networks or other Parametric ML algorithms, or the dataset is complex, like Text or Images? You would want a deep learning architecture for such problems. But again, scanty data is not meant for Neural Networks. How can we apply One-Shot Learning methods to deep learning architectures? Or are there specialized deep learning architectures that support One-Shot Learning?

There are multiple ways in which we can do this:

Use specialized loss functions, like Contrastive loss or Triplet loss, that make the model learn the distinct features of an object faster (assume them to be black boxes for now, we will discuss them in later posts). Example: Siamese Networks
Use explicit memory/information. Examples: Neural Turing Machines, Memory Augmented NN
Use different optimization methods. Methods like gradient descent require a big corpus to converge, so alternative optimization methods have been introduced that help the model converge faster. Example: Model-agnostic meta learning
Use transfer learning based methods.
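For the curious, here is a small peek inside the loss-function black box. This is only a sketch of one common textbook form of the Triplet and Contrastive losses, written on plain Python lists; the function names, the use of Euclidean distance, and the margin of 1.0 are assumptions for illustration, and real Siamese Networks compute these on learned embeddings.

```python
import math


def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the positive is closer to the anchor than the
    negative by at least `margin`; positive otherwise."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)


def contrastive_loss(a, b, same_class, margin=1.0):
    """Pull same-class pairs together; push different-class pairs
    at least `margin` apart."""
    d = distance(a, b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # same class as the anchor
easy_negative = [3.0, 4.0]  # different class, already far away
hard_negative = [1.0, 0.0]  # different class, as close as the positive

easy = triplet_loss(anchor, positive, easy_negative)  # 0.0, nothing to fix
hard = triplet_loss(anchor, positive, hard_negative)  # 1.0, needs pushing apart
```

Both losses only compare distances between examples, so the model learns "same or different" rather than a fixed set of class labels, which is what lets it cope with classes that have very few samples.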

You may be hearing the names of most of these algorithms/methods for the first time! No need to worry, even I heard of them for the first time while writing this down. We won’t go any deeper into them here and make things complex. Rather, we will be covering them one by one in my upcoming posts!

A big thanks to Hands-On with One-Shot Learning