A Basic Introduction to Few-Shot Learning

Rabia Miray Kurt · Published in The Startup · Jun 26, 2020

Introduction

Machine learning has undergone substantial changes over the years. At the very basis of machine learning and deep learning, data has always held a central place. But sometimes even adequate data acquisition does not produce more accurate models. In such situations, few-shot learning can help increase the accuracy of models and benefit businesses. The basic idea of few-shot learning is to make reliable predictions from very small datasets. This helps solve data-quantity problems and reduce expenses, and it aims to avoid both overfitting and underfitting. Overfitting is the case where the model learns “too much” from the training data, so its generalization becomes unreliable. Underfitting is the case where the model has “not learned enough” from the training data and may fail to capture even the dominant trend.

For any form of low-shot learning, at least one of the following requirements must be met: previously trained filters and a pre-determined architecture, a correct assumption about the data distribution, or a specific classification scheme for the information.

Few-shot learning is one of the most effective techniques for working with low-data regimes. Techniques such as regularization can curb overfitting, but they do not solve the core problem caused by having fewer training examples. Working with little data also raises difficulties such as data gathering and labeling, hardware constraints, result analysis, etc. There are significant uses of few-shot learning in daily life. For example, when only a limited amount of data is available, few-shot learning can be applied to rare diseases in medicine. It can also relieve the burden of collecting large-scale supervised data for industrial applications.

Low-Shot Learning Approaches

There are two main types of low-shot learning approaches. They are “parameter-level” and “data-level” approaches.

1. Parameter-level Approach

The parameter-level approach limits the parameter space to prevent overfitting. Learning is carried out by an algorithm that adapts the parameters of a mathematical or statistical model, such as logistic regression or a neural network, to the training data. A “teacher” algorithm can be trained on a large quantity of data to learn how to constrain the parameter space. Then, when the actual classifier (the “student”) is trained, the teacher guides it across the parameter space toward the settings that yield the best training results.
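The article gives no code for this, but here is a minimal sketch of one way to realize the parameter-level idea: a logistic-regression student whose weights are penalized for straying from a teacher's weights. The names (`train_student`, `teacher_w`, `lam`) and the quadratic penalty are illustrative assumptions, not the article's exact method.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_student(X, y, teacher_w, lam=1.0, lr=0.1, steps=500):
    """Logistic regression whose weights are pulled toward a teacher's weights.

    The penalty lam * ||w - teacher_w||^2 shrinks the effective parameter
    space, which is one simple way to limit overfitting on few examples.
    """
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)                   # predicted probabilities
        grad = X.T @ (p - y) / len(y)        # gradient of the log loss
        grad += 2.0 * lam * (w - teacher_w)  # pull toward the teacher
        w -= lr * grad
    return w
```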

Non-parametric method: learning is accomplished by storing the training data (memorization) and performing some mapping, such as dimensionality reduction, over it; examples include k-nearest neighbors (kNN) and decision trees.

The kNN classifier is a non-parametric classifier that simply stores the training data, D. A new data point is classified by a majority vote among its nearest neighbors, which are found using a similarity measure (e.g. a distance function). For a kNN, we need to choose the distance function, d, and the number of neighbors, k.
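Here is a minimal NumPy sketch of the kNN rule just described, using Euclidean distance as d; the toy 2-D dataset at the bottom is purely illustrative.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # Euclidean distance d(x, x') between the query and every stored example
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest neighbors
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage: two classes in 2-D (illustrative data, not from the article)
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 1.0]), k=3))  # -> 1
```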

Deduction: with a large amount of data, it is usually best to use a parametric approach. A non-parametric approach, in contrast, is more compatible with few-shot learning, because it simply stores the data and processes it for every query rather than fitting many parameters. In this way, non-parametric methods generalize and train better when data is scarce.

2. Data-level Approach

To avoid overfitting and underfitting, data-level approaches take the simple route of adding more data. This does not necessarily mean gathering more data yourself; it often means drawing on an extensive collection of external data sources. Another technique is data augmentation, producing new data, for example by adding random noise to the training samples. An alternative technique, Generative Adversarial Networks (GANs), uses generative models that learn to generate new data instances with the same statistics as the training set.
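As a concrete illustration of the random-noise augmentation mentioned above, here is a minimal sketch; the function name, the Gaussian noise model, and the `sigma`/`copies` parameters are assumptions, not from the article.

```python
import numpy as np

def augment_with_noise(X, y, copies=4, sigma=0.05, rng=None):
    """Create extra training samples by adding Gaussian noise to each example."""
    rng = rng or np.random.default_rng(0)
    X_new = np.concatenate([X + rng.normal(scale=sigma, size=X.shape)
                            for _ in range(copies)])
    y_new = np.tile(y, copies)
    # Return originals plus noisy copies; labels are unchanged by the noise
    return np.concatenate([X, X_new]), np.concatenate([y, y_new])
```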

Figure 1: High-level GAN architecture, from “MNIST Generative Adversarial Model in Keras”
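To make the figure's high-level architecture concrete, here is a minimal Keras sketch of the two networks and the stacked model. This is an assumed simplification, not O'Shea's actual code; the layer sizes and optimizers are illustrative.

```python
import numpy as np
from tensorflow import keras

latent_dim = 100

# Generator: maps latent noise to a 28x28 "image"
generator = keras.Sequential([
    keras.layers.Input(shape=(latent_dim,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(28 * 28, activation="tanh"),
    keras.layers.Reshape((28, 28)),
])

# Discriminator: maps an image to a real/fake probability
discriminator = keras.Sequential([
    keras.layers.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Stacked model: the generator is trained to fool the frozen discriminator
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

# One illustrative generator update: all-ones targets mean "label fakes as real"
noise = np.random.normal(size=(32, latent_dim))
gan.train_on_batch(noise, np.ones((32, 1)))
```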

Meta-Learning

In meta-learning, automatic learning algorithms are applied to metadata: from just a few samples, the model learns “how to learn”. As Thrun & Pratt stated in 1998, an algorithm is learning to learn if “its performance at each task improves with experience and with the number of tasks”.
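The article does not show how such training is set up, but a common arrangement in the meta-learning literature (an assumption here, not stated in the article) is episodic training: the model repeatedly sees small N-way, K-shot tasks. A minimal sketch of sampling one episode, assuming the data is already grouped by class:

```python
import numpy as np

def sample_episode(data_by_class, n_way=5, k_shot=1, n_query=5, rng=None):
    """Draw one N-way K-shot task: a small support set plus query examples."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(list(data_by_class), size=n_way, replace=False)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Shuffle this class's examples, then split into support and query
        order = rng.permutation(len(data_by_class[cls]))
        chosen = [data_by_class[cls][i] for i in order[:k_shot + n_query]]
        support += [(x, label) for x in chosen[:k_shot]]
        query += [(x, label) for x in chosen[k_shot:]]
    return support, query
```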

Figure 2: An Example of Meta-Learning with Matching Networks Architecture

In the figure above, the feature extractor differs for support-set images (left) and query images (bottom). Matching Networks was the first metric-learning algorithm to use meta-learning. In this method, the network finds the most appropriate embedding not only according to the image to embed, but also according to all the other images in the support set. There are many other algorithms using meta-learning.
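A minimal sketch of the Matching Networks classification rule from Vinyals et al. (2016): a query is labeled by attention (a softmax over cosine similarities) across the support set. The embeddings are assumed to come from some feature extractor, which is not shown here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def matching_net_predict(support_emb, support_labels, query_emb, n_classes):
    """Label distribution for one query: sum_i a(query, x_i) * one_hot(y_i)."""
    # Cosine similarity between the query and each support embedding
    sims = support_emb @ query_emb / (
        np.linalg.norm(support_emb, axis=1) * np.linalg.norm(query_emb) + 1e-8
    )
    attn = softmax(sims)                        # attention weights over support
    one_hot = np.eye(n_classes)[support_labels]
    return attn @ one_hot                       # predicted class probabilities
```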

Zero-Shot Learning

In zero-shot learning, the learning procedure relies on an object's features (for example, semantic attributes) rather than direct examples of the object. The model can therefore recognize objects in an image without any labeled training data for those classes.

Figure 3: A chart demonstrating Training and Zero-Shot Learning Features
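To make the idea concrete, here is a minimal sketch of attribute-based zero-shot classification: each unseen class is described by a signature of features, and a query's predicted attributes are matched to the closest signature. The classes, attributes, and values below are purely hypothetical.

```python
import numpy as np

# Hypothetical attribute signatures (e.g. has_stripes, four_legged, aquatic)
class_attributes = {
    "zebra": np.array([1.0, 1.0, 0.0]),
    "whale": np.array([0.0, 0.0, 1.0]),
}

def zero_shot_classify(predicted_attrs):
    """Pick the unseen class whose attribute signature is most similar."""
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return max(class_attributes,
               key=lambda c: cosine(class_attributes[c], predicted_attrs))

# A model (not shown) predicts attributes for an image; we match them to a class
print(zero_shot_classify(np.array([0.9, 0.8, 0.1])))  # -> "zebra"
```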

References

  • Jadon, Shruti and Garg, Ankush. “Hands-On One-Shot Learning with Python”. Packt Publishing, April 2020.
  • Ravichandiran, Sudharsan. “Hands-On Meta Learning with Python”. Packt Publishing, December 2018.
  • Google Developers. “Generative Adversarial Networks”. https://developers.google.com/machine-learning/gan
  • Wang, Yaqing and Yao, Quanming. “Few-shot Learning: A Survey”. arXiv:1904.05046v1 [cs.LG], April 2019.
  • Garbade, Michael Jurgen. “Understanding Few-Shot Learning in Machine Learning”. Medium, August 2018. https://medium.com/quick-code/understanding-few-shot-learning-in-machine-learning-bede251a0f67
  • O’Shea, Tim. “MNIST Generative Adversarial Model in Keras”. KDnuggets, July 2016. https://www.kdnuggets.com/2016/07/mnist-generative-adversarial-model-keras.html
  • Vinyals, Oriol; Blundell, Charles; Lillicrap, Timothy; Kavukcuoglu, Koray; and Wierstra, Daan. “Matching Networks for One Shot Learning”. Advances in Neural Information Processing Systems, June 2016.
