Few-shot learning is the ability of an algorithm to learn a task by observing only a few samples. It is a popular tool in image classification, where large collections of labelled images on which to train a model may not be available.
Our project, Task Dependent Adaptive Metric (TADAM) for improved few-shot learning, builds on a few-shot learning method known as prototypical networks, which maps each class to a prototype so that classification can be performed with nearest-neighbour methods. We developed a task-conditioning module that encodes the task itself, making the few-shot model readily adaptable to a wider range of problems and allowing for more refined few-shot learning.
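The prototypical-network idea can be sketched in a few lines of NumPy: each class prototype is the mean of that class's support embeddings, and a query is assigned to the class of its nearest prototype. This is a minimal toy illustration with hand-made 2-D "embeddings", not the architecture from the paper; the function names are our own.

```python
import numpy as np

def prototypes(support_embeddings, support_labels, n_classes):
    """Average each class's support embeddings into a single prototype."""
    return np.stack([
        support_embeddings[support_labels == c].mean(axis=0)
        for c in range(n_classes)
    ])

def classify(query_embedding, protos):
    """Assign the query to the class of the nearest prototype (Euclidean)."""
    dists = np.linalg.norm(protos - query_embedding, axis=1)
    return int(np.argmin(dists))

# Toy 2-D embeddings: two classes, two support samples each.
support = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 1.0]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, n_classes=2)
print(classify(np.array([0.1, 0.1]), protos))  # nearest to the class-0 prototype
```

In practice the embeddings come from a learned feature encoder rather than raw pixels; the nearest-neighbour step itself stays this simple.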
With this new few-shot architecture, combining task conditioning and metric scaling, we set a new standard on the mini-ImageNet and Few-shot CIFAR100 datasets. In addition, we empirically demonstrated that adding an adaptive scaling factor to the similarity metric can ameliorate the problems that arise when high-magnitude or low-magnitude values are passed through the softmax function.
In this study, we developed a two-step process using prototypical networks to learn image classification from one, five or ten samples. The first step is a dynamic task-conditioning module, which uses prototypical networks to form a task representation from the sample set. Tasks could be, for example, classifying animal families, subsets of digits, or everyday objects. This conditioning module also includes a task embedding network that defines layer-wise task coefficients for the feature-encoding module of the few-shot task. One key advantage of TADAM is that the same network weights, or neuron connections, are shared across tasks, increasing efficiency and allowing the model to adapt on the fly to a wide variety of problems.
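The idea of layer-wise task coefficients can be sketched as a scale-and-shift conditioning of the encoder's features, where the scales and shifts are predicted from a task representation. This is a simplified NumPy illustration under our own assumptions (mean-pooled task embedding, random linear predictors), not the task embedding network described in the paper.

```python
import numpy as np

def task_embedding(support_embeddings):
    """Summarize the task as the mean of all support embeddings."""
    return support_embeddings.mean(axis=0)

def condition(features, gamma, beta):
    """Layer-wise conditioning: scale and shift the features per dimension."""
    return gamma * features + beta

rng = np.random.default_rng(0)
# Hypothetical linear predictors of the layer-wise coefficients.
W_gamma, W_beta = rng.normal(size=(2, 4, 4))

support = rng.normal(size=(10, 4))        # 10 support samples, 4-D embeddings
task = task_embedding(support)            # one vector describing the task
gamma, beta = task @ W_gamma, task @ W_beta

features = rng.normal(size=(5, 4))        # encoder features of 5 query images
conditioned = condition(features, gamma, beta)
print(conditioned.shape)                  # features keep their shape
```

Because only the small coefficient predictors depend on the task, the same encoder weights can be reused for every task, which is the efficiency point made above.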
Image classification occurs in the second step of this few-shot model, in which sample images are classified using prototypical networks to extract class representations. The comparison between query and sample image classification is made via the similarity metrics module, which measures how distant each representation is from the others.
As an additional contribution to metric learning, we tested whether a scaling factor that adjusts the output magnitude of the similarity metric helps the softmax function. We found that it does, regardless of the metric used, whether cosine or Euclidean distance.
The similarity metric, or distance function, gives a measure of separation between clusters, which in this case represent classes. The softmax function then normalizes these distance measures and converts them into probabilities of belonging to each class. Because this normalization exponentiates the distances, it does not work well with extremely low- or high-magnitude inputs. A scaling factor can be adapted to the type of distance function used, making it an adaptive scaling factor that converts the distances into a range of values the softmax can normalize more easily. The end result is increased classification accuracy on mini-ImageNet.
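A small numerical example shows why the scaling factor matters. Below is a minimal sketch (our own toy numbers, with the scale written as a fixed parameter `alpha` rather than a learned quantity): when the raw distances are tiny, the softmax output is nearly uniform and carries almost no signal, while a larger scale restores a confident prediction.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())   # subtract the max for numerical stability
    return e / e.sum()

def class_probs(dists, alpha=1.0):
    """Softmax over negative distances, scaled by a factor alpha."""
    return softmax(-alpha * dists)

dists = np.array([0.01, 0.02, 0.03])    # tiny raw distances to 3 prototypes
print(class_probs(dists))               # nearly uniform: little signal
print(class_probs(dists, alpha=100.0))  # scaling restores a confident prediction
```

In TADAM this scale is learned rather than hand-set, letting the model match the output range of whichever metric is in use.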
Our TADAM project sets the new state-of-the-art for mini-ImageNet classification, demonstrates that similarity metrics and the softmax should be used with metric scaling, and shows that few-shot learning with prototypical networks can perform well on heterogeneous datasets, even with drastically different classes (e.g. planes and chinchillas).
TADAM is the work of Boris N. Oreshkin (Applied Research Scientist), with help from Pau Rodriguez (Future Research Scientist at EAI) and Alexandre Lacoste (Research Scientist).
This article is part of a series on Element AI papers presented at NeurIPS 2018. Click here for a full list of papers and our NeurIPS schedule.
Blog post written by Pau Rodriguez and edited by Peter Henderson, with visual design by Manon Gruaz.