Deep Few-Shot Learning

Frederik Pahde, Tassilo Klein and Moin Nabi (ML Research Berlin)

SAP AI Research · Nov 15, 2017

In recent years, deep learning techniques have achieved remarkable results in computer vision, constantly pushing the boundaries of what is possible. These advances can be explained by improvements in algorithms and model architectures, along with increasing computational power and the growing availability of big data. However, the big-data assumption underlying training, which is key for deep learning applications, is not always realistic. In enterprise or healthcare scenarios in particular, labeling samples is often very expensive or even impossible. To build powerful models in such situations, few-shot learning algorithms have been developed and have proven to be a promising tool in small-data scenarios.

For the moment, let’s use a simple example: the categorization of birds from photos. While we can expect that common bird species have training images from different views readily available, there will be rare species for which only a few pictures exist. If we develop a classifier on such an imbalanced training set, we end up in a few-shot learning scenario for the rare classes. In the extreme case where only a single image per class is available, we speak of a one-shot learning problem.
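This setting is often formalized as N-way K-shot classification: each task offers N classes with only K labeled examples (the support set) and is evaluated on held-out examples (the query set). The following is a minimal sketch of sampling such an episode; `images` and `labels` are assumed to be NumPy arrays, and the parameter defaults are illustrative.

```python
import numpy as np

def sample_episode(images, labels, n_way=5, k_shot=1, n_query=15, rng=None):
    """Sample an N-way K-shot episode: k_shot labeled support examples
    per class plus n_query query examples to classify."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.where(labels == c)[0])
        support.extend(idx[:k_shot])                # the few labeled shots
        query.extend(idx[k_shot:k_shot + n_query])  # held-out evaluation images
    return images[support], labels[support], images[query], labels[query]
```

With k_shot=1 this reduces to the one-shot case described above.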

Coming back to few-shot learning, the challenge is that a model trained on only a few observations exhibits hard shifts in behavior that cannot be easily and smoothly extended to new classes. The difficulty stems from the fact that in deep learning these shifts occur in a generally huge parameter space, whereas successful training typically requires a good balance between the size of the parameter set and the size of the dataset. Standard optimization techniques applied in a few-shot scenario therefore tend to overfit the data severely. To avoid this trap, the model has to be forced to generalize well beyond the few available training instances, which is far from straightforward and requires a sophisticated strategy.

[Figure: Problem setting for few-shot learning]

Approaches and Trends

In general, there are two main concepts for tackling few-shot learning: data-level and parameter-level approaches.

Data-level Approach

The data-level approach is straightforward and intuitive. If there are not enough instances to fit the model’s parameters without overfitting, more instances are needed. One way to obtain them is to tap into the large pool of external data from various sources (Douze et al., 2017). Simply put, if the objective is to build a classifier for different bird species with only a few labeled examples per category, it might help to exploit other data sources that contain images of birds. Even unlabeled instances can be useful here, as they can be used to learn the high-level concept of birds in general. Additionally, unlabeled images can be incorporated in a semi-supervised way: a distance metric is used to find unlabeled images that are similar to the labeled ones, labels are then propagated to them, and the newly labeled images are added to the training set, ultimately enlarging the corpus.
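As a concrete illustration of this semi-supervised route, here is a minimal sketch using scikit-learn’s label-spreading implementation; the arrays `labeled_features`, `labeled_targets`, and `unlabeled_features` are hypothetical placeholders for precomputed image features.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Stack the few labeled examples with the large unlabeled pool.
# scikit-learn marks unlabeled points with the label -1.
X = np.vstack([labeled_features, unlabeled_features])
y = np.concatenate([labeled_targets,
                    -np.ones(len(unlabeled_features), dtype=int)])

# A k-NN graph acts as the distance metric linking similar images.
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y)

# Propagated labels for the unlabeled pool can now enlarge the training set.
pseudo_labels = model.transduction_[len(labeled_features):]
```

In practice one would keep only propagated labels with high confidence before adding them to the corpus.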

Besides exploiting external data sources, another data-level approach to few-shot learning is to generate new data. An easy first step is data augmentation, a common technique in the computer vision domain that applies operations such as rotation, flipping, or adding random noise to the images.
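A minimal sketch of such a pipeline with torchvision is shown below; the specific transforms and the noise magnitude are illustrative choices, not a prescription.

```python
import torch
from torchvision import transforms

# Each pass over the data yields a slightly different version of every
# image, synthetically enlarging the effective training set.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    # Additive Gaussian noise; 0.02 is an arbitrary illustrative scale.
    transforms.Lambda(lambda t: (t + 0.02 * torch.randn_like(t)).clamp(0.0, 1.0)),
])
```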

An alternative is image generation with generative adversarial networks (GANs), a relatively recent technique that can model far more complex variation. Let’s return to the bird example to make this concrete. If our dataset contains only front-facing images of an extinct bird species, a GAN may be used to generate completely new instances of the same bird from different perspectives, without ever having seen them. This astonishing transfer capability is possible only if the network has been sufficiently exposed to multi-view images of other bird species (Mehrotra et al., 2017) and has thus captured enough variation.
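The multi-view model of Mehrotra et al. is beyond the scope of a short snippet, but the adversarial mechanics underneath are simple. Below is a generic sketch of one GAN training step in PyTorch, operating on low-dimensional feature vectors rather than raw pixels to stay compact; all architectures and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

latent_dim, feat_dim = 32, 64
# Toy generator and discriminator; real image GANs use convolutional nets.
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
D = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(real_batch):
    """One adversarial round: D learns to tell real from fake,
    then G learns to fool D."""
    n = real_batch.size(0)
    fake = G(torch.randn(n, latent_dim))

    # Discriminator update on real vs. detached fake samples.
    opt_d.zero_grad()
    d_loss = (bce(D(real_batch), torch.ones(n, 1)) +
              bce(D(fake.detach()), torch.zeros(n, 1)))
    d_loss.backward()
    opt_d.step()

    # Generator update: make D classify fakes as real.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(n, 1))
    g_loss.backward()
    opt_g.step()
```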

Parameter-level Approach

The parameter-level approach, in contrast, tackles few-shot learning by addressing the problem of a high-dimensional parameter space that is too large for the small amount of training data. To avoid overfitting, the parameter space can be constrained. This is a standard strategy in machine learning, usually implemented via regularization or a suitable choice of loss function; both can be adapted to the few-shot setting (Yoo et al., 2017) such that the model is forced to generalize well even from a small number of training samples. Another strategy is to guide the optimization algorithm through the large parameter space. Standard optimizers such as stochastic gradient descent (SGD) need many iterations to converge, which works poorly in a high-dimensional parameter space with few training samples. It is therefore sensible to teach the algorithm to choose a more intelligent path through the parameter space for faster convergence, a strategy generally known as meta-learning. Following this notion, one option is to train a teacher model on a large amount of data so that it learns to capture the structure of the parameter space. When the actual classifier (the pupil) is trained, the teacher then guides the pupil across the parameter manifold to achieve good results, as was shown by researchers at Twitter (Ravi & Larochelle, 2017).
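Ravi & Larochelle actually train an LSTM meta-learner that produces the pupil’s update rule itself, which is too involved for a blog snippet. As a much simpler illustration of constraining the parameter space with a teacher, the sketch below adds a penalty that keeps the pupil’s weights close to those of a teacher pretrained on abundant data; the penalty form and `lam` are illustrative assumptions, not the paper’s method.

```python
import torch
import torch.nn.functional as F

def teacher_guided_loss(logits, targets, pupil_params, teacher_params, lam=0.01):
    """Cross-entropy plus a quadratic penalty that pulls the pupil toward
    a teacher trained on plentiful data, shrinking the effective search space."""
    ce = F.cross_entropy(logits, targets)
    penalty = sum(((p - t.detach()) ** 2).sum()
                  for p, t in zip(pupil_params, teacher_params))
    return ce + lam * penalty
```

With lam = 0 this reduces to ordinary fine-tuning; larger values constrain the pupil ever more tightly to the teacher’s region of the parameter space.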

Combining the Best of Both: Hybrid Approaches

Although many more approaches to the few-shot learning problem exist, the most prominent ones operate at the data level or the parameter level. On top of this, there is some work on hybrid approaches that combine both concepts, such as the work from Facebook (Hariharan & Girshick, 2017). These approaches provide a clear advantage, as both perspectives can be leveraged to fix the imbalance between parameter-space size and dataset size. Our SAP machine learning research team therefore plans to build on the idea of hybrid approaches while exploring new few-shot learning algorithms.

Frederik Pahde, an M.Sc. student at Humboldt University Berlin, will cover the challenges discussed in this post in his thesis.
