Deep Learning Architectures You Can Use with Little Data

Gorkem Polat
Published in The Startup · Jun 26, 2020

Conventional CNNs (AlexNet, VGG, GoogLeNet, ResNet, DenseNet, …) perform well when there are many samples for each class in the dataset. Unfortunately, they generally do not work well when you have a small dataset, and there are many real-life scenarios where it is challenging to gather enough data for your classes. For example, face identification systems typically have only a few images per person, and in the medical domain there are only a limited number of cases of some rare diseases.

So, what does deep learning offer when you have only five samples per class, or even a single sample for each class? This problem is studied under the term few-shot learning. It is an active research area with many successful methods that can be adapted. In this article, I will mention only some of the most promising architectures.

This article will not explain the architectures in depth, as that would make the post very long. Instead, I will give only the main idea of each architecture, so that anyone who wants to work with small data can get a general sense of the models.

Siamese Neural Networks

Architecture of Siamese Neural Networks

Siamese Neural Networks [1] take two samples as input and output a probability (or loss) for whether the given inputs belong to the same class. The input samples pass through identical networks (with shared weights), and their embeddings are compared in the cost function (generally via a metric based on the difference of the embeddings). During training, the networks learn to encode inputs in a more robust way. First, the model is trained on a support set (the verification step) to learn same/different pairs. Then, the test sample is compared with each sample in the training set to measure how similar it is to each class (the one-shot task), based on the learned encodings. It is one of the first successful models in the few-shot learning domain and became the basis for later models.

Steps of the Siamese Neural Networks [1]
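To make the idea concrete, here is a minimal PyTorch sketch: two inputs pass through one shared encoder, and a small head maps the absolute difference of their embeddings to a same-class probability. The encoder layers, dimensions, and loss here are illustrative choices, not the exact architecture from [1].

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    def __init__(self, embedding_dim=64):
        super().__init__()
        # Shared encoder: both inputs pass through the same weights.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embedding_dim),
        )
        # Maps the component-wise |difference| of the two embeddings
        # to a same/different-class probability.
        self.head = nn.Sequential(nn.Linear(embedding_dim, 1), nn.Sigmoid())

    def forward(self, x1, x2):
        e1, e2 = self.encoder(x1), self.encoder(x2)
        return self.head(torch.abs(e1 - e2))

# Training on a batch of pairs: label 1 if both images share a class, else 0.
model = SiameseNet()
x1, x2 = torch.randn(8, 1, 28, 28), torch.randn(8, 1, 28, 28)
same = torch.randint(0, 2, (8, 1)).float()
loss = nn.BCELoss()(model(x1, x2), same)
loss.backward()
```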

Triplet Network and Triplet Loss

Triplet Networks

Triplet Network [2] is an extension of Siamese NNs. Instead of two samples, a Triplet Network takes three samples as input: a positive, an anchor, and a negative sample. The positive and anchor samples come from the same class, while the negative sample comes from a different class. The triplet loss is arranged so that the embedding of the anchor ends up close to that of the positive and far from that of the negative. In this way, the network learns to extract more robust embeddings. Triplet Networks have been applied to face identification datasets and showed very high performance [3].

Triplet Loss
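The triplet loss itself is only a few lines. Below is a minimal PyTorch version (PyTorch also ships this as nn.TripletMarginLoss); the margin value and embedding size are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Pull the anchor toward the positive and push it away from the
    # negative by at least `margin` in embedding space.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# The embeddings would come from a shared encoder, as in the Siamese setup.
a, p, n = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64)
print(triplet_loss(a, p, n))  # equivalent to nn.TripletMarginLoss(margin=1.0)
```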

Matching Networks

Matching Networks [4]

Matching Networks [4] combine embedding and classification to form an end-to-end differentiable nearest-neighbors classifier. The model's prediction, ŷ, is a weighted sum of the labels, yᵢ, of the training set, where the weights come from a pairwise similarity function, a(x̂, xᵢ), between the query (test) example and the support (training) set samples:

ŷ = Σᵢ₌₁ᵏ a(x̂, xᵢ) yᵢ

The key point in Matching Networks is that the similarity function is differentiable; it is a softmax over cosine similarities in embedding space:

a(x̂, xᵢ) = exp(c(f(x̂), g(xᵢ))) / Σⱼ₌₁ᵏ exp(c(f(x̂), g(xⱼ)))

where c is the cosine similarity function, k is the total number of samples in the training set, and f and g are the embedding functions for the test and training samples, respectively. Overall, the similarity is calculated between the embedding of the test sample x̂ and the embeddings of the samples xᵢ in the training set. The main novelty of this work is that the embedding functions are optimized end-to-end to maximize classification accuracy.
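As a rough sketch, the prediction rule above fits in a few lines of PyTorch. The attention weights follow the softmax-over-cosine-similarities formula; the embeddings here are random stand-ins for the outputs of f and g.

```python
import torch
import torch.nn.functional as F

def matching_prediction(f_query, g_support, y_support):
    # f_query: (d,) embedding of the test sample x̂
    # g_support: (k, d) embeddings of the k training samples
    # y_support: (k, n_classes) one-hot labels yᵢ
    sims = F.cosine_similarity(f_query.unsqueeze(0), g_support, dim=1)  # c(f(x̂), g(xᵢ))
    a = F.softmax(sims, dim=0)        # attention weights a(x̂, xᵢ)
    return a @ y_support.float()      # ŷ = Σᵢ a(x̂, xᵢ) yᵢ

k, d, n_classes = 5, 64, 5
f_q = torch.randn(d)
g_s = torch.randn(k, d)
y_s = F.one_hot(torch.arange(k) % n_classes, n_classes)
print(matching_prediction(f_q, g_s, y_s))  # a distribution over classes
```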

Prototypical Networks

Prototypical Networks [5]

Instead of comparing the test sample with all the training samples, Prototypical Networks [5] compare the test sample only with a class prototype (the mean class embedding). The key assumption is that there exists an embedding for each class in which samples cluster around a single prototypical representation, cₖ. In their paper, Prototypical Networks are shown to outperform Matching Networks.
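A minimal sketch of the classification rule, assuming embeddings are already computed: prototypes are class-wise means of the support embeddings, and queries are scored by negative squared Euclidean distance to each prototype, as in [5]. The shot count and dimensions below are illustrative.

```python
import torch

def prototypical_logits(query_emb, support_emb, support_y, n_classes):
    # cₖ: mean embedding (prototype) of each class in the support set.
    protos = torch.stack([support_emb[support_y == k].mean(0)
                          for k in range(n_classes)])
    # Classify by negative squared Euclidean distance to each prototype;
    # a softmax over these logits gives the class distribution.
    return -torch.cdist(query_emb, protos) ** 2

emb_dim, n_classes = 64, 5
support_emb = torch.randn(5 * n_classes, emb_dim)    # 5-shot support set
support_y = torch.arange(n_classes).repeat_interleave(5)
query_emb = torch.randn(3, emb_dim)
pred = prototypical_logits(query_emb, support_emb, support_y, n_classes).argmax(1)
```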

Meta-Learning

Model Agnostic Meta-Learning [6]

Meta-learning means learning to learn. It tries to train the model's parameters so that the model reaches maximum performance on a new task after only one or a few gradient steps (much as humans do). The parameters are updated according to the post-update, task-specific parameters, so that after a single step on any task the model performs as well as possible.

The aim of Model-Agnostic Meta-Learning (MAML) [6] is to learn a generic model that can easily be fine-tuned for many tasks with a few iteration steps. For each task in a meta-batch, a model is initialized with the weights of the base model, and stochastic gradient descent (SGD) is used to update the weights for that specific task. Then, the weights of the meta-learner are updated using the sum of the losses evaluated at the post-update weights. The aim is that, on average across several different tasks, the loss will be small for these parameters.

Algorithm of the Model-Agnostic Meta-Learning
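The following toy sketch shows the inner/outer loop structure on a one-parameter regression model. The task distribution, step sizes, and single inner step are illustrative simplifications of the full algorithm.

```python
import torch

# Minimal MAML inner/outer loop on toy regression tasks y = a·x.
w = torch.zeros(1, requires_grad=True)            # meta-parameters θ
alpha, beta, meta_batch = 0.1, 0.01, 4            # inner/outer step sizes

def task_loss(weight, a):
    # Fresh data each call, standing in for a task's support/query batch.
    x = torch.randn(16)
    return ((weight * x - a * x) ** 2).mean()

for step in range(100):
    meta_grad = torch.zeros_like(w)
    for _ in range(meta_batch):
        a = torch.rand(1) * 4 - 2                 # sample a task
        # Inner loop: one SGD step to task-adapted parameters θ'.
        inner = task_loss(w, a)
        g = torch.autograd.grad(inner, w, create_graph=True)[0]
        w_adapted = w - alpha * g
        # Outer loss is evaluated at θ' but differentiated w.r.t. θ.
        outer = task_loss(w_adapted, a)
        meta_grad += torch.autograd.grad(outer, w)[0]
    with torch.no_grad():
        w -= beta * meta_grad / meta_batch        # meta-update of θ
```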

Bonus: MetaFGNet

MetaFGNet [7]

MetaFGNet [7] uses auxiliary data to train a network alongside the target-task network. The two networks share their initial layers (the base network) to learn general features; this setup is also known as multi-task learning. Training on the auxiliary data (S) together with the target data (T) has a regularizing effect on the target training. MetaFGNet also uses a process called sample selection: samples from the auxiliary data pass through the network, and each sample receives a score reflecting how similar it is to the target task. Only samples whose score is above a threshold are chosen for training. The main assumption is that the auxiliary data S should have a distribution similar to that of the target set T. Results show that this procedure increases overall performance. Training is performed using the meta-learning approach.
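As a rough structural sketch only: a shared base with separate source and target heads, plus a simplified stand-in for the paper's sample-selection scoring (here, the target head's confidence on each auxiliary sample). The layer sizes and threshold are arbitrary.

```python
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(256, 128), nn.ReLU())   # shared base network
head_source = nn.Linear(128, 100)    # classifier head for auxiliary data S
head_target = nn.Linear(128, 10)     # classifier head for target data T

def select_auxiliary(x_aux, threshold=0.2):
    # Score each auxiliary sample by how confidently the target head
    # responds to it; keep only samples above the (arbitrary) threshold.
    # This is a simplified stand-in for the scoring rule in [7].
    with torch.no_grad():
        probs = head_target(base(x_aux)).softmax(dim=1)
        scores = probs.max(dim=1).values
    return x_aux[scores > threshold]

x_aux = torch.randn(32, 256)
kept = select_auxiliary(x_aux)       # auxiliary samples used for training
```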

There are many other techniques in the few-shot learning domain, and their prevalence in top computer vision conferences is increasing. This article has covered only some of the approaches that have already proven successful.


References

  1. Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. “Siamese neural networks for one-shot image recognition.” In ICML deep learning workshop, vol. 2. 2015.
  2. Hoffer, Elad, and Nir Ailon. “Deep metric learning using triplet network.” In International Workshop on Similarity-Based Pattern Recognition, pp. 84–92. Springer, Cham, 2015.
  3. Schroff, Florian, Dmitry Kalenichenko, and James Philbin. “Facenet: A unified embedding for face recognition and clustering.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 815–823. 2015.
  4. Vinyals, Oriol, Charles Blundell, Timothy Lillicrap, and Daan Wierstra. “Matching networks for one shot learning.” In Advances in neural information processing systems, pp. 3630–3638. 2016.
  5. Snell, Jake, Kevin Swersky, and Richard Zemel. “Prototypical networks for few-shot learning.” In Advances in neural information processing systems, pp. 4077–4087. 2017.
  6. Finn, Chelsea, Pieter Abbeel, and Sergey Levine. “Model-agnostic meta-learning for fast adaptation of deep networks.” In International Conference on Machine Learning, pp. 1126–1135. PMLR, 2017.
  7. Zhang, Yabin, Hui Tang, and Kui Jia. “Fine-grained visual categorization using meta-learning optimization with sample selection of auxiliary data.” In Proceedings of the European Conference on Computer Vision (ECCV), pp. 233–248. 2018.
