What is Few-Shot Learning?

Jelal Sultanov
AI³ | Theory, Practice, Business
4 min read · Aug 10, 2020

While building yet another custom object detector with the TensorFlow and Darknet frameworks, I ran into a typical machine learning problem — the lack of an image dataset. The problem is compounded by the level of dataset quality a good object detector needs. In machine learning it is hard to meet all the requirements: a sufficient dataset size (a certain number of instances of each class must be collected before training begins) and a clean, correctly labeled dataset. In many real-world situations, obtaining enough correctly labeled data becomes very expensive and difficult to manage. It is exactly in this kind of situation that a few-shot learning method could affect your project’s future development.

When I started the training process, I found that my dataset was not large enough to reach the desired level of training loss. While researching a workaround for this problem, I found an article and a Python notebook dedicated to eager few-shot object detection, built on the new TensorFlow Object Detection API with TensorFlow 2.x. In this article, I would like to share my understanding of few-shot learning in machine learning after researching this topic.

What does Few-Shot learning mean?

Since the beginning of the rise of machine learning, we have been comparing Artificial Intelligence to the human brain. One comparison worth making here is how much less demanding the human brain is about the size of the “dataset” needed to learn a new skill. Human beings are capable of learning and recognizing things after seeing or repeating them only a few times. In usual machine learning practice, however, the engineer prepares a large, clean, annotated dataset before starting any computer vision project. But sometimes, even quite often, the dataset can be very incomplete. Imagine a machine learning engineer building a classifier or detector for ancient Egyptian characters: only a small dataset would be available. Few-shot learning means, as the name suggests, feeding the model a very small number of samples of each class, contrary to the normal practice of using large image datasets. For example, in the Python notebook I mentioned above, the author used only 5 images of one class to train a RetinaNet object detection model, on Google Colab with its free GPU. I was very surprised that an object detector could be trained on only 5 images. So I decided to learn what few-shot learning means, understand intuitively how it works, and write an article about what I learned.

The few-shot problem is usually framed as N-way K-shot classification: we learn to discriminate between N separate classes with K instances of each class. So if we classified images between dog and cat classes with 3 instances of each, we would have a 2-way 3-shot classification problem. In this setting we cannot simply reuse the algorithms and methods of classic deep learning in computer vision, because the scarcity of data would cause poor generalization. The only practical solution is to gain knowledge/experience from other similar tasks. For this reason, few-shot learning can also be characterized as a meta-learning problem.
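To make the N-way K-shot setup concrete, here is a minimal sketch of how one episode can be sampled from a labeled dataset. This is my own illustration, not code from the notebook mentioned above, and the function name `sample_episode` is hypothetical:

```python
import random

def sample_episode(dataset, n_way, k_shot, query_size):
    """Sample one N-way K-shot episode from a {label: [examples]} dataset.

    Returns a support set of n_way * k_shot examples and a query set of
    n_way * query_size examples, each as (example, label) pairs.
    """
    classes = random.sample(sorted(dataset), n_way)  # pick N classes at random
    support, query = [], []
    for label in classes:
        examples = random.sample(dataset[label], k_shot + query_size)
        support += [(x, label) for x in examples[:k_shot]]   # K shots to learn from
        query += [(x, label) for x in examples[k_shot:]]     # held out for evaluation
    return support, query

# Toy dataset: 4 classes with 10 "images" (just ints here) per class
data = {c: list(range(c * 10, c * 10 + 10)) for c in range(4)}
support, query = sample_episode(data, n_way=2, k_shot=3, query_size=2)
print(len(support))  # 2 classes * 3 shots = 6
print(len(query))    # 2 classes * 2 queries = 4
```

In the dog-vs-cat example above, this would be called with `n_way=2, k_shot=3`.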

In classic machine learning projects, our model learns how to classify from the training set and is evaluated on the test set. In the meta-learning approach, we instead learn how to learn to classify objects from a set of training tasks.

Consider an example with 3 tasks: 2 training tasks and 1 test task, which can be classified as an N=3 ways, K=2 shots problem. The sets known as support sets are used to learn how to solve a task, and the sets known as query sets are used to evaluate performance on that task. The classes need not overlap between tasks; in every task we can observe classes that we have never seen before.

During the training process, we update the model parameters based on randomly chosen training tasks. The loss function is calculated on the query set of each training task, based on what has been learned from that task’s support set. To evaluate performance, we use a test task that follows the same structure as the training tasks, but contains completely different classes that do not exist in any training task.
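To illustrate how the query-set loss is driven by the support set, here is a sketch in the style of prototypical networks — one popular few-shot method, used here purely as an example; the RetinaNet notebook above fine-tunes a detector instead, and the `query_loss` function below is my own illustration. Each class prototype is the mean of its support embeddings, and query points are classified by their distance to the prototypes:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def query_loss(support_emb, support_labels, query_emb, query_labels):
    """Prototypical-networks-style loss: each class prototype is the mean of
    its support embeddings; query points are scored by negative squared
    distance to each prototype, then we take the cross-entropy."""
    classes = np.unique(support_labels)
    protos = np.stack([support_emb[support_labels == c].mean(axis=0)
                       for c in classes])
    # squared Euclidean distance from each query point to each prototype
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    probs = softmax(-d2)                       # closer prototype -> higher prob
    idx = np.searchsorted(classes, query_labels)
    return -np.log(probs[np.arange(len(query_labels)), idx]).mean()

# Toy 2-way 3-shot episode in a 2-D embedding space
support_emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                        [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
support_labels = np.array([0, 0, 0, 1, 1, 1])
query_emb = np.array([[0.05, 0.05], [10.05, 10.05]])
query_labels = np.array([0, 1])
print(query_loss(support_emb, support_labels, query_emb, query_labels))
# close to 0 for well-separated clusters
```

In a real setup, the embeddings would come from a neural network, and this loss would be backpropagated through it across many randomly sampled training tasks.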

In the following articles, I will be writing mainly about Deep Learning and Computer Vision in Medical Image Analysis. My startup is about AI in medicine and telemedicine. As I build my startup’s MVP, I will share ideas I think would be good to implement and offer as a service in medical and telemedical AI applications. I will mainly be writing down ideas and asking for your opinions, as it is one of the best ways of connecting and exchanging ideas with AI folks. I will be very glad if you leave your comments below to discuss or debate the topic.

Few-Shot Youtube Video:

https://youtu.be/5kN0CuAj46A

Conclusion:

Thanks for reading! If you enjoyed this article, please hit the clap button 👏 as many times as you can. It would mean a lot and encourage me to keep writing stories like this. Let’s connect on Twitter!🐦
