ZERO SHOT LEARNING
Machine learning models learn from the training data you provide and use it to classify test data correctly. But a standard model cannot handle objects from classes it has never seen. So, in computer vision, how do you classify a new object for which you have not a single training example? This is where Zero-Shot Learning (ZSL) can help.
What Is Zero-Shot Learning:
Zero-shot learning is useful when we want to classify or recognize something in the test data for which there is no direct information in the training data: it predicts unseen objects. As the name suggests, "zero-shot" refers to having zero examples of the class we want to predict. It is based on transfer learning.
Zero-shot learning tries to work the way a human does: we can recognize a new object from only a short description of it, with the help of previously learned concepts. Let me explain with a popular example:
Suppose a young boy can recognize a horse but not a zebra. One day his mother tells him that a zebra looks like a horse but with black and white stripes. Most probably, he will now be able to recognize a zebra as well.
In the same way, zero-shot learning uses information from the labelled training dataset to recognize unseen objects that are somehow related to the seen classes.
How it works:
Zero-shot learning approaches learn an intermediate semantic layer (an object and its attributes) and apply it when predicting new classes. Seen and unseen classes are related in a high-dimensional vector space, so deep learning models are used to compute feature vectors for images. ZSL models work on embeddings: there can be an image embedding and a class embedding. You can think of it like lemmatization in natural language processing, which recognizes the shared context behind many word forms and maps them to one label; similarly, ZSL models work on the relationships between the labels and attributes of training and test classes.
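The embedding idea above can be sketched in a few lines. This is a minimal, hand-made illustration: the 4-dimensional vectors below are invented for the example (a real system would get the image embedding from a vision model and the class embeddings from word vectors or attribute descriptions), and prediction is just cosine similarity between the image embedding and each class embedding.

```python
import math

# Hypothetical class embeddings (hand-made for illustration, not from a real
# model). "zebra" is the unseen class: no training images, only an embedding.
class_embeddings = {
    "horse": [0.9, 0.1, 0.8, 0.0],
    "tiger": [0.2, 0.9, 0.1, 0.8],
    "zebra": [0.85, 0.15, 0.75, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Feature vector a vision model might produce for a photo of a zebra.
image_embedding = [0.8, 0.2, 0.7, 0.85]

# Predict the class whose embedding is closest to the image embedding.
predicted = max(class_embeddings,
                key=lambda c: cosine(image_embedding, class_embeddings[c]))
print(predicted)  # → zebra
```

Because the unseen class is represented in the same vector space as the seen ones, the model can pick it even though it never saw a zebra image during training.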
The information that ZSL models transfer from seen classes to unseen classes can come in the following forms:
1. Learning from Textual Description: Here we combine natural language processing with computer vision. We generate a textual description of each image, in the form of words or sentences, cluster the images semantically, and learn the correlation between the visual and textual forms. Feature extraction is then performed on the textual descriptions, for example with tf-idf: since we cannot feed raw sentences to the model, tf-idf measures the weight of each word in numeric form.
2. Learning from Attributes: Objects have attributes, so the model learns the attributes of the training data and checks their relation to unseen objects. For example, ZSL will learn that a horse has four legs, a tail, and other attributes, so it falls under the animal label; a zebra shares these attributes, so it will also be labelled as an animal.
3. Class Similarity: Every object belongs to some class. For an unseen object, the nearest-neighbour class (the one most similar to it) is predicted. For visual features, we can use embeddings in a semantic space or a common intermediate space. However, this approach can make mistakes and classify unseen objects into previously seen classes.
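The tf-idf weighting mentioned in (1) can be shown with a toy example. The two class descriptions below are invented for illustration, and the tf-idf formula here is a basic smoothed variant (libraries such as scikit-learn use slightly different normalisation): a word shared by both descriptions gets a low weight, while a word unique to one class gets a high weight.

```python
import math

# Toy textual descriptions of two classes (illustrative, not a real dataset).
docs = {
    "horse": "four legs tail mane gallops",
    "zebra": "four legs tail black white stripes",
}

tokenized = {cls: text.split() for cls, text in docs.items()}
vocab = sorted({w for toks in tokenized.values() for w in toks})

def tf_idf(word, toks):
    """Term frequency times (smoothed) inverse document frequency."""
    tf = toks.count(word) / len(toks)
    df = sum(1 for t in tokenized.values() if word in t)
    idf = math.log(len(tokenized) / df) + 1  # +1 keeps shared words nonzero
    return tf * idf

# One numeric vector per class description.
vectors = {cls: [tf_idf(w, toks) for w in vocab]
           for cls, toks in tokenized.items()}

# "stripes" occurs only in the zebra description, so only the zebra
# vector gives it weight -- that is the signal a ZSL model can exploit.
print(vectors["zebra"][vocab.index("stripes")] > 0)   # → True
print(vectors["horse"][vocab.index("stripes")] == 0)  # → True
```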
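Points (2) and (3) together suggest a simple attribute-matching sketch. Everything here is an assumption made for illustration: the attribute names, the binary signatures, and the idea that an upstream classifier detects attributes in an image. An unseen class ("zebra") is described only by its attribute vector, and prediction picks the class signature closest to the detected attributes.

```python
# Illustrative binary attribute signatures; column order matches `attributes`.
attributes = ["four_legs", "tail", "stripes", "hooves"]

seen_classes = {
    "horse": [1, 1, 0, 1],
    "tiger": [1, 1, 1, 0],
}
# Unseen class: no training images, only an attribute description.
unseen_classes = {
    "zebra": [1, 1, 1, 1],
}

def predict(detected):
    """Return the class whose signature has the fewest mismatched attributes."""
    candidates = {**seen_classes, **unseen_classes}
    return min(candidates,
               key=lambda c: sum(a != b for a, b in zip(detected, candidates[c])))

# Suppose an attribute detector fires all four attributes on a zebra photo:
print(predict([1, 1, 1, 1]))  # → zebra
```

Note the failure mode mentioned in (3): if the detector misses "hooves", the input `[1, 1, 1, 0]` exactly matches the seen class "tiger", so the unseen object is misclassified into a seen class.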
Over the years, researchers have been trying to make machines learn more and more like humans, and I covered one of those techniques in this blog. To learn more about zero-shot learning, do read the following research papers: https://arxiv.org/pdf/1710.04837.pdf
https://arxiv.org/pdf/1611.05088.pdf
If you liked this blog, do follow DSC JSSATEN and Pratibha Gupta for more. If you have any suggestions, feel free to contact us at admin@dscjss.in
You can also connect with us on our Instagram, Twitter or Facebook page.