Few-Shot Learning in NLP With USE and Siamese Networks (Code Walkthrough)

koushik konwar · Published in The Startup · Oct 1, 2020

Annotated data has always been a challenge for supervised learning. Over time, people have experimented with various ways to make a model learn from only a handful of samples per class. Few-shot learning refers to understanding new concepts from only a few examples. This blog post aims at developing a few-shot NLP model leveraging transfer learning and siamese networks with triplet loss. After reading this blog you will be able to develop your own few-shot NLP model for text classification.

Table of contents
1. Introduction
2. Siamese Networks
3. Triplet Loss
4. Universal Sentence Encoder
5. Code Walkthrough in Keras
6. Future Scope and References

Photo source: https://unsplash.com/s/photos/artificial-intelligence

Introduction

This blog is about how we can train our models for NLP tasks with very few training data points. I will demonstrate how we can leverage siamese networks and triplet loss (which were initially used for computer vision tasks) along with transfer learning to achieve good results on an NLP classification task with very few data points. This blog contains a code walkthrough along with the related theory. After finishing this blog you will be able to create your own few-shot learning model with siamese networks and triplet loss. So let's dive in.

Siamese Networks

A siamese network consists of identical sub-networks that share weights; each input is passed through the shared network to extract features, which are then compared via the triplet loss in the few-shot learning process. These networks were initially used for computer vision tasks, but the same idea extends to text classification too. A pictorial representation of siamese networks for text classification using triplet loss is shown below.

Siamese Networks with triplet loss

**You can look at the two resources below for in-depth knowledge of siamese networks and triplet loss**

Triplet Loss

The triplet loss takes three input embeddings: an anchor, a positive and a negative data point. The anchor and positive embeddings are of the same class, while the negative embedding is of a different class. We try to project the embeddings such that the distance from anchor to negative is at least alpha more than the distance from anchor to positive. Alpha is also known as the margin: if the difference in distances exceeds the margin, the loss is zero; otherwise the shortfall is taken as the triplet loss and is back-propagated through the siamese network. The mathematical formulation of the loss can be seen below.
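In its standard form, with a the anchor, p the positive, n the negative, d a distance function (e.g. squared Euclidean distance between embeddings) and α the margin:

L(a, p, n) = max( d(a, p) − d(a, n) + α, 0 )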

Source: https://medium.com/mathematical-beauty/deep-learning-for-cosmetics-2d1b427bbfa2

Universal Sentence Encoder

The universal-sentence-encoder is a sentence-encoding module from TensorFlow Hub. We will be using the pre-trained model to create embeddings for our sentences. It encodes sentences into high-dimensional embeddings which can be further used for text classification, semantic similarity, clustering and other natural-language tasks. The embedding vector has a fixed length of 512, irrespective of the length of the input. We will use this pre-trained Universal Sentence Encoder to get better representations of our sentences by leveraging transfer learning.

**You can refer to the link below for in-depth knowledge of the Universal Sentence Encoder.**

Code Walkthrough

I will use this dataset in this blog post. The dataset has 5 classes with 20 samples each in the training file (a total of 100 samples) and a total of 3277 samples in the test file. You can find the dataset here.

1. Loading the data and selecting the relevant columns (for this tutorial I am not doing any preprocessing; you are free to apply your own preprocessing function)
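A minimal sketch of this step; the file names and the text/label column names are assumptions, so adapt them to your copy of the dataset:

```python
import pandas as pd

# File and column names ("text", "label") are assumptions; adjust
# them to match the dataset you downloaded.
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

# Keep only the columns we need: the raw sentence and its class label
train_df = train_df[["text", "label"]]
test_df = test_df[["text", "label"]]

print(train_df["label"].value_counts())  # expect 20 samples for each of the 5 classes
```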

2. Importing and loading the Universal Sentence Encoder
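With TensorFlow Hub this takes just a couple of lines (module version 4 assumed here):

```python
import tensorflow_hub as hub

# Load the pre-trained Universal Sentence Encoder from TensorFlow Hub
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

# The module maps a batch of strings to a batch of 512-dimensional vectors
vectors = embed(["few shot learning", "siamese networks with triplet loss"])
print(vectors.shape)  # (2, 512)
```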

3. Constructing the siamese model

After getting the sentence embeddings from the Universal Sentence Encoder, I pass them through the siamese model. I have used a simple network with a few dense layers; you can play with the architecture to get optimal results for your custom dataset.
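A minimal sketch of such a network; the layer sizes and the 64-dimensional output are arbitrary choices, not tuned values:

```python
from tensorflow import keras
from tensorflow.keras import layers

EMBED_DIM = 512  # USE output size
OUT_DIM = 64     # size of the learned projection (an arbitrary choice; tune it)

# Shared base network: the same weights encode anchor, positive and negative
base = keras.Sequential([
    layers.Dense(256, activation="relu", input_shape=(EMBED_DIM,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(OUT_DIM),
])

anchor_in = keras.Input(shape=(EMBED_DIM,), name="anchor")
positive_in = keras.Input(shape=(EMBED_DIM,), name="positive")
negative_in = keras.Input(shape=(EMBED_DIM,), name="negative")

# Concatenate the three projections so the loss function can slice them apart
merged = layers.concatenate([base(anchor_in), base(positive_in), base(negative_in)])
siamese = keras.Model(inputs=[anchor_in, positive_in, negative_in], outputs=merged)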

4. Triplet Loss and Training Network

I have constructed the triplet loss and returned the sum of the losses; you can also take the mean. While taking the mean you must ensure that you are doing hard negative mining of the selected triplets (anchor, positive and negative). The concept of hard negative mining is well explained in the reference videos shared above.
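A sketch of this loss in Keras, following the description above and the [anchor | positive | negative] concatenation produced by the model in step 3 (ALPHA is the margin):

```python
import tensorflow as tf

ALPHA = 0.2  # margin; 0.2 is a common default, tune it for your data

def triplet_loss(y_true, y_pred):
    # y_pred is the concatenation [anchor | positive | negative] from the model
    anchor = y_pred[:, :OUT_DIM]
    positive = y_pred[:, OUT_DIM:2 * OUT_DIM]
    negative = y_pred[:, 2 * OUT_DIM:]

    # Squared Euclidean distances between the projected embeddings
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=1)

    # Sum of the per-triplet losses, as in the post (tf.reduce_mean also works)
    return tf.reduce_sum(tf.maximum(pos_dist - neg_dist + ALPHA, 0.0))
```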

5. Constructing a data generator

The data generator will use the Universal Sentence Encoder to encode the sentences and pass those encodings to the network.
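A simple generator along these lines, sampling random triplets (note: no hard negative mining here) and reusing the embed module from step 2:

```python
import random
import numpy as np

def triplet_generator(df, batch_size=32):
    """Yield batches of (anchor, positive, negative) USE embeddings."""
    classes = list(df["label"].unique())
    by_class = {c: df[df["label"] == c]["text"].tolist() for c in classes}
    while True:
        anchors, positives, negatives = [], [], []
        for _ in range(batch_size):
            # Pick two distinct classes: one for anchor/positive, one for negative
            pos_class, neg_class = random.sample(classes, 2)
            a, p = random.sample(by_class[pos_class], 2)
            anchors.append(a)
            positives.append(p)
            negatives.append(random.choice(by_class[neg_class]))
        # Encode each role with the Universal Sentence Encoder
        inputs = [embed(anchors).numpy(), embed(positives).numpy(), embed(negatives).numpy()]
        # Dummy targets: triplet_loss ignores y_true
        yield inputs, np.zeros((batch_size, 1))
```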

6. Now we have everything set up, so let's train the model.
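Compiling and fitting the siamese model; steps_per_epoch is an arbitrary choice given the tiny training set:

```python
siamese.compile(optimizer="adam", loss=triplet_loss)

# 100 epochs as used in this post; steps_per_epoch is an assumption
siamese.fit(triplet_generator(train_df), steps_per_epoch=10, epochs=100)
```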

7. After training the model for 100 epochs, let's test it on the test data. I have used KNN and SVM to classify the test dataset.
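With scikit-learn, this step might look like the following, reusing the trained base network from step 3 to project the USE embeddings:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Encode raw sentences with USE, then project with the trained base network
X_train = base(embed(train_df["text"].tolist())).numpy()
X_test = base(embed(test_df["text"].tolist())).numpy()
y_train, y_test = train_df["label"], test_df["label"]

knn = KNeighborsClassifier().fit(X_train, y_train)
svm = SVC().fit(X_train, y_train)

print("KNN accuracy =", accuracy_score(y_test, knn.predict(X_test)))
print("SVM accuracy =", accuracy_score(y_test, svm.predict(X_test)))
print(classification_report(y_test, knn.predict(X_test)))
```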

KNN accuracy = 0.8764113518462008, SVM accuracy = 0.8748855660665242

Since the accuracy of KNN is slightly better, let's look at its classification report:

              precision    recall  f1-score   support

           1       0.93      0.91      0.92      1072
           2       0.58      0.87      0.70       262
           3       0.91      0.76      0.83       425
           4       0.85      0.86      0.86       532
           5       0.94      0.90      0.92       986

    accuracy                           0.88      3277
   macro avg       0.84      0.86      0.84      3277
weighted avg       0.89      0.88      0.88      3277

8. T-SNE Visualisation of the test dataset.
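A plot like the one below can be produced with scikit-learn and matplotlib, reusing X_test and y_test from step 7:

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Project the learned test-set embeddings down to 2-D
coords = TSNE(n_components=2, random_state=42).fit_transform(X_test)

plt.figure(figsize=(8, 6))
scatter = plt.scatter(coords[:, 0], coords[:, 1], c=y_test, cmap="tab10", s=8)
plt.legend(*scatter.legend_elements(), title="class")
plt.title("t-SNE of the test-set embeddings")
plt.show()
```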

TSNE visualisation of the test samples.

As we can see, the test dataset forms 5 distinct clusters in the t-SNE representation, and these cluster representations were learned from only 20 samples per class in the training dataset. Isn't it AMAZING!!

  • **You can find the complete notebook here**
  • Since this is my first blog post, I am open to any suggestions that would help me contribute better to the community.

Future Scope and References

I plan to explore different methodologies of few-shot learning in upcoming blog posts. Clearly, we can carry techniques over from computer vision to NLP tasks and vice versa to enhance our capabilities in both domains. Below I am attaching the links from which I first came to know about siamese networks and triplet loss.

Motivation and references:


koushik konwar · Data Scientist at Sprinklr | NLP | Computer Vision