Unlocking the Power of NLP Zero-Shot Learning

Roberto Russo
Published in BIP xTech
Apr 24, 2024 · 5 min read

Transforming Customer Feedback Analysis

[Cover image: an open book with pages transforming into flying birds, symbolizing the acquisition of knowledge beyond traditional learning boundaries.]

In the rapidly evolving field of natural language processing (NLP), zero-shot learning represents a significant leap forward. This innovative approach allows models to perform tasks they were never explicitly trained to handle, which opens up a myriad of applications that can benefit from NLP without the typical constraints of data-heavy training processes. In this article, we’ll explore what zero-shot learning is, how it works, and demonstrate its utility through a practical use case involving the categorization of customer feedback.

Understanding Zero-Shot Learning

Zero-shot learning is a type of machine learning where a model is capable of correctly performing tasks that it has not been explicitly trained to solve. This might sound like magic, but it’s grounded in a deep understanding of how to generalize from known tasks to unknown ones. In the context of NLP, zero-shot learning utilizes models that have been trained on a broad range of data and tasks, learning a rich representation of language that can be flexibly applied to new problems.

How is Zero-Shot Learning Possible?

To achieve zero-shot learning capabilities, a model must be trained on a diverse set of tasks and languages, often using a technique called “transfer learning.” This involves taking a model that has learned general language representations on large text corpora and fine-tuning it on a dataset that includes a variety of tasks. Each task is accompanied by natural language descriptions or instructions, helping the model learn not just to predict but to understand context and apply its knowledge in new, unseen scenarios.

For instance, the model used in our code example is BART (Bidirectional and Auto-Regressive Transformers) in its large variant, which is pre-trained as a denoising autoencoder that reconstructs corrupted text. The model is then fine-tuned on the MultiNLI dataset, a task that requires it to understand natural language inference, i.e., to determine whether one sentence logically follows from another. This fine-tuning equips the model with the ability to generalize from language understanding to inferring the appropriate labels for texts, even when it was never explicitly trained on those labels. Zero-shot classification is made possible by the model’s inherent capability to evaluate the probability that one piece of text (like a feedback comment) is an instance of one or more categories (like performance, cost, or design).

This understanding and inference ability makes the model highly effective for zero-shot classification tasks, where the categories may not have been explicitly included in the training data.
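
Before we build this by hand, it is worth knowing that the Hugging Face transformers library exposes the same mechanism through its zero-shot-classification pipeline. Here is a minimal sketch using the same facebook/bart-large-mnli model; the candidate labels here are only illustrative:

from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "The new software update made the app much faster and smoother!",
    candidate_labels=["performance", "cost", "design"],
)
print(result["labels"][0])  # the label with the highest score

In the rest of this article we reproduce this behavior manually, which makes the underlying natural language inference trick explicit.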

Use Case: Analyzing Customer Feedback

The Challenge

Businesses often gather extensive customer feedback that varies not only in content but in the topics it touches upon, such as product performance, design, cost, and customer service. Analyzing this feedback manually to extract actionable insights is time-consuming and inefficient, particularly as businesses scale.

Zero-Shot Learning to the Rescue

With zero-shot learning, businesses can deploy models to automatically categorize feedback into predefined or dynamic categories based on the content of the feedback itself, even if those categories were not part of the model’s training regime.

Let’s walk through an illustrative example of how we can achieve our goal.

Firstly, let’s import the libraries we need:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np

Now it’s time to import the model:

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
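
As an optional tweak not shown in the original snippet, you can put the model in evaluation mode, since we only run inference; if you have a GPU, you could also move the model (and, later, the encoded inputs) to it:

model.eval()  # inference only: disables dropout
# optional, if a GPU is available (the encoded inputs below would then
# need to be moved to the same device with x.to(device)):
# device = "cuda" if torch.cuda.is_available() else "cpu"
# model.to(device)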

Then we define a few sentences that simulate customer feedback, along with the categories we want to use to label them:

feedback = [
    "The new software update made the app much faster and smoother!",
    "I found the customer support unhelpful and rude.",
    "The cost of the subscription has become too expensive for the features offered.",
    "It took too long to install the product, but it works well."
]
categories = ["performance", "customer service", "cost", "usability"]

It’s time to implement our zero-shot learning solution. Let’s define a function that, for a given piece of feedback, computes the probability of it belonging to each category and assigns the final label according to the category with the highest probability.

def classify_feedback(input_text, categories):
    probabilities = []  # we store the probabilities of each category here
    for c in categories:
        premise = input_text  # the feedback is the premise of our prompt
        hypothesis = f'This example is {c}.'  # this sentence prompts the model to classify the feedback
        x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                             truncation=True)  # we encode the prompt built from premise + hypothesis
        logits = model(x)[0]  # logit scores for contradiction (0), neutral (1), and entailment (2)

        entail_contradiction_logits = logits[:, [0, 2]]  # keep the logits for the category being false (0) or true (2)
        probs = entail_contradiction_logits.softmax(dim=1)  # softmax to get the corresponding probabilities
        prob_label_is_true = probs[:, 1]  # probability of the category being true
        probabilities.append(prob_label_is_true.detach().numpy())
    probabilities = np.array(probabilities)
    result = probabilities.argmax()  # index corresponding to the highest probability
    return categories[result]  # we return the category with the highest probability

The function works in this way:

  1. It builds a prompt composed of the premise, which is the feedback, and the hypothesis, which is the sentence “This example is {c}.”, where c is the category of the current iteration of the for loop.
  2. It tokenizes the prompt and feeds it to the model.
  3. The model returns logit scores for three natural language inference classes: contradiction, neutral, and entailment.
  4. It transforms the contradiction and entailment logits into probabilities with a softmax and keeps the entailment probability as the probability that the category is true (see the sketch just after this list).
  5. At the end of the for loop, it assigns to the feedback the category with the highest probability.
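
To make steps 3 and 4 concrete, here is a small sketch (the premise and category are just examples taken from our lists) that prints the raw logits and the derived probability for a single premise–hypothesis pair; for this model, index 0 is contradiction, 1 is neutral, and 2 is entailment:

premise = "The new software update made the app much faster and smoother!"
hypothesis = "This example is performance."
x = tokenizer.encode(premise, hypothesis, return_tensors='pt', truncation=True)
logits = model(x)[0]  # shape (1, 3): contradiction, neutral, entailment
print(logits)
print(logits[:, [0, 2]].softmax(dim=1)[:, 1])  # probability that "performance" applies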

We can see this in action:

for fb in feedback:
    print(f"Feedback: '{fb}' - Categorized as: {classify_feedback(fb, categories)}")
[Image: output of the previous chunk of code]

We can see that our model was pretty accurate. Another possibility is to modify the code to return multiple categories for each feedback, because a piece of feedback may belong to more than one category. For example, the last one is about usability, but also about performance, since a product’s usability is directly related to its performance. I will leave this extension as homework for the reader ;)
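
If you want a starting point for that homework, here is one possible sketch: score every category independently and keep those whose entailment probability exceeds a threshold (the 0.5 below is an arbitrary choice, not a tuned value).

def classify_feedback_multilabel(input_text, categories, threshold=0.5):
    labels = []
    for c in categories:
        hypothesis = f'This example is {c}.'
        x = tokenizer.encode(input_text, hypothesis, return_tensors='pt',
                             truncation=True)
        logits = model(x)[0]
        probs = logits[:, [0, 2]].softmax(dim=1)  # contradiction vs. entailment
        if probs[0, 1].item() > threshold:  # keep categories judged likely true
            labels.append(c)
    return labels

print(classify_feedback_multilabel(feedback[3], categories))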

Conclusion

Zero-shot learning stands out as a flexible and powerful tool in the arsenal of NLP technologies, particularly for applications where the scope of potential queries or topics is vast and ever-changing. As we’ve seen with the customer feedback analysis, zero-shot learning can significantly streamline operations, providing businesses with quick, actionable insights derived from large volumes of unstructured data. This capability will undoubtedly continue to be a game-changer as more companies seek to leverage AI in innovative ways.

I hope that with this tutorial you zero-shot learned the power of zero-shot learning.

I wish you good learning!

Roberto
