Implementing Zero-Shot Classification in Python: A Step-by-Step Guide

7 min read · May 24, 2023

In a previous article, we delved into the world of Zero-Shot Classification, discussing its mechanisms, applications, limitations, and future prospects. This innovative approach to machine learning, with its ability to handle unseen classes, has vast potential to revolutionize sectors where labeled data is scarce. If you haven’t had a chance to read it yet, I encourage you to take a look here.

Building upon that foundational knowledge, this article takes a more hands-on approach. We’ll implement Zero-Shot Classification using Python, exploring a practical application of the concepts we previously discussed. For coders and data science enthusiasts, this piece will provide a clear guide to leverage the power of Zero-Shot Classification in real-world scenarios. Let’s roll up our sleeves and start coding!

Prerequisites

This tutorial requires Python 3.x and the following libraries:

Pandas: For data manipulation.
Transformers: From Hugging Face, for leveraging pre-trained models.
Torch (PyTorch): The deep learning framework that Transformers uses as its backend to run the model.

Ensure these are installed in your Python environment by running the following commands in your terminal:

pip install pandas
pip install transformers
pip install torch

Once installed, load the necessary libraries in your Python script or Jupyter notebook:

import pandas as pd
from transformers import pipeline
import torch

Lastly, you’ll need a pre-trained model* suited to zero-shot classification, such as the NLI-fine-tuned BART model we use below (plain BERT or GPT-2 checkpoints are not fine-tuned for this task), and a dataset for testing the zero-shot classification implementation.

*The model is typically downloaded and loaded when you create your classification pipeline, not separately at the beginning like the libraries.

Now you’re ready to start coding! Let’s dive into the Python implementation of zero-shot classification.

Understanding the Dataset

For this tutorial, let’s use a simplified, hypothetical dataset containing customer reviews of various tech products. Each review is associated with a product category like ‘Smartphone’, ‘Laptop’, ‘Headphones’, etc. However, the data also contains reviews of products from new categories not present in the original classification scheme, making it a perfect fit for a zero-shot classification task.

You can generate a synthetic dataset with pandas, as shown below:

# Sample data
data = {
    'Product Category': ['Smartphone', 'Smartphone', 'Laptop', 'Headphones', 'Smartwatch', 'Unknown', 'Unknown'],
    'Review': [
        'I love the camera on this phone!', 
        'The battery life on my phone is amazing.', 
        'This laptop is super fast and lightweight!', 
        'These headphones have excellent sound quality.', 
        'The fitness tracking features on this watch are very useful.', 
        'This device helps me keep track of my daily activity.', 
        'My home has never felt safer.'
    ]
}

df = pd.DataFrame(data)

In this dataset, the ‘Unknown’ category represents new product categories that the original classification model was not trained on. These could be ‘Fitness Tracker’ or ‘Home Security System’ — categories the model has never seen before. Our goal with zero-shot classification is to accurately categorize these ‘Unknown’ entries.

Remember, in a real-world scenario, you’ll often work with much larger and more complex datasets. You would need to perform additional steps to clean and prepare the data for your model. But for the purposes of this tutorial, this simple dataset will suffice.

Exploring the Pre-Trained Model (BART)

For our zero-shot classification task, we’ll use the pre-trained model ‘facebook/bart-large-mnli’, loaded through Hugging Face’s transformers library. This model is based on BART (Bidirectional and Auto-Regressive Transformers), a transformer architecture that pairs a bidirectional encoder with an autoregressive decoder.

Fine-tuned on the MNLI (Multi-Genre Natural Language Inference) dataset, this model judges whether one sentence (the premise) entails another (the hypothesis), a key skill for zero-shot classification. The pipeline treats each review as a premise and each candidate label as a hypothesis, so we can classify data into unseen categories based on how strongly the review entails each label.
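
To make the link between NLI and classification concrete, here is a minimal sketch of how each candidate label can be turned into a hypothesis sentence and paired with the review as the premise. The helper name build_nli_pairs is hypothetical, and the template string is an assumption based on the default hypothesis_template of the Hugging Face zero-shot pipeline, so check the documentation for your installed version.

```python
def build_nli_pairs(review, candidate_labels, template="This example is {}."):
    """Return one (premise, hypothesis) pair per candidate label.

    The template is an assumed default; the real pipeline lets you
    override it via its hypothesis_template argument.
    """
    return [(review, template.format(label)) for label in candidate_labels]


pairs = build_nli_pairs(
    "My home has never felt safer.",
    ["Laptop", "Home Security System"],
)

# Each pair is what an NLI model would score for entailment.
for premise, hypothesis in pairs:
    print(f"premise: {premise!r} | hypothesis: {hypothesis!r}")
```

The NLI model scores how strongly each premise entails its hypothesis, and the labels are then ranked by those scores.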

We’ll set up the pipeline in the next section. Let’s go!

Coding the Zero-Shot Classification Task

We’ll categorize the reviews in our dataset into product categories, including those the model hasn’t been trained on.

1. Set Up the Zero-Shot Classification Pipeline:

First, we’ll set up a zero-shot classification pipeline with our chosen pre-trained model, BART.

# Initialize the zero-shot classification pipeline
classifier = pipeline('zero-shot-classification',
                       model='facebook/bart-large-mnli')

2. Define Candidate Labels:

We need to provide the model with potential labels for the classification. Let’s consider the known product categories and add some new ones representing the ‘Unknown’ reviews.

# Known product categories
known_labels = ['Smartphone', 'Laptop', 'Headphones', 'Smartwatch']

# New product categories (potential labels)
new_labels = ['Fitness Tracker', 'Home Security System']

# Combine all labels
labels = known_labels + new_labels

Here we added ‘Fitness Tracker’ and ‘Home Security System’ as the potential labels for the unknown product categories. The combined list is the full set of options the model can choose from.

3. Classify the Reviews:

Now, let’s classify the reviews. For each review, the pipeline returns all candidate labels ranked by score, and we keep the top-ranked one.

# Create a new column 'Predicted Category' in the df to store the predictions
df['Predicted Category'] = df['Review'].apply(lambda x: classifier(x, labels)['labels'][0])

Through these steps, we’re implementing zero-shot classification by asking our pre-trained model to predict the most semantically similar label for each review. This method allows us to classify data into categories that the model hasn’t seen during training.
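
The one-liner above keeps only the top label and discards the scores. As a sketch, the helper below keeps the top score as well, which helps when you want to review low-confidence predictions. It assumes the classifier returns, for a single text, a dictionary whose 'labels' and 'scores' lists are sorted from most to least likely, which is the shape the zero-shot pipeline produces. The function name classify_with_scores is hypothetical.

```python
def classify_with_scores(classifier, reviews, candidate_labels):
    """Classify each review and keep the top label together with its score.

    `classifier` is any callable taking (text, candidate_labels) and
    returning a dict with 'labels' and 'scores' sorted best-first.
    """
    rows = []
    for review in reviews:
        result = classifier(review, candidate_labels)
        rows.append({
            "Review": review,
            "Predicted Category": result["labels"][0],
            "Score": result["scores"][0],
        })
    return rows
```

With the objects defined earlier, pd.DataFrame(classify_with_scores(classifier, df['Review'], labels)) would give a table with one row per review.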

Let’s check the results in the next section!

Model Evaluation and Results

To evaluate the performance of a zero-shot classification model, we often measure its accuracy — how often it correctly classifies the data. But remember, zero-shot learning is more challenging and the accuracy might be lower than in traditional supervised learning scenarios.
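
For the rows that do have a known category, accuracy can be computed directly. The sketch below skips rows whose true label is 'Unknown', since there is nothing to compare against. The function name accuracy_on_known is hypothetical, and the sample values are illustrative only, not output from the model.

```python
def accuracy_on_known(true_labels, predicted_labels, unknown="Unknown"):
    """Fraction of correct predictions among rows with a known true label.

    Returns None when no row has a known label.
    """
    known = [
        (true, pred)
        for true, pred in zip(true_labels, predicted_labels)
        if true != unknown
    ]
    if not known:
        return None
    correct = sum(1 for true, pred in known if true == pred)
    return correct / len(known)


# Illustrative values only; these are not results from the model.
true_labels = ["Smartphone", "Laptop", "Unknown", "Headphones"]
predicted = ["Smartphone", "Smartwatch", "Fitness Tracker", "Headphones"]
print(accuracy_on_known(true_labels, predicted))
```

In this tutorial's setting, the call would be accuracy_on_known(df['Product Category'], df['Predicted Category']).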

Now let’s inspect the classification results:

Review the predictions for ‘Unknown’ entries.

Are they logically coherent?
Are they useful for my application?

These are the types of questions that can help assess the performance of zero-shot classification in practice.

The dataframe with classification results, stored under the ‘Predicted Category’ column

For ‘Unknown’ reviews (rows 5–6), the model classified them into plausible new categories. Review 5, mentioning daily activity tracking, was classified as ‘Fitness Tracker’. Review 6, implying home safety, was classified as ‘Home Security System’. This shows the model’s ability to classify reviews into relevant categories even without prior exposure to these categories.

On this small sample, the model assigned a sensible category to every review, illustrating what zero-shot classification can do.

Potential Applications

The Python code we’ve implemented for zero-shot classification has several practical applications. It can be used in recommendation systems, content moderation, sentiment analysis, and more.

A common use case is text classification, where the categories are dynamic or unknown during model training.

For example, in a customer support context, zero-shot classification can help route customer complaints or queries to the relevant department, even for new topics or issues that weren’t present in the training data. Similarly, in news article categorization, it can help classify articles into new, emerging topics.

Challenges and Limitations

Despite its benefits, zero-shot classification with Python isn’t without challenges. The quality of results depends heavily on the pre-trained model and its understanding of language semantics. Furthermore, while powerful, transformer models like BART require considerable computational resources.
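
One way to ease the computational cost is to run the model on a GPU when one is available. The sketch below picks a value for the pipeline's device argument, where 0 means the first CUDA GPU and -1 means CPU. The helper name pick_device is hypothetical, and the fallback for a missing PyTorch install is only there so the snippet runs anywhere.

```python
def pick_device():
    """Return 0 (first CUDA GPU) if PyTorch reports one, else -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; fall back to the CPU value.
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"Using device index: {device}")
```

You could then pass it when creating the pipeline: pipeline('zero-shot-classification', model='facebook/bart-large-mnli', device=device).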

Additionally, it’s important to acknowledge that our implementation is a basic example of zero-shot learning. It assumes that the labels provided to the classifier cover all possible classes, which might not be the case in more complex scenarios.
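
A simple guard against this assumption is to fall back to a catch-all label when the top score is low. The sketch below is one possible approach rather than a feature of the pipeline: the function name label_or_fallback, the 'Other' label, and the 0.5 default threshold are all assumptions you would tune for your data. The sample results are hand-written in the pipeline's output shape, not real model output.

```python
def label_or_fallback(result, threshold=0.5, fallback="Other"):
    """Return the top label, or `fallback` if its score is below `threshold`."""
    top_label = result["labels"][0]
    top_score = result["scores"][0]
    return top_label if top_score >= threshold else fallback


# Hand-written results in the pipeline's output shape (not real model output).
confident = {"labels": ["Laptop", "Smartphone"], "scores": [0.92, 0.08]}
uncertain = {"labels": ["Laptop", "Smartphone"], "scores": [0.55, 0.45]}

print(label_or_fallback(confident, threshold=0.6))  # Laptop
print(label_or_fallback(uncertain, threshold=0.6))  # Other
```

By default the pipeline's scores are normalized across the candidate labels, so the top score tends to shrink as you add labels. Passing multi_label=True when calling the classifier scores each label independently, which can make a fixed threshold easier to interpret.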

Moreover, evaluating the performance of zero-shot learning models can be difficult, especially when there’s no ground truth available for new classes.

In spite of these limitations, the promise of zero-shot learning makes it an exciting area of study and an increasingly practical tool for a range of machine learning tasks.

Conclusion

In this article, we’ve walked through a Python implementation of zero-shot classification using a pre-trained BART model from Hugging Face’s transformers library. We’ve seen how this technique can classify text into categories that the model hasn’t been explicitly trained on.

We also discussed the potential real-world applications of zero-shot classification, highlighting its relevance to dynamic classification problems. Finally, we addressed the challenges and limitations of this approach, and the need for careful evaluation of its performance.

As machine learning continues to evolve, zero-shot classification offers a promising avenue to handle complex, dynamic classification tasks. I encourage you to explore this area further and experiment with different pre-trained models and datasets. The ability to classify data into unseen categories opens up exciting opportunities in the world of AI.