Bootstrapping Classification with a Zero-Shot Classifier

Published in Steamship · Sep 8, 2022 · 5 min read

The Challenge of Getting Started

Getting a text classification project off the ground is challenging because it’s a product management chicken-and-egg problem. You need to show the value of the new feature to justify collecting training data, but you can’t show that value until you’ve collected enough data to train a model. In this post I will discuss how to use a zero-shot classifier to start labeling without training data, gather training data as part of the normal operation of your product, and seamlessly transition to a trained model.

“digital art of a model chicken gazing at an egg made of data”, courtesy of DALL-E

As an example, let’s assume your product is service desk software. You’d like to make an AI that routes tickets to specific teams based on the category of the ticket. Let’s also assume that your product has many different customers, all of whom will have different categories of tickets.

In this scenario, the amount of data you have is likely to be highly uneven. Older customers may have many categorized tickets; newer customers may have few or none. In all cases, existing data may not be comprehensively categorized. So what can we do?

Starting with a Zero-Shot Classifier

Using zero-shot classification, you can start categorizing tickets immediately. Zero-shot classification is a technique where no specific training data about your classes is required. To me, this seems almost magical, but it’s enabled by the combined power of very large language models and transfer learning.

You can think of it like asking a random person on the street a multiple choice question:

“Help! I can’t get my password to work! I’m locked out!”
Does the above sentence describe:
a) A new feature request
b) A networking issue
c) Needing login help
d) A bug report

Like a person on the street, the AI can often answer reasonably with just its general knowledge about language. An example of a zero-shot classifier is here.
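To make the multiple-choice framing concrete, here is a minimal sketch of how a ticket and a set of labels could be turned into that kind of question, and how a reply could be mapped back to a label. The function names are hypothetical, and the call to the language model itself is deliberately left out; real zero-shot classifiers may frame the task differently.

```python
from string import ascii_lowercase
from typing import List, Optional


def build_multiple_choice_prompt(ticket_text: str, labels: List[str]) -> str:
    """Frame a ticket as a multiple-choice question for a language model."""
    if len(labels) > len(ascii_lowercase):
        raise ValueError("too many labels for single-letter options")
    lines = [f'"{ticket_text}"', "Does the above sentence describe:"]
    for letter, label in zip(ascii_lowercase, labels):
        lines.append(f"{letter}) {label}")
    return "\n".join(lines)


def parse_choice(answer: str, labels: List[str]) -> Optional[str]:
    """Map a reply such as 'c' or 'c) Needing login help' back to a label."""
    letter = answer.strip().lower()[:1]
    # str.find("") returns 0, so treat an empty reply as "no answer".
    index = ascii_lowercase.find(letter) if letter else -1
    return labels[index] if 0 <= index < len(labels) else None


labels = [
    "A new feature request",
    "A networking issue",
    "Needing login help",
    "A bug report",
]
prompt = build_multiple_choice_prompt(
    "Help! I can't get my password to work! I'm locked out!", labels
)
print(prompt)
```

Because the labels are just strings passed in at request time, each customer can bring their own categories without any training step.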

This approach is not without limitations. Categories that require specific context from your application will likely yield poor performance. For example, if we tag tickets with categories like “Sarah can solve this”, the AI is unlikely to categorize them correctly. We can’t blame the AI, though. A random person on the street would have the same problem! A zero-shot classifier is also more computationally expensive than a small purpose-trained model, which means more time and money per prediction.

Collecting Training Data

Since zero-shot classification is not a panacea, you probably want your customers’ predictions to improve as they use the system. This is a feature frequently requested of AI systems, but it can be tricky to implement. You can start with the zero-shot classifier for all of your customers, but specializing it requires gathering data and training a model per customer.

If you want to make the data gathering scale across many customers, it needs to be built into your product’s workflow. If the AI system proposes categories A, B, and C, the user may disagree with C and remove it; this is valuable feedback! If they add a new tag that the model didn’t predict, that’s feedback too.

A user removing a category they disagree with

Whenever a user disagrees with the model, we need to save that data. However, the corrections are specific to a context: that customer, that service desk, etc. Keeping the context with the corrections is crucial; each context effectively needs to become a different trained classifier. Once any one of those contexts gathers enough data, you can switch it over from zero-shot to a trained model.
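As a rough sketch of what saving corrections with their context could look like, the snippet below compares the labels the model proposed with the labels the user ended up with, and files each disagreement under a context ID. Everything here is an assumption made for illustration: the class names, the in-memory storage, and especially the `min_examples` threshold, which you would need to choose for your own data.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Correction:
    text: str
    label: str
    added_by_user: bool  # True: user added the label; False: user removed it


@dataclass
class FeedbackStore:
    # Assumed threshold for "enough data"; not a recommendation.
    min_examples: int = 50
    _by_context: Dict[str, List[Correction]] = field(default_factory=dict)

    def record(
        self,
        context_id: str,
        text: str,
        predicted: Iterable[str],
        final: Iterable[str],
    ) -> None:
        """Save every disagreement between the model and the user."""
        predicted_set, final_set = set(predicted), set(final)
        bucket = self._by_context.setdefault(context_id, [])
        for label in sorted(predicted_set - final_set):  # user removed these
            bucket.append(Correction(text, label, added_by_user=False))
        for label in sorted(final_set - predicted_set):  # user added these
            bucket.append(Correction(text, label, added_by_user=True))

    def corrections(self, context_id: str) -> List[Correction]:
        return list(self._by_context.get(context_id, []))

    def ready_to_train(self, context_id: str) -> bool:
        """True once this context alone has gathered enough corrections."""
        return len(self._by_context.get(context_id, [])) >= self.min_examples


store = FeedbackStore(min_examples=2)
store.record(
    "customer-a/helpdesk",
    "Wifi keeps dropping in the lobby",
    predicted=["A", "B", "C"],
    final=["A", "B", "D"],
)
print(store.corrections("customer-a/helpdesk"))
print(store.ready_to_train("customer-a/helpdesk"))
```

Keying everything by context ID is what keeps one customer’s corrections from leaking into another customer’s training set.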

Training a Model

Now that you’ve collected your training data for some customers, there’s good and bad news when it comes to training the models.

Good news: Fine-tuning a classifier on top of a large language model can give good results with small data sets, as few as tens of examples per class.

Bad news: Instead of hosting one model, you’re now hosting tens, hundreds, or thousands, one per customer.

Moreover, customers’ usage varies wildly, resulting in different scaling needs. The architectural complexity is significantly higher. That sound you hear right now is the steam escaping the ears of your infrastructure team.

There are cloud-based managed services that will cover part of your needs here. Using Google Vertex AI or Amazon SageMaker will allow you to leverage AutoML tools while reducing the infrastructure you need to manage. However, you’re still going to need to design the mechanisms to feed the data to these services, keep track of the models, and route your requests appropriately.
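The routing piece, at least, is simple to picture. Below is a minimal sketch, assuming each classifier can be wrapped as a plain function from ticket text to a list of labels: contexts with a trained model use it, and everyone else falls back to the shared zero-shot classifier. The `ModelRouter` class and the stub classifiers are hypothetical; in a real system the registry would live in a database and the functions would call hosted endpoints.

```python
from typing import Callable, Dict, List

# Any classifier is modeled as: ticket text in, list of labels out.
Classifier = Callable[[str], List[str]]


class ModelRouter:
    """Send each request to the context's trained model, if one exists."""

    def __init__(self, zero_shot: Classifier) -> None:
        self._zero_shot = zero_shot
        self._trained: Dict[str, Classifier] = {}

    def register_trained(self, context_id: str, model: Classifier) -> None:
        # Called once a context has gathered enough data and finished training.
        self._trained[context_id] = model

    def has_trained_model(self, context_id: str) -> bool:
        return context_id in self._trained

    def classify(self, context_id: str, text: str) -> List[str]:
        model = self._trained.get(context_id, self._zero_shot)
        return model(text)


# Stub classifiers standing in for real model endpoints.
def zero_shot_stub(text: str) -> List[str]:
    return ["general"]


def customer_a_stub(text: str) -> List[str]:
    return ["billing"]


router = ModelRouter(zero_shot=zero_shot_stub)
router.register_trained("customer-a", customer_a_stub)
print(router.classify("customer-a", "My invoice is wrong"))
print(router.classify("customer-b", "My invoice is wrong"))
```

The caller never needs to know which kind of model answered, which is what makes the later switch from zero-shot to trained invisible to the product.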

Or…

Building on Steamship

Steamship was built to manage exactly these sorts of AI lifecycles, including both the transition from zero-shot to trained classifiers and the data collection and scaling across your customers. We bundle them up into turnkey software libraries called Packages (read more here).

We built a ticket-tagging package that lets you tag support tickets from Day 1. As users interact with tickets and tags, data can be sent to the classifier as training examples. When enough data is available in a customer’s context, the classifier can migrate to a trained version. With this package, this journey can be simplified into a sequence of API calls:

set_labels: Set the categories you want to apply to the tickets.
tag_ticket: Start tagging tickets immediately with a zero-shot classifier.
add_example: Provide a training example from users or manual annotation.
start_specialize: Use the provided examples to tailor a model.
tag_ticket (again): Seamlessly continue tagging tickets, now with your trained classifier!

Best of all, it’s as easy to do this 100 times as it is to do it once; you just need to create more instances of the package.
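To illustrate the order of those calls, here is a toy in-memory stand-in. It is not the Steamship client and makes no claims about the real package’s signatures or return values: only the four method names come from the list above, and both “classifiers” inside are word-overlap placeholders so the example can run on its own.

```python
from typing import List, Tuple


class TicketTaggerSketch:
    """Toy stand-in that mirrors the call order only; NOT the Steamship client."""

    def __init__(self) -> None:
        self.labels: List[str] = []
        self.examples: List[Tuple[str, str]] = []  # (ticket text, label)
        self.specialized = False

    def set_labels(self, labels: List[str]) -> None:
        self.labels = list(labels)

    def add_example(self, text: str, label: str) -> None:
        if label not in self.labels:
            raise ValueError(f"unknown label: {label}")
        self.examples.append((text, label))

    def start_specialize(self) -> None:
        if not self.examples:
            raise ValueError("add examples before specializing")
        self.specialized = True

    def tag_ticket(self, text: str) -> str:
        words = set(text.lower().split())
        if self.specialized:
            # Placeholder "trained" model: label of the most similar example.
            best = max(
                self.examples,
                key=lambda ex: len(words & set(ex[0].lower().split())),
            )
            return best[1]
        # Placeholder "zero-shot" model: label whose own words overlap most.
        return max(self.labels, key=lambda lb: len(words & set(lb.lower().split())))


# One independent instance per customer (customer names are made up).
taggers = {name: TicketTaggerSketch() for name in ["customer-a", "customer-b"]}
tagger = taggers["customer-a"]
tagger.set_labels(["login help", "bug report"])
print(tagger.tag_ticket("I need help with login"))   # zero-shot placeholder
tagger.add_example("Sarah please reset my badge", "login help")
tagger.add_example("The export button crashes", "bug report")
tagger.start_specialize()
print(tagger.tag_ticket("please reset my badge"))    # trained placeholder
```

The point of the sketch is that `tag_ticket` is called the same way before and after specialization, and that each customer’s instance keeps its own labels and examples.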

If this sounds like a fit for your needs, onboard here (we’ve just launched and are doing it over Zoom, but it’s quick!), or you can view a demo in Streamlit here.