Easy and Efficient Text-Classification

Published in Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

5 min read · Sep 26, 2024

As organisations mature in the Generative AI space, moving from concept to reality, many of the immediate gains in value can be found in classification. Classification has roots in traditional Natural Language Processing (NLP), yet LLMs allow these use cases to be developed and adopted rapidly, driven by conversation alone.

To accelerate the time to value further, we’re thrilled to announce the release of a new AI function on Snowflake Cortex: CLASSIFY_TEXT. Built with accuracy and efficiency in mind, this easy-to-use, task-specific model lets a user retrieve a classification simply by passing in the text they’d like classified, along with a list of potential category values. Snowflake’s fine-tuned CLASSIFY_TEXT model outranks similar classification solutions on both accuracy and speed of completion, while allowing for up to 100 unique categories.

Better still, to ease integration into a pipeline or a downstream application, the generated output is guaranteed to be structured JSON, with no need for further processing. Easily access efficient categorisation directly through your trusted data platform.

How Should You Use It?

Classification itself is a broadly applicable use case. At its core, it’s the process of categorising some form of text into a predefined set of classes or labels. This can be as simple as flagging an email as likely “spam” or not, or as complex as identifying the intent behind a customer’s support query. Generative AI allows for that flexibility of application: from deciding which area customer feedback should be directed to, to segmenting large volumes of social media posts, or even enabling prompt routing within LLM applications.

To bring this capability directly to our customers, CLASSIFY_TEXT is built with the same foundational principles as the rest of the Cortex AI platform, offering ease of use, security, and reliability at massive scale. Built with enterprise AI in mind, it simplifies the process without ever compromising on accuracy. To demonstrate that in action, let’s walk through one of those use cases — support ticket categorisation.

I’m not sure what the question is, but the answer is Snowflake.

When judging support cases, a classification model acts as a fast means to triage requests through automation. Qualifying intent means that organisations can rapidly understand the evolving needs of their customers at the time contact is made. This ultimately leads to reduced support time for customers and faster turnaround times.

Walking through that process, imagine a fictional table called support_tickets. This table contains all the recent inbound requests to be automatically triaged for the appropriate agents to action.

These support cases are fictional. The impact of CLASSIFY_TEXT is not.

In order to use CLASSIFY_TEXT for this use case, there are two requirements for the model to process the request:

1. The body of text to be classified: in this case, the ticket column.

2. A list of potential categories or classes.

For the sake of simplicity, we’re using three primary categories: Billing, Feature Access, and Documentation. In a real-life scenario, it’s not uncommon for the category list to be far broader, which is why it’s critical for Snowflake to support as many as 100 in a single function call. Once defined, we can call the model like so:

SELECT *,
SNOWFLAKE.CORTEX.CLASSIFY_TEXT(ticket, ['Billing','Feature Access', 'Documentation']) AS text_classify_output
FROM support_tickets;
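
When the query is assembled from application code rather than typed by hand, it can help to check the category list before it reaches Snowflake. The following is a minimal Python sketch, not an official client: the helper name `build_classify_query` is hypothetical, the upper limit of 100 comes from this article, and the lower bound of two categories and the uniqueness check are assumptions made for illustration.

```python
import re

MAX_CATEGORIES = 100  # upper limit quoted in this article
MIN_CATEGORIES = 2    # assumption: a single category leaves nothing to choose between

# Conservative pattern for unquoted identifiers (an assumption, stricter than Snowflake requires).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def build_classify_query(table: str, text_column: str, categories: list[str]) -> str:
    """Return a SELECT statement that classifies text_column into categories."""
    for name in (table, text_column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not MIN_CATEGORIES <= len(categories) <= MAX_CATEGORIES:
        raise ValueError(
            f"expected {MIN_CATEGORIES} to {MAX_CATEGORIES} categories, got {len(categories)}"
        )
    if len(set(categories)) != len(categories):
        raise ValueError("categories must be unique")
    # Double any single quote so each category is a valid SQL string literal.
    literals = ", ".join("'" + c.replace("'", "''") + "'" for c in categories)
    return (
        "SELECT *,\n"
        f"  SNOWFLAKE.CORTEX.CLASSIFY_TEXT({text_column}, [{literals}]) AS text_classify_output\n"
        f"FROM {table};"
    )


if __name__ == "__main__":
    print(build_classify_query(
        "support_tickets", "ticket", ["Billing", "Feature Access", "Documentation"]
    ))
```

Run as a script, this prints a statement equivalent to the query above; the returned string can then be handed to whichever Snowflake client the application already uses.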

CLASSIFY_TEXT returns a structured JSON value containing the most likely label for the input, so pipelines remain consistent for further downstream processing.

An enforced JSON output is golden for application use.
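
As a rough illustration of why a guaranteed JSON shape helps application code, the sketch below reads one output value in Python. It assumes the value arrives as a JSON string with a single `label` key, for example `{"label": "Billing"}`; that key name is not stated in this article and should be checked against the documentation. The helper name `extract_label` is hypothetical.

```python
import json
from typing import Optional


def extract_label(raw_output: Optional[str], allowed: list[str]) -> Optional[str]:
    """Return the predicted label, or None if the payload is missing or unexpected."""
    try:
        payload = json.loads(raw_output)
    except (TypeError, ValueError):
        # TypeError covers None (e.g. a NULL row); ValueError covers malformed JSON.
        return None
    if not isinstance(payload, dict):
        return None
    label = payload.get("label")  # assumption: the label sits under the "label" key
    # Only accept labels that were part of the category list sent to the model.
    return label if label in allowed else None


if __name__ == "__main__":
    categories = ["Billing", "Feature Access", "Documentation"]
    print(extract_label('{"label": "Billing"}', categories))
```

Because the structure is fixed, the only defensive checks needed are for missing rows and for labels outside the list that was supplied.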

Chain This Thought

Task-specific models are powerful enough on their own, but when run in conjunction with further chained calls, their usefulness grows considerably. Chain of thought, agents, and action-oriented AI applications may dominate the current hype cycle, but the core of that capability lies in classification as a starting point.

Routing prompts requires the highest degree of accuracy to succeed

Take the example above: the fine-tuned task model becomes the most critical part of the flow. Acting as an LLM evaluator, it lets you determine the user’s question and intent within Cortex and route to the most appropriate next step, be that another LLM prompt, a RAG-based document retrieval step, or even a good old-fashioned ML function call. Rather than hosting multiple chatbots for individual purposes, classification helps streamline everything into a single point of entry.

Due to its prominence in the flow, the choice to focus on accuracy for this model ensures it’s a usable component that can be trusted in an enterprise AI application.
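
The routing pattern described in this section can be sketched as a simple dispatch table. Everything named here is hypothetical: the three route labels, the handler functions, and the fallback choice are illustrative stand-ins, and in a real application the label would come from a CLASSIFY_TEXT call made with these route names as its category list.

```python
from typing import Callable


# Placeholder handlers; a real application would call an LLM, a retriever, or an ML function.
def answer_with_llm(question: str) -> str:
    return f"[llm] {question}"


def answer_with_rag(question: str) -> str:
    return f"[rag] {question}"


def answer_with_ml(question: str) -> str:
    return f"[ml] {question}"


# Hypothetical category names; the same list would be passed to CLASSIFY_TEXT.
ROUTES: dict[str, Callable[[str], str]] = {
    "General Question": answer_with_llm,
    "Documentation Lookup": answer_with_rag,
    "Forecast Request": answer_with_ml,
}


def route(question: str, label: str) -> str:
    """Send the question to the handler registered for the predicted label."""
    # Fall back to the general LLM handler if the label is not a known route.
    handler = ROUTES.get(label, answer_with_llm)
    return handler(question)


if __name__ == "__main__":
    print(route("How do I rotate my keys?", "Documentation Lookup"))
```

Keeping the routes in one table means the category list and the handlers cannot drift apart, and a misclassification only ever sends a question to another known handler.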

Unlocking Business Value with Classification

The applicability of text classification is increasingly broad, enabling AI-powered automation. As we’ve covered, CLASSIFY_TEXT can easily segment text records such as emails, call transcripts, and product reviews into categories that are relevant to your business. It’s equally valuable as part of an evaluation process in conjunction with further LLM prompts, using the classification value to trigger the next action within an application.

To get going with the text classification function, check out our Quickstart guide, which provides a step-by-step tutorial on how to use the CLASSIFY_TEXT function to categorise customer call transcripts. Read more in our documentation.

If you’re looking for further inspiration, why not consider using the function for common use cases such as:

Categorising customer feedback by product, so that product teams can respond more quickly to customer needs
Sorting support tickets by issue type (e.g., “Feature enablement” vs. “Billing”) to route these issues to the right department and improve customer satisfaction, as seen above
Bucketing lost sales opportunities into groups so that the sales organisation can better refine its go-to-market strategy
Classifying user queries or messages into specific intents, such as booking a flight or making a complaint, to funnel each support query through to the right course of action

At Snowflake, we’re committed to empowering our customers with the tools and expertise they need to succeed in the AI era. With our new text classification function, we’re excited to see the innovative applications and use cases that our customers will develop.

Stay tuned for even more updates on our AI roadmap!

Written by Tom Christian