Deploying pre-trained LLMs in Snowflake.

Learn how to deploy pre-trained large language models from Hugging Face in Snowflake using vectorized Python UDFs.

Deploying a pre-trained Hugging Face LLM in Snowflake using internal stages and a vectorized Python UDF.

Large Language Models (LLMs) like ChatGPT have gained popularity for their ability to generate human-like responses to text-based input. However, ChatGPT is only one of many LLMs that have produced impressive results in recent years.

Open-source LLMs like BERT (Bidirectional Encoder Representations from Transformers), created by Google researchers in 2018, or RoBERTa (Robustly Optimized BERT approach), developed by researchers at Facebook AI Research (FAIR) in 2019, have also excelled at various tasks, such as feature extraction, summarization, text classification, sentence similarity, and more.

Some of the benefits of using open-source LLMs as ChatGPT alternatives include:

  1. Flexibility: Open-source LLMs can be used for various natural language processing tasks and in multiple languages.
  2. Transparency: Open-source LLMs provide transparency through openly available code and data for inspection and modification.
  3. Cost: Open-source LLMs are often available at a lower cost or no cost compared to proprietary models like ChatGPT.
  4. Community support: Open-source LLMs are supported by a community of developers and researchers who provide resources, tools, and support.

Hugging Face, a popular open-source AI community, provides a well-documented and easy-to-use library to interact with many of these remarkable open-source LLMs. Using this library, we can deploy these LLMs directly in Snowflake, offering numerous benefits, including improved efficiency by eliminating data movement, real-time analysis, scalability, advanced security features, and potential cost savings.

In this article, we will guide you on deploying pre-trained LLM models in Snowflake using vectorized Python UDFs for batch processing. By the end of this article, you will have the tools to start unlocking the potential of these models in your organization.

Hugging Face Transformers library

The Hugging Face Transformers library is an open-source Python library for natural language processing (NLP) tasks. It provides a collection of state-of-the-art pre-trained LLMs for various tasks such as text classification, question answering, text generation, and more.

The library is built on top of PyTorch and TensorFlow 2.0 (PyTorch is used by default). It provides a unified API for working with many pre-trained models from different sources.

This article will focus on one particular use case of this library: zero-shot classification. This technique involves classifying text into pre-defined categories without needing labeled training data. In other words, zero-shot classification allows us to classify text into categories the model has not been explicitly trained on.

For example, we can classify client reviews into topics covering different parts of the business and then use this categorization to identify areas for improvement and gain valuable insights. This can be done in Python by executing the following code:

from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli"
)

sequence_to_classify = "The interface gets frozen very often"
candidate_labels = ['customer support', 'product experience', 'account issues']
classifier(sequence_to_classify, candidate_labels)

The model assigns a probability to each category based on how related the text is to that label. To get a prediction, we choose the category with the highest probability.
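For reference, here is roughly what the pipeline returns for a single sequence (the scores below are illustrative, not actual model output); the labels come back sorted by descending score, so the first one is the prediction:

result = classifier(sequence_to_classify, candidate_labels)
# Illustrative result structure (scores are made up for the example):
# {'sequence': 'The interface gets frozen very often',
#  'labels': ['product experience', 'customer support', 'account issues'],
#  'scores': [0.91, 0.06, 0.03]}

# Labels are sorted by score, so the first label is the predicted category
predicted_label = result['labels'][0]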

Is it simple? Yes, it is. All we have to do is select the model we want and the categories to be used as candidate labels. The process of using other models for tasks like sentiment analysis, text summarization, and feature extraction is just as straightforward.

Now, how can we make this functionality available directly in Snowflake?

Snowflake vectorized UDFs (User-Defined Functions)

Vectorized UDFs (User-Defined Functions) are a powerful feature in Snowflake that enable more efficient and high-performance data processing. Compared to traditional scalar UDFs that process data one row at a time, vectorized UDFs can process multiple rows of data in a single batch, resulting in improved query performance and reduced data movement.

Additionally, vectorized UDFs use a simpler syntax more closely aligned with SQL, making it easier for developers to write and maintain code. This feature is also designed to work seamlessly with Snowflake's existing SQL syntax and data types, simplifying integration with existing tools and applications. Furthermore, vectorized UDFs are highly scalable, making it easy to process large datasets quickly and efficiently.
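As a minimal sketch of what this looks like in code (a hypothetical toy function, assuming a Snowpark session has already been created), a vectorized UDF receives and returns whole pandas Series instead of single values:

import pandas as pd
from snowflake.snowpark.functions import pandas_udf
from snowflake.snowpark.types import IntegerType, PandasSeriesType

# Toy example: the body runs once per batch of rows, not once per row
@pandas_udf(
    name='add_one_vectorized',
    input_types=[PandasSeriesType(IntegerType())],
    return_type=PandasSeriesType(IntegerType()),
    replace=True,
    session=session
)
def add_one_vectorized(values: pd.Series) -> pd.Series:
    return values + 1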

In this case, this feature is quite helpful because we can create a function that only needs to be called once to make predictions for all client reviews. This reduces the amount of inference time compared to calling the function for each review individually. Thus, we are going to deploy our Hugging Face models as vectorized UDFs.

Deploying the model to Snowflake

The first step in this process is to create a Snowflake stage to store the vectorized UDF. Let's call this stage ZERO_SHOT_CLASSIFICATION.

CREATE STAGE IF NOT EXISTS {your db}.{your schema}.ZERO_SHOT_CLASSIFICATION;

When the Transformers library is asked for a specific model for the first time, the model is downloaded from the Hugging Face Hub and cached locally. However, Snowflake UDFs cannot call external URLs due to security constraints, so this download cannot happen inside the function.

Thankfully, the library allows loading a model from a local path. To overcome the external access issue, we will serialize the model and upload it to the internal stage we previously created. At prediction time, we'll load the model from the internal stage, cache it, and use it. This lets us keep using the same library while minimizing the prediction's response time.

To serialize the model, we can use the Python module joblib, as shown next.

import joblib
joblib.dump(classifier, 'bart-large-mnli.joblib')

Next, we'll utilize Snowpark to transfer the model to the internal stage. To accomplish this, we'll establish a Snowpark session and use the put method to upload the file to the stage. Note that we disable the stage's automatic compression (auto_compress=False) so the file is stored under its exact name and format, which is what the UDF will later import.

from snowflake.snowpark import Session

session = Session.builder.configs({your connection parameters}).create()
session.file.put(
    'bart-large-mnli.joblib',
    stage_location=f'@{your db}.{your schema}.ZERO_SHOT_CLASSIFICATION',
    overwrite=True,
    auto_compress=False
)
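Optionally, we can run a quick sanity check from the same session to confirm the file landed in the stage (this step isn't required for the deployment):

session.sql(
    "LIST @{your db}.{your schema}.ZERO_SHOT_CLASSIFICATION"
).show()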

When we need to use the model, we must first deserialize it, which can be time-consuming. To avoid repeating that work, we can utilize the cachetools Python library. This library stores the outcome of a function call in a cache and returns it when the same inputs are used again, so the model is read from disk only once per process. We can create a function using this library in the following way:

# Caching the model
import sys
import joblib
import cachetools

@cachetools.cached(cache={})
def read_model():
    import_dir = sys._xoptions.get("snowflake_import_directory")
    if import_dir:
        # Load the model from the directory where Snowflake places UDF imports
        return joblib.load(f'{import_dir}/bart-large-mnli.joblib')

Now, we can write the vectorized UDF that uses the model:

import pandas as pd
from snowflake.snowpark.functions import pandas_udf
from snowflake.snowpark.types import StringType, PandasSeriesType

@pandas_udf(
    name='{your db}.{your schema}.get_review_classification',
    session=session,
    is_permanent=True,
    replace=True,
    imports=[
        f'@{your db}.{your schema}.ZERO_SHOT_CLASSIFICATION/bart-large-mnli.joblib'
    ],
    input_types=[PandasSeriesType(StringType())],
    return_type=PandasSeriesType(StringType()),
    stage_location='@{your db}.{your schema}.ZERO_SHOT_CLASSIFICATION',
    packages=['cachetools==4.2.2', 'transformers==4.14.1']
)
def get_review_classification(sentences: pd.Series) -> pd.Series:
    # Classify using the available categories
    candidate_labels = ['customer support', 'product experience', 'account issues']
    classifier = read_model()

    # Apply the model to each review in the batch
    predictions = []
    for sentence in sentences:
        result = classifier(sentence, candidate_labels)
        if 'scores' in result and 'labels' in result:
            category_idx = pd.Series(result['scores']).idxmax()
            predictions.append(result['labels'][category_idx])
        else:
            predictions.append(None)
    return pd.Series(predictions)

Some important details about the function:

  • Even if the function and the model are in the same stage, it's necessary to explicitly import the serialized model file into the function.
  • The input and return types are mandatory for Snowflake vectorized UDFs.
  • When creating the function, list all the packages used along with their versions; this keeps the UDF's environment explicit and reproducible. Note that Anaconda already manages dependency resolution for Snowflake users, so there's no need for a pre-step to download and upload these packages or their transitive dependencies before UDF creation.
  • In this case, we call the prediction function once per record. Some models allow predicting in batches, which speeds up execution; a sketch of that variation follows this list.
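
As a rough sketch of that batch variation (assuming the zero-shot pipeline is given the whole list of reviews at once; the decorator stays the same, only the body changes), the per-row loop could be replaced with something like:

def get_review_classification(sentences: pd.Series) -> pd.Series:
    candidate_labels = ['customer support', 'product experience', 'account issues']
    classifier = read_model()

    # Pass the whole batch to the pipeline; it returns one result per sentence
    results = classifier(sentences.tolist(), candidate_labels)
    if isinstance(results, dict):
        # Some pipeline versions return a single dict for a one-element batch
        results = [results]
    # 'labels' are sorted by descending score, so the first one is the prediction
    return pd.Series([r['labels'][0] for r in results])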

That's all! The function is now fully operational and can be called directly from SQL. Finally, let's create some sample data to test the functionality.

WITH cs AS (
    SELECT value AS customer_review
    FROM (
        VALUES
            ('Nobody was able to solve my issues with the system'),
            ('The interface gets frozen very often'),
            ('I was charged twice in the same period')
    ) AS t(value)
)
SELECT
    cs.customer_review,
    {your db}.{your schema}.get_review_classification(
        customer_review::VARCHAR
    ) AS category_prediction
FROM cs;
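
The same UDF can also be called from Snowpark Python. Here is a minimal sketch, reusing the session and placeholders from earlier:

from snowflake.snowpark.functions import col, call_udf

reviews_df = session.create_dataframe(
    [['The interface gets frozen very often']],
    schema=['customer_review']
)
reviews_df.with_column(
    'category_prediction',
    call_udf('{your db}.{your schema}.get_review_classification', col('customer_review'))
).show()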

Important considerations

  • To ensure compatibility and avoid potential issues when using Snowflake's UDFs, it's highly recommended to explicitly specify package versions when creating local conda environments and during UDF registration. That way, the library versions installed in your local environment match the ones used in the UDF's Python environment, which can save valuable time troubleshooting compatibility issues. You can check the library versions available in Snowflake by executing the following SQL code.
SELECT * FROM {your db}.information_schema.packages
WHERE (
package_name LIKE '%cachetools%' OR
package_name LIKE '%transformers%'
)
AND LANGUAGE = 'python';
  • By default, the Transformers library uses PyTorch. However, if you prefer to use TensorFlow, there may be issues with the joblib serialization library. To overcome this, save and upload the entire model folder instead of just the {model}.joblib file and import all of its files inside the UDF; a rough sketch of this approach follows this list.
  • While working on this article, I found that the library sentence_transformers was not accessible in the Snowflake Python environment. However, similar functionality can be achieved by utilizing the standard Transformer library.
  • When creating your first model version, the deployment procedure explained above will be sufficient. Once you are ready to roll out to production and operationalize, you can upload the pre-trained model's archive just before UDF registration, which only needs to be done once. For future runs, an upload is only necessary if the model archive has changed.
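
As a rough sketch of that model-folder approach (paths and names are illustrative; the folder's files would then be uploaded to the stage and listed in the UDF's imports):

# Save the underlying model and tokenizer to a local folder instead of a .joblib file
classifier.model.save_pretrained('bart-large-mnli')
classifier.tokenizer.save_pretrained('bart-large-mnli')

# Inside the UDF, rebuild the pipeline from that local folder
from transformers import pipeline
classifier = pipeline(
    'zero-shot-classification',
    model='bart-large-mnli',
    tokenizer='bart-large-mnli'
)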

Conclusion

In summary, open-source LLMs offer an impressive array of features that make them a valuable asset in diverse natural language processing tasks and a compelling alternative to proprietary models like ChatGPT. Furthermore, with platforms such as Hugging Face providing accessible and comprehensive libraries, the integration and utilization of these models have become significantly more straightforward, even in complex environments like Snowflake.

The ability to deploy these models directly in Snowflake unlocks a new realm of possibilities for real-time analysis, scalability, and advanced security features, without the need for data movement. Whether you're a data scientist, a developer, or a business leader, understanding and leveraging these open-source LLMs can open up new avenues for data-driven decision-making, improved efficiency, and potential cost savings.

I'm Fabian Hernandez, Data Architect at Infostrux Solutions. Thanks for reading my blog post. You can follow me on LinkedIn and subscribe to Infostrux Medium Blogs for the most interesting Data Engineering and Snowflake news. Please let us know your thoughts about this approach in the comment section.
