Automating LLM deployment in Snowflake

Learn how to use the Snowpark ML Model Registry library and Snowflake vectorized Python UDFs to automate the deployment of HuggingFace Large Language Models (LLMs) for Snowflake SQL users.

Photo by Lenny Kuhne on Unsplash

Disclaimer: This article was created using a private preview of the model registry. Please consult this documentation to migrate from this version to the one that was publicly previewed.

In our previous two articles, we demonstrated how to deploy an LLM through a Snowflake vectorized UDF and how to manage LLMs using the Snowpark ML Model Registry library. This article merges these approaches into a comprehensive solution, built on Snowflake technologies, for your organization's fully managed LLM needs.

But why should I care about it?

Creating a proof of concept (POC) model is now easier than ever with the latest LLMs and libraries. You can get an idea of how quickly a domain-specific model can be built by checking out the DeepLearning.ai course on Generative AI with LLMs (highly recommended!).

However, achieving a working POC in your own notebook is just one component of developing a complete Generative AI application, as illustrated in the image below. When building a generative AI application, important decisions must be made regarding application interfaces, LLM frameworks, informational sources, monitoring/feedback, and infrastructure.

Components of a Generative AI application. Image provided by the DeepLearning.ai course on Generative AI with LLMs with modifications by the blog post author.

In this article, we will focus on dealing with specific parts of the infrastructure component of a solution using technologies already provided by Snowflake. Specifically, we will concentrate on model versioning, model serving, and model deployment automation.

We’ll showcase the solution with the same hypothetical scenario we featured in our recent blog posts. In this fictional scenario, our objective is to categorize customers’ text reviews based on various aspects of our business, utilizing the pre-trained HuggingFace model facebook/bart-large-mnli.

Model versioning

Model versioning is crucial for Generative AI applications for several reasons:

  • Reproducibility: Generative AI applications are iterative by nature. As you develop, fine-tune, and refine your models, it's essential to be able to reproduce previous results.
  • Comparison and Evaluation: As you change your model or your fine-tuning data, you'll want to compare the performance and outputs of different model versions.
  • Rollback: Sometimes, a new model version doesn't perform as expected or introduces unintended side effects. Versioning lets you quickly revert to a known-good version.
  • Collaboration: Versioning becomes even more critical when multiple teams or researchers work on the same project. It ensures that everyone understands which version they are working on and makes collaborative efforts smoother.
  • Documentation and Training: Model versioning, when accompanied by proper metadata, provides insight into the evolution of the model's design, its performance characteristics, and any associated assumptions or biases.

Snowflake already provides a set of libraries to add these benefits to your Generative AI application.

For a thorough introduction to the Snowpark ML Model Registry functionality, we highly recommend checking out Eylon Steiner’s blog post. It provides a comprehensive overview and valuable insights on the topic.

A new model version can be registered by referencing an already-created registry, creating a custom model class, and logging the model, as shown below.

import os
from snowflake.ml.registry import model_registry
from snowflake.ml.model import custom_model
from snowflake.ml.model import model_signature
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import pandas as pd

# 1. Reference an already-created registry.
registry = model_registry.ModelRegistry(
    session=session,
    database_name="<your_database_name>",
    schema_name='MODEL_REGISTRY'
)

# Loading the model and the tokenizer
model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli')
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')

# Save the model locally
ARTIFACTS_DIR = "/tmp/facebook-bart-large-mnli/"
os.makedirs(os.path.join(ARTIFACTS_DIR, "model"), exist_ok=True)
os.makedirs(os.path.join(ARTIFACTS_DIR, "tokenizer"), exist_ok=True)
model.save_pretrained(os.path.join(ARTIFACTS_DIR, "model"))
tokenizer.save_pretrained(os.path.join(ARTIFACTS_DIR, "tokenizer"))

# 2. Create a custom class for the model
class FacebookBartLargeMNLICustom(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

        self.model = AutoModelForSequenceClassification.from_pretrained(self.context.path("model"))
        self.tokenizer = AutoTokenizer.from_pretrained(self.context.path("tokenizer"))
        self.candidate_labels = ['customer support', 'product experience', 'account issues']

    @custom_model.inference_api
    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
        def _generate(input_text: str) -> str:
            classifier = pipeline(
                "zero-shot-classification",
                model=self.model,
                tokenizer=self.tokenizer
            )

            result = classifier(input_text, self.candidate_labels)
            if 'scores' in result and 'labels' in result:
                category_idx = pd.Series(result['scores']).idxmax()
                return result['labels'][category_idx]

            return None

        res_df = pd.DataFrame({"output": X["input"].apply(_generate)})
        return res_df

# 3. Logging the model
model = FacebookBartLargeMNLICustom(custom_model.ModelContext(models={}, artifacts={
    "model": os.path.join(ARTIFACTS_DIR, "model"),
    "tokenizer": os.path.join(ARTIFACTS_DIR, "tokenizer")
}))

model_id = registry.log_model(
    model_name='facebook/bart-large-mnli',  # Set the model name
    model_version='100',  # Set the model version
    model=model,
    conda_dependencies=[
        "transformers==4.18.0"
    ],
    tags={
        'deploy': '1'  # Tag the model for deployment
    },
    signatures={
        "predict": model_signature.ModelSignature(
            inputs=[model_signature.FeatureSpec(name="input", dtype=model_signature.DataType.STRING)],
            outputs=[model_signature.FeatureSpec(name="output", dtype=model_signature.DataType.STRING)],
        )
    }
)

The code above refers to the session object, which represents a Snowpark Python session. Here you can find instructions on how to create it. For more information about the Snowpark ML Model Registry library, please consult the official documentation.
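If you don't have a session at hand, here is a minimal sketch of creating one (all connection parameters are placeholders you must fill in):

from snowflake.snowpark import Session

# All connection parameters below are placeholders; replace them with your account details.
connection_parameters = {
    "account": "<your_account_identifier>",
    "user": "<your_user>",
    "password": "<your_password>",
    "role": "<your_role>",
    "warehouse": "<your_warehouse>",
    "database": "<your_database_name>",
    "schema": "MODEL_REGISTRY"
}

session = Session.builder.configs(connection_parameters).create()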

Once registered, the model can be loaded locally by specifying its name and desired version.

model_reference = model_registry.ModelReference(
    registry=registry,
    model_name="facebook/bart-large-mnli",
    model_version="100"
)

model = model_reference.load_model()
model.predict(pd.DataFrame({"input": ["The interface gets frozen very often"]}))

Model serving

Another essential piece of the infrastructure component of your Generative AI application is how you provide access to the model. Multiple methods exist to accomplish this in a data analysis context; however, exposing your model as a User Defined Function (UDF) lets SQL users access LLMs easily.

In another of our recent articles, we demonstrated how SQL users can easily access LLM models through Snowflake vectorized Python UDFs by implementing the following code (this code assumes that the classifier object was uploaded to a stage and that the stage reference is provided to the UDF; a sketch of that upload step follows the UDF definition):

# Caching the model
import sys
import cachetools
import joblib
import pandas as pd

@cachetools.cached(cache={})
def read_model():
    import_dir = sys._xoptions.get("snowflake_import_directory")
    if import_dir:
        # Load the model
        return joblib.load(f'{import_dir}/bart-large-mnli.joblib')

# Create the vectorized UDF that serves the model
from snowflake.snowpark.functions import pandas_udf
from snowflake.snowpark.types import StringType, PandasSeriesType

@pandas_udf(
    name='{your db}.{your schema}.get_review_classification',
    session=session,
    is_permanent=True,
    replace=True,
    imports=[
        '@{your db}.{your schema}.{your model stage}/bart-large-mnli.joblib'
    ],
    input_types=[PandasSeriesType(StringType())],
    return_type=PandasSeriesType(StringType()),
    stage_location='@{your db}.{your schema}.{}',
    packages=['cachetools==4.2.2', 'transformers==4.14.1']
)
def get_review_classification(sentences: pd.Series) -> pd.Series:
    # Classify using the available categories
    candidate_labels = ['customer support', 'product experience', 'account issues']
    classifier = read_model()

    # Make the inference
    predictions = []
    for sentence in sentences:
        result = classifier(sentence, candidate_labels)
        if 'scores' in result and 'labels' in result:
            category_idx = pd.Series(result['scores']).idxmax()
            predictions.append(result['labels'][category_idx])
        else:
            predictions.append(None)
    return pd.Series(predictions)
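For completeness, here is a rough sketch of how the staged artifact could have been produced: serializing the zero-shot pipeline with joblib and uploading it with session.file.put (the local path, stage, and file names are illustrative):

import joblib
from transformers import pipeline

# Build the zero-shot classifier and serialize it locally.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
joblib.dump(classifier, "/tmp/bart-large-mnli.joblib")

# Upload the serialized object, uncompressed, to the stage referenced in the UDF's imports.
session.file.put(
    "/tmp/bart-large-mnli.joblib",
    "@{your db}.{your schema}.{your model stage}",
    auto_compress=False,
    overwrite=True
)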

This approach enables us to access the LLM by simply calling the UDF within any SQL statement.

WITH cs AS (
    SELECT value AS customer_review
    FROM (
        VALUES
            ('Nobody was able to solve my issues with the system'),
            ('The interface gets frozen very often'),
            ('I was charged twice in the same period')
    ) AS t(value)
)
SELECT
    cs.customer_review,
    {your database}.{your schema}.get_review_classification(
        customer_review::VARCHAR
    ) AS category_prediction
FROM cs;

Model deployment automation

Finally, we can use both methods to develop a process for automating LLM model version deployment without negatively impacting our SQL consumers' experience. Imagine a Generative AI project life-cycle in Snowflake like this:

  1. Your Data Science team creates a Generative AI POC.
  2. After the first model version has been approved internally, the version is registered to the official project registry using the Snowpark ML Model Registry.
  3. Using a Snowflake Python vectorized UDF, the LLM is made available internally to your company's SQL users. Instead of referring to a specific model version, we create a stored procedure that retrieves the most recent version from the project registry and re-creates the Snowflake UDF.
  4. Whenever a fresh version is created and added to the project’s registry, we execute the deployment procedure to ensure the UDF content is updated.

This process could ensure that SQL consumers always have access to the latest model version without realizing it!

You can use the code below to build this stored procedure and deploy a new version using the Snowpark ML Model Registry libraries.

CREATE OR REPLACE PROCEDURE {your database}.{your schema}.DEPLOY_REVIEW_CLASSIFICATION()
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = (
    'snowflake-snowpark-python',
    'snowflake-ml-python',
    'transformers==4.18.0'
)
HANDLER = 'deploy'
EXECUTE AS CALLER
AS
$$

from snowflake.ml.registry import model_registry
from snowflake.snowpark.types import StringType, PandasSeriesType
from snowflake.snowpark.functions import col, udf, parse_json, lit
import pandas as pd

def deploy(session):
    # 1. Get a reference to the model registry
    registry = model_registry.ModelRegistry(
        session=session,
        database_name="<your database>",
        schema_name='MODEL_REGISTRY'
    )

    # 2. Read the information about the model to be deployed.
    model_name, model_version, err = get_model_to_deploy(registry)
    if err is not None:
        return err

    # 3. Get a reference to the model
    reference = model_registry.ModelReference(
        registry=registry,
        model_name=model_name,
        model_version=model_version)

    model = reference.load_model()

    # 4. Create the prediction function
    @udf(
        name='{your database}.{your schema}.GET_REVIEW_CLASSIFICATION',
        is_permanent=True,
        session=session,
        stage_location='@"<your database>"."<your schema>".REVIEW_CLASSIFICATION',
        replace=True,
        input_types=[PandasSeriesType(StringType())],
        return_type=PandasSeriesType(StringType()),
        packages=[
            'snowflake-snowpark-python',
            'snowflake-ml-python',
            'transformers==4.18.0'
        ]
    )
    def get_review_classification(sentences: pd.Series) -> pd.Series:
        # Make the inference
        predictions = []
        for sentence in sentences:
            result = model.predict(pd.DataFrame({"input": [sentence]}))
            predictions.append(result['output'][0])
        return pd.Series(predictions)

    return f'Model {model_name} successfully deployed'

def get_model_to_deploy(registry):
    try:
        # Get the model to deploy
        model_list = registry.list_models()
        model_to_deploy = (
            model_list
            .filter(parse_json(model_list["TAGS"]).getField("deploy") == lit('1'))
            .select(
                col("NAME"),
                col("VERSION"))
            .to_pandas()
        )

        if len(model_to_deploy) == 0:
            return None, None, "No model deployed yet"

        return model_to_deploy["NAME"][0], model_to_deploy["VERSION"][0], None

    except Exception as e:
        return None, None, str(e)

$$;

Some details about the stored procedure:

  • It is important to keep the Python and package versions consistent between your local environment, the stored procedure, and the UDF, as package version mismatches are a common issue when following this approach (see the sanity-check sketch after this list).
  • To determine the appropriate model to deploy, we utilize the "deploy" tag. However, the library also provides the capability to customize the promotion methodology through the use of tags and metrics.
  • We recommend using a Snowpark-optimized warehouse for executing the deployment stored procedure. Loading LLM models is a memory-intensive operation that may fail in standard Snowflake warehouses.
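Something along these lines can catch a mismatch before you deploy — a minimal local sanity check, assuming you pinned Python 3.8 and transformers==4.18.0 as in the procedure above:

import sys
import transformers

# The stored procedure declares RUNTIME_VERSION = '3.8' and transformers==4.18.0;
# your local environment should match what the UDF will resolve.
print(sys.version_info[:2])      # expect (3, 8)
print(transformers.__version__)  # expect 4.18.0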

Now, every time you call the stored procedure, the UDF is automatically updated to reference the latest model, so you won't need to manually change any references to the UDF. Pretty cool, right?
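For example, once a new version is registered and tagged for deployment, refreshing the UDF is a single call:

CALL {your database}.{your schema}.DEPLOY_REVIEW_CLASSIFICATION();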

Summary

This article demonstrates how to use the Snowpark ML Model Registry library, Snowflake stored procedures, and Snowflake vectorized Python UDFs to automate LLM deployment, granting SQL users across your organization seamless, continuous access to your models.

While we understand that this solution may not cover all aspects required for a complete Generative AI application, we are confident that it can assist you in organizing your development process for your own POC.

In our next article, we'll show how to add a fine-tuning component to this solution using Snowflake stored procedures and Snowpark-optimized warehouses. Stay tuned!

Don't forget to read the Snowpark ML team's blog post on Open Source LLM Deployment in Snowflake. It provides detailed information on the libraries and upcoming Snowpark container features.

Lastly, I would like to express my gratitude to the Snowpark ML team and the folks from Streamlit for assisting me in curating the content of this article and giving us early access to the library preview. It is always great to witness how the LLM community in Snowflake is expanding and gaining more traction.

I’m Fabian Hernandez, Data Architect at Infostrux Solutions. Thanks for reading my blog post. You can follow me on LinkedIn and subscribe to Infostrux Medium Blogs for the most interesting Data Engineering and Snowflake news. Please let us know your thoughts about this approach in the comment section.
