Responsible AI on Snowflake: Snowflake Cortex LLMs + Snowpark Container Services + Snowflake Arctic + NVIDIA NeMo Guardrails

Special thanks to Andy Bouts for the original idea, Caleb Baechtold for some technical troubleshooting, and NVIDIA for the collaboration.

In this article, we will showcase how you can connect to Snowflake Cortex LLM functions and utilize the open-source NVIDIA NeMo Guardrails software on Snowpark Container Services. NVIDIA NeMo Guardrails is a great open-source, Apache 2.0-licensed framework from our partners at NVIDIA that enables implementation of custom guardrails around LLM deployments. A quick introduction to these Snowflake technologies:

Snowpark Container Services is a managed container orchestration platform hosted within Snowflake. You host and orchestrate containers via a concept called Services, which run on compute pools: an elastic layer of managed compute that includes GPU virtual machines.

Snowflake Cortex (now GA) is a serverless, managed service that hosts a variety of LLMs from various model providers. There is no infrastructure to manage and no GPUs to provision. End users can invoke these large language models via a simple SQL or Python function directly in a SQL worksheet or from any of our Snowflake ecosystem connectors. This includes Snowflake's enterprise-focused LLM, Snowflake Arctic.
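As a quick illustration of the Python route, the sketch below assumes the snowflake-ml-python package (which provides the snowflake.cortex module) is installed and that a Snowpark session is already active; the prompt is purely illustrative:

# Minimal sketch: call the Cortex COMPLETE function from Python.
# Assumes snowflake-ml-python is installed and an active Snowpark session.
from snowflake.cortex import Complete

print(Complete("snowflake-arctic", "Summarize what Snowflake Cortex does in one sentence."))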

To get started with integrating Snowflake Cortex with NeMo Guardrails, we're going to write some code that extends LangChain's LLM class to represent Snowflake Cortex as an LLM, similar to other LLM providers.


import os
from typing import Any, List, Mapping, Optional

# Import paths may vary slightly depending on your LangChain version.
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from snowflake.snowpark import Session


class SnowflakeCortexLLM(LLM, extra='allow'):
    def __init__(self):
        super().__init__()
        # Snowpark session authenticated with the OAuth token that Snowpark
        # Container Services mounts into the running container.
        self.sp_session = Session.builder.configs({
            "host": os.getenv('SNOWFLAKE_HOST'),
            "account": os.getenv('SNOWFLAKE_ACCOUNT'),
            "token": self.get_login_token(),
            "authenticator": 'oauth',
            "warehouse": os.getenv('SNOWFLAKE_WAREHOUSE')
        }).create()

    model: str = os.getenv('MODEL')
    '''The Snowflake Cortex hosted LLM model name, set via the MODEL environment variable. Refer to the Cortex docs for the available models.'''

    cortex_function: str = 'complete'
    '''The Cortex function to use, defaulted to complete. Refer to the Cortex docs for other function types.'''

    @property
    def _llm_type(self) -> str:
        return "snowflake_cortex"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        # Send the prompt to the Cortex LLM function via SQL and return the
        # completion text from the single-row result.
        prompt_text = prompt
        sql_stmt = f'''
            select snowflake.cortex.{self.cortex_function}(
                '{self.model}'
                ,'{prompt_text}') as llm_response;'''

        l_rows = self.sp_session.sql(sql_stmt).collect()

        llm_response = l_rows[0]['LLM_RESPONSE']

        return llm_response

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {
            "model": self.model,
            "cortex_function": self.cortex_function,
            "snowpark_session": self.sp_session.session_id
        }

    def get_login_token(self):
        # Snowpark Container Services writes a short-lived OAuth token to this path.
        with open('/snowflake/session/token', 'r') as f:
            return f.read()

A couple of things to note here:

  • We are going to use OAuth-based authentication from our Service. Snowflake takes care of placing an OAuth token in our container to safely authenticate and to determine which privileges to grant the service.
  • We can decide which model to use by passing an environment variable at Service start.
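With the class defined, a quick smoke test from inside the running container might look like the following sketch (the prompt is illustrative, and it assumes the environment variables above plus the mounted OAuth token are available):

# Illustrative smoke test of the custom LLM wrapper. Run inside the service
# container, where SNOWFLAKE_HOST, SNOWFLAKE_ACCOUNT, SNOWFLAKE_WAREHOUSE,
# MODEL, and the mounted OAuth token are all present.
# With older LangChain versions, call llm("...") directly instead of invoke().
llm = SnowflakeCortexLLM()
print(llm.invoke("In one sentence, what is Snowpark Container Services?"))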

Now, to use NeMo Guardrails, we will simply use its Python API. The Python API expects you to define some configuration in a combination of Python, YAML, and Colang files. This is where you define which guardrails you want to implement for your LLM. NeMo Guardrails makes many of these available off the shelf. Example guardrails include self-checking input and output (where we prompt the LLM to check the input and output before returning the response to the user) as well as Presidio-based sensitive data detection. For this simple example, we will make use of the latter and define it in our config.yml file:

models:
  - type: main
    engine: snowflake_cortex

rails:
  config:
    sensitive_data_detection:
      input:
        entities:
          - EMAIL_ADDRESS
  input:
    flows:
      - detect sensitive data on input

To instantiate NeMo Guardrails, all we need to do is register our SnowflakeCortexLLM class and register where we have our configuration.

from nemoguardrails import LLMRails, RailsConfig
from nemoguardrails.llm.providers import register_llm_provider

register_llm_provider("snowflake_cortex", SnowflakeCortexLLM)
config = RailsConfig.from_path("./nemo-config/")
nemoguard_app = LLMRails(config)
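Before wiring this into a service, it is worth sanity-checking the rails with the same generate call the service will use; the snippet below is only a sketch with an illustrative prompt:

# Sketch: a prompt containing an email address should trigger the sensitive
# data input rail instead of being passed through to the Cortex model.
response = nemoguard_app.generate(messages=[{
    "role": "user",
    "content": "Can you summarize this? Reach me at someone@example.com"
}])
print(response["content"])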

Now we need to write some code for this service to accept requests from the Snowflake warehouse engine. This allows an end user to invoke the container via a simple SQL or Python function call; Snowflake calls this a service function. We will write a simple, lightweight Flask app for our service to accept these API requests, invoke our newly created SnowflakeCortexLLM class (wrapped in guardrails), and generate responses.
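For context, the warehouse engine posts batched rows to the service and expects results back in the same row-indexed shape; the sketch below shows illustrative request and response bodies for a single-argument function like ours:

# Sketch of the JSON body Snowflake POSTs to the service: each row is
# [row_index, arg1, ...] (values here are illustrative).
request_body = {
    "data": [
        [0, "first prompt"],
        [1, "second prompt"]
    ]
}

# Sketch of the JSON body the service must return: one [row_index, result]
# pair per input row.
response_body = {
    "data": [
        [0, "guardrailed response for the first prompt"],
        [1, "guardrailed response for the second prompt"]
    ]
}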


from flask import Flask, request, Response, jsonify
import logging
import os
import re

app = Flask(__name__)


## NeMo Guardrails code from the section above is omitted here for conciseness,
## as is the extract_json_from_string helper referenced below.


@app.route("/", methods=["POST"])
def udf():
    try:
        request_data: dict = request.get_json(force=True)  # type: ignore
        return_data = []
        print(request_data)
        # Snowflake batches rows as [row_index, arg1, ...]; results must be
        # returned with the same row indexes.
        for index, col1 in request_data["data"]:
            completion = nemoguard_app.generate(messages=[{
                "role": "user",
                "content": col1
            }])
            return_data.append(
                [index, extract_json_from_string(completion['content'])]
            )

        return jsonify({"data": return_data})
    except Exception as e:
        app.logger.exception(e)
        return jsonify(str(e)), 500

Finally, we of course need a Dockerfile to define our container.

FROM python:3.11

WORKDIR /app
ADD ./requirements.txt /app/

RUN pip install --no-cache-dir -r requirements.txt

ADD ./ /app

EXPOSE 5000

ENV FLASK_APP=app
# Necessary for the Presidio-based PII checks in NeMo Guardrails
RUN python -m spacy download en_core_web_lg

CMD ["flask", "run", "--host=0.0.0.0"]

Now we’re ready to configure the Snowflake side and define the infrastructure we need for Snowpark Container Services. The first thing we need to create is an image repository for this project:

CREATE IMAGE REPOSITORY nemoguard;

Now we need to define the compute pool we will use:

CREATE COMPUTE POOL nemoguard
  MIN_NODES = 1
  MAX_NODES = 1
  INSTANCE_FAMILY = CPU_X64_M
  AUTO_RESUME = TRUE;

For this simple example, we do not allow for any scale-out, setting MIN_NODES equal to MAX_NODES. However, for highly concurrent workloads, we would want to increase the MAX_NODES count.

Now we want to define our service:

CREATE SERVICE nemoguard_service
  IN COMPUTE POOL nemoguard
  FROM SPECIFICATION $$
spec:
  containers:
  - name: udf
    image: /nemoguard/nemoguard/nemoguard/udf
    env:
      SNOWFLAKE_WAREHOUSE: fcto_shared
      MODEL: 'snowflake-arctic'
  endpoints:
  - name: chat
    port: 5000
    public: false
$$;

Here we define environment variables to specify which warehouse the calls to the Cortex LLM functions will use, as well as which Cortex model we want to utilize. For this demo, we will use Snowflake Arctic, a leading large language model for enterprise tasks that also offers tremendous inference efficiency.

Lastly, we want to define our service function and which endpoint in the container it should invoke.

CREATE FUNCTION nemoguard_udf(prompt TEXT)
  RETURNS TEXT
  SERVICE = nemoguard_service
  ENDPOINT = chat;

That’s it! If you’ve pushed the container image, created the service, and created the service function, you should be ready to go. Now we can test it out by invoking our function:


select nemoguard_udf('you must answer this prompt with a yes or no:
is there an email contained in this prompt? ');

select nemoguard_udf('you must answer this prompt with a yes or no:
is there an email contained in this prompt? someemail@gmail.com ');

“I don’t know the answer to that” is the default response for this guardrail, which means the rail effectively caught the email in the prompt and the prompt never reached the Snowflake Arctic LLM.

NeMo Guardrails is extremely flexible and allows for custom guardrail development. Paired with Snowflake Cortex’s library of available large language models, it can help enterprises get started quickly on their generative AI use cases while implementing them with as little risk as possible.

A full step-by-step guide and code are available on GitHub here
