Integrating Generative AI with Snowflake’s External Functions

I love watches that use natural language to tell time!

UPDATE as of JULY 2023: I’ve created a new step-by-step guide with updated code that uses the latest GPT-4 model along with the new Chat Completions API. The guide also includes code for running an interactive Streamlit application for entering prompts for Sentiment Analysis, Share Knowledge, or Text-to-SQL and displaying the responses.

So like many developers, I have been tinkering with OpenAI and Large Language Models (LLMs) over the last couple of weekends. Here is one of the projects I have been working on that uses External Functions in Snowflake.

Here’s a quick demo of calling the external function to invoke the OpenAI API from Snowflake.

Using the steps outlined below for the two implementations, you should be able to replicate this in your own environment, and you can adapt the same approach to integrate other LLMs with Snowflake.

Implementation 1: AWS API Gateway

This approach was inspired by an excellent post from Fabian Hernandez. In this implementation, I’ve used AWS API Gateway as a proxy for calling the OpenAI API from Snowflake. Below, I’ve provided the CloudFormation Template (CFT) you can download to set up the AWS environment, and I’ve also demonstrated how to create an external function in Snowflake that uses the AWS API Gateway to call and interact with the OpenAI API directly.

Implementation 2: AWS Lambda

In this implementation, I’ve wrapped the OpenAI API in an AWS Lambda function, which can then be accessed via an external function in Snowflake.

Differences and Similarities

Here are some key differences and similarities between the two implementations.

  1. Using the AWS API Gateway as a proxy (without a Lambda function) is the simplest and quickest way to call public APIs from Snowflake; this applies to any public API, not just the OpenAI API highlighted in this blog.
  2. Using a Lambda function enables the creation of an abstraction layer with the ability to embed custom logic/code. For example, you could wrap multiple APIs and use the same external function in Snowflake to call them based on a parameter passed into the Lambda function.
  3. In both cases, you can call the external function directly from Snowflake for use cases like generating SQL from natural language, performing sentiment analysis on tweets, or any other LLM-powered task. Important: Depending on how you construct the prompts, your data could leave Snowflake and its security and governance perimeter, so please be mindful of that.
  4. Cost is another consideration; review Amazon API Gateway pricing and AWS Lambda pricing.

Let’s build…

Prerequisites

Implementation 1: AWS API Gateway

Step 1

Download the CloudFormation Template (CFT).

Step 2

Search for openAIAPIKey in the template and replace “<Your_OPENAI_API_KEY>” with your OpenAI API Key.

Step 3

In your AWS Management Console, create a CloudFormation stack using the CFT you downloaded and updated in steps 1 and 2.

Step 4

Once the CloudFormation stack has been created successfully, first create an API Integration object in Snowflake, and then set up a trust relationship between Snowflake and the IAM (Identity and Access Management) role openAIAPIGatewayRole created by the CFT.

Sample API Integration SQL:

CREATE API INTEGRATION AWS_OPENAI_INTEGRATION
  API_PROVIDER = 'AWS_API_GATEWAY'
  API_AWS_ROLE_ARN = 'arn:aws:iam::xxxxxxxxxx:role/openAIAPIGatewayRole'
  API_ALLOWED_PREFIXES = ('https://xxxxxxxxxx.execute-api.us-west-1.amazonaws.com/main/v1/completions')
  ENABLED = true;

Sample Policy Document for setting up trust relationship:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:sts::xxxxxxxxxx:assumed-role/openAIAPIGatewayRole/snowflake"
      },
      "Action": "execute-api:Invoke",
      "Resource": "arn:aws:execute-api:us-west-1:xxxxxxxxxx:1dhzbxau2m/*"
    }
  ]
}

Step 5

Create request and response translators for the OpenAI API as shown below. (Thanks Fabian Hernandez for the code snippets! :))

CREATE OR REPLACE FUNCTION OPENAI_COMPLETIONS_REQUEST_TRANSLATOR("EVENT" OBJECT)
RETURNS OBJECT
LANGUAGE JAVASCRIPT
AS '
return {"body": {
    "model": "text-davinci-003",
    "prompt": EVENT.body.data[0][1],
    "temperature": 0,
    "max_tokens": 100,
    "top_p": 1,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0
  }
};';

CREATE OR REPLACE FUNCTION OPENAI_COMPLETIONS_RESPONSE_TRANSLATOR("EVENT" OBJECT)
RETURNS OBJECT
LANGUAGE JAVASCRIPT
AS '
let array_of_rows_to_return = [[0, EVENT.body]];
return {"body": {"data": array_of_rows_to_return}};
';

Learn more about OpenAI’s Completions API.
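To make the translator logic easier to follow, here is a Python re-implementation for illustration only (the real translators run as JavaScript UDFs inside Snowflake). Snowflake batches external function rows as {"data": [[row_number, value], ...]}; note that the request translator forwards only the first row of each batch (EVENT.body.data[0][1]).

```python
def request_translator(event):
    # Equivalent of OPENAI_COMPLETIONS_REQUEST_TRANSLATOR: pulls the prompt
    # from the first row of the Snowflake batch and builds the API request body.
    return {"body": {
        "model": "text-davinci-003",
        "prompt": event["body"]["data"][0][1],
        "temperature": 0,
        "max_tokens": 100,
        "top_p": 1,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
    }}

def response_translator(event):
    # Equivalent of OPENAI_COMPLETIONS_RESPONSE_TRANSLATOR: wraps the raw
    # API response back into the row format Snowflake expects.
    return {"body": {"data": [[0, event["body"]]]}}

batch = {"body": {"data": [[0, "Classify this sentiment: OpenAI is Awesome!"]]}}
payload = request_translator(batch)
```

This mirrors the JavaScript above line for line; the function names and the sample batch are only for this sketch.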

Step 6

The final step is to create the external function using the API integration and request/response translator objects created in previous steps.

Sample SQL to create the external function:

CREATE OR REPLACE EXTERNAL FUNCTION OPENAI_EXT_FUNC("QUESTION" VARCHAR(16777216))
RETURNS VARIANT
API_INTEGRATION = "AWS_OPENAI_INTEGRATION"
MAX_BATCH_ROWS = 100
REQUEST_TRANSLATOR = OPENAI_COMPLETIONS_REQUEST_TRANSLATOR
RESPONSE_TRANSLATOR = OPENAI_COMPLETIONS_RESPONSE_TRANSLATOR
AS 'https://xxxxxxxxxx.execute-api.us-west-1.amazonaws.com/main/v1/completions';

Note: Depending on your current user role in Snowflake, you may need to run the following statement to grant usage of this function to the role.

GRANT USAGE ON FUNCTION OPENAI_EXT_FUNC(VARCHAR) TO <ROLE_NAME>;

Call External Function in Snowflake

Assuming everything has gone well so far, calling the external function to invoke OpenAI API via API Gateway is as simple as follows:

Example 1

select OPENAI_EXT_FUNC('Classify this sentiment: OpenAI is Awesome!')::VARIANT:choices[0]:text as response;

Sample response: “Positive”
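For reference, the ::VARIANT:choices[0]:text path drills into the raw Completions response. Here is the same extraction in Python, against a trimmed, made-up response (field values are illustrative only):

```python
import json

# Trimmed, hypothetical Completions response, as the external function
# returns it (a VARIANT in Snowflake; plain JSON here).
raw = json.loads('''
{
  "object": "text_completion",
  "model": "text-davinci-003",
  "choices": [
    {"text": "Positive", "index": 0, "finish_reason": "stop"}
  ]
}
''')

# Snowflake's ::VARIANT:choices[0]:text corresponds to:
text = raw["choices"][0]["text"]
```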

Example 2

select OPENAI_EXT_FUNC('Create a SQL statement to find all users who live in California and have over 1000 credits')::VARIANT:choices[0]:text as response;

Sample response:

SELECT * FROM users WHERE state = 'California' AND credits > 1000;

Example 3

select OPENAI_EXT_FUNC('### Snowflake SQL tables, with their properties:### CAMPAIGN_SPEND table CAMPAIGN VARCHAR(60),CHANNEL VARCHAR(60),DATE DATE,TOTAL_CLICKS NUMBER(38,0),TOTAL_COST NUMBER(38,0),ADS_SERVED NUMBER(38,0) ### A query to the data so we can see total cost per year/month per channel')::VARIANT:choices[0]:text as response;

Sample Response:

SELECT 
YEAR(DATE) AS YEAR,
MONTH(DATE) AS MONTH,
CHANNEL,
SUM(TOTAL_COST) AS TOTAL_COST
FROM CAMPAIGN_SPEND
GROUP BY YEAR(DATE), MONTH(DATE), CHANNEL
ORDER BY YEAR, MONTH, CHANNEL;

Implementation 2: AWS Lambda Function

For this implementation, you can leverage some of the objects you may have already created in the previous sections. Assuming you want to keep the two implementations and environments separate, though, here are the steps that will get you up and running.

First, I’d recommend setting up the environment to create and test a simple external function using Lambda on AWS. For that, follow these very detailed instructions.

After you have successfully created and tested your external function with Lambda on AWS per the instructions above, follow the steps below to wrap the OpenAI API in your AWS Lambda function.

Step 1

Clone this GitHub repo.

Step 2

  • On your local machine where you have cloned the repo, browse to the openai_llm_lambda_function_wrapper folder. NOTE: This folder contains the OpenAI Python library (installed via pip) and the custom Lambda function code that wraps the OpenAI API.
  • Create a .zip of the openai_llm_lambda_function_wrapper folder. NOTE: Make sure you zip the contents of the folder, not the folder itself, so the function code sits at the root of the archive.

Step 3

Browse to your Lambda Function in AWS Management Console and upload the .zip file as shown below.

Upload Lambda Function

Step 4

Click on Configuration > Environment variables and add a new key named open_ai_key with your OpenAI API key as the value, as shown below.

OpenAI API key environment variable

Note: This open_ai_key environment variable is referenced in the Lambda function as shown below.

OpenAI API key environment variable
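The repo contains the full wrapper, but as a rough sketch of what such a Lambda function looks like (hypothetical code, assuming the text-davinci-003 Completions endpoint and the open_ai_key environment variable above; the actual repo code may differ):

```python
import json
import os
import urllib.request

OPENAI_URL = "https://api.openai.com/v1/completions"

def build_payload(prompt):
    # Same settings as the request translator in Implementation 1.
    return {"model": "text-davinci-003", "prompt": prompt,
            "temperature": 0, "max_tokens": 100}

def complete(prompt):
    # Calls the OpenAI Completions API, authenticating with the
    # open_ai_key environment variable set in Step 4.
    req = urllib.request.Request(
        OPENAI_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + os.environ["open_ai_key"]})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

def lambda_handler(event, context):
    # Snowflake sends {"data": [[row_number, prompt], ...]} and expects
    # {"data": [[row_number, result], ...]} back.
    rows = json.loads(event["body"])["data"]
    results = [[row_num, complete(prompt)] for row_num, prompt in rows]
    return {"statusCode": 200, "body": json.dumps({"data": results})}
```

Unlike the API Gateway proxy, this handler loops over every row in the batch, which is one of the advantages of the Lambda abstraction layer.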

Load Sample Prompts

Execute the following statements in Snowsight to create a sample table and insert some example prompts.

CREATE OR REPLACE TABLE llm_prompts (
  prompt variant
);

INSERT INTO llm_prompts
select parse_json(column1) as prompt
from values
('{"type":"code_complete","prompt":"Create a SQL statement to find all users who live in California and have over 1000 credits"}');

Call External Function in Snowflake

Execute the following SQL statement to call the external function for each prompt in the above table. Note: Replace llm_external_function with the name of the function you created in the Create the External Function for AWS in Snowflake step.

SELECT prompt, llm_external_function(prompt:prompt) as llm_query
from llm_prompts;

If all goes well, you should see the following output.

Verbose prompt

Now let’s assume you have a Snowflake table named CAMPAIGN_SPEND with the following columns:

CAMPAIGN VARCHAR(60)
CHANNEL VARCHAR(60)
DATE DATE
TOTAL_CLICKS NUMBER(38,0)
TOTAL_COST NUMBER(38,0)
ADS_SERVED NUMBER(38,0)

In that case, you can provide a verbose prompt like so…

"### Snowflake SQL table with their properties:
### CAMPAIGN_SPEND table CAMPAIGN VARCHAR(60),CHANNEL VARCHAR(60),DATE DATE,
TOTAL_CLICKS NUMBER(38,0),TOTAL_COST NUMBER(38,0),ADS_SERVED NUMBER(38,0)
### A query to the data so we can see total cost per year/month per channel"

And you should see a response similar to this…

SELECT 
YEAR(DATE) AS YEAR,
MONTH(DATE) AS MONTH,
CHANNEL,
SUM(TOTAL_COST) AS TOTAL_COST
FROM CAMPAIGN_SPEND
GROUP BY YEAR(DATE), MONTH(DATE), CHANNEL
ORDER BY YEAR, MONTH, CHANNEL;

That’s it for now…

Thanks for your time! Hope you found this blog educational and inspiring. While this is an example of quickly getting up and running with OpenAI and LLMs in Snowflake using External Functions, to see many more cool demos, come see us at Snowflake Summit in Vegas.

Connect with me on Twitter and LinkedIn where I share demos, code snippets, QuickStart Guides, and other interesting technical artifacts. Be sure to also check out Snowflake For Developers.



Dash Desai
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Lead Developer Advocate @ Snowflake | AWS Machine Learning Specialty | #DataScience | #ML | #CloudComputing | #Photog