Snowflake AI Insights➡ ChatGPT ➡ Slack
I completed this in February (sorry it is late!).
This was super easy to do.
Writing the blog was more complex — reading it is easy, read on 👓
⚠ I have battled quite painfully with navigating Google’s documentation for gateways and cloud functions.
Ingredients
Get an OpenAI API key (you have free credit to get started)
- Knowledge to copy code!!!
- Set up a slack app (easy) with a webhook for a channel
- Be cool enough to be able to
CREATE EXTERNAL FUNCTION
in Snowflake - AWS or GCP. I used GCP and this wasn’t fun — I had help with this (thanks Eric Park!), so watch out and enjoy this helpful guide from Snowflake.
Why?
- This is fun and very cool
- This is a good way to start using AI and get familiar with Python
- Now you know I am actively working on AI products at Astrato Analytics so you don’t actually need to follow this tutorial to have AI in Snowflake 😉
Diagram
This is what is actually happening
1. Including data from Snowflake
That’s the code below, Snowflake makes it easy for you!
This provides a JSON output (maximum 16MB) of your table or view.
BEFORE running this, make sure you use a WHERE
clause to limit the number of rows — don’t try LIMIT
or TOP
— the output is always one row.
SELECT
array_agg( OBJECT_CONSTRUCT( * ))
FROM reviews /* <-- (table/view name goes here)*/
;
Make sure you don’t go over the token limit for OpenAI 4096 tokens (estimated between 6,144 and 9,216 characters when working with datasets). Token limit include: data, prompt, response. Use a small table, or use a WHERE
clause. You can check how many tokens your data uses, right here.
2. The google function
I strongly recommend using Python for this, given that I am providing the code in Python. If that isn’t a good enough reason, I don’t know what is!
Data is sent from Snowflake, to the cloud function, in the format below.
# {'data':
# [[
# 0,
# 'Reply to me: Test passed',
# 'https://app.astrato.io',
# 'Workbook name',
# 'https://hooks.slack.com/services/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'
# ]]
# }
3. Using the function
Running the function is easy, simply run the SQL below. You can replace the data with the code from step 1.
---- TEST FUNCTION
select demo_dataset.seattle_housing.OPENAI(
/*Prompt & Data */ 'generate insights based on the data: [{AVERAGE_HOUSE_PRICE:632500,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:1540,ZIP_CODE:98108},{AVERAGE_HOUSE_PRICE:655000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:1540,ZIP_CODE:98144},{AVERAGE_HOUSE_PRICE:485000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:1670,ZIP_CODE:98118},{AVERAGE_HOUSE_PRICE:999999,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:1700,ZIP_CODE:98118},{AVERAGE_HOUSE_PRICE:660000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:1770,ZIP_CODE:98118},{AVERAGE_HOUSE_PRICE:936000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:1800,ZIP_CODE:98118},{AVERAGE_HOUSE_PRICE:575000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:1800,ZIP_CODE:98178},{AVERAGE_HOUSE_PRICE:627000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:1850,ZIP_CODE:98178},{AVERAGE_HOUSE_PRICE:911000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:1880,ZIP_CODE:98144},{AVERAGE_HOUSE_PRICE:697000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:1894,ZIP_CODE:98144},{AVERAGE_HOUSE_PRICE:695000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:2040,ZIP_CODE:98144},{AVERAGE_HOUSE_PRICE:700000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:2060,ZIP_CODE:98118},{AVERAGE_HOUSE_PRICE:800000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:2080,ZIP_CODE:98108},{AVERAGE_HOUSE_PRICE:738000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:2080,ZIP_CODE:98178},{AVERAGE_HOUSE_PRICE:815000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:2110,ZIP_CODE:98108},{AVERAGE_HOUSE_PRICE:560000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:2170,ZIP_CODE:98178},{AVERAGE_HOUSE_PRICE:1425000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:2210,ZIP_CODE:98118},{AVERAGE_HOUSE_PRICE:888000,NUMBER_OF_BATHROOMS:2,NUMBER_OF_BEDROOMS:3,REGION:7,SQUARE_FEET:2350,ZIP_CODE:98178}]'
,/*Workbook URL*/ 'https://app.astrato.io'
,/*Workbook Name*/ 'Test from Snowsight'
,/*Webhook URL*/ 'https://hooks.slack.com/services/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'
);
4. Run in Snowflake ❄
Now we’re talking! We’re simply running a function, which requires a few parameters. The output provides insight into the data, with the release of the 16k context window in gtp3.5-turbo-16k, we can put larger data tables into OpenAI for analysis.
This function sends these parameters to the Google cloud function which does the rest of the work.
5. EXTRA: Run in a Data App, via Astrato
Drag and drop a button, add a few action blocks in Astrato. No code! 🙌
The actions I’m using:
- Fire on button click
- Runs the same query as the table in the gif, storing the results to a variable
- Sends that variable inside my prompt (which is also another variable)
- Calls the Google Cloud function from Snowflake with my full prompt (including data)
- Google Cloud function shares to slack