Frosty: Build an LLM Chatbot in Streamlit on your Snowflake Data

Written by Joshua Carroll and Caroline Frasca

Snowflake is already the best place to unlock seamless data collaboration. Recent developments will enable Snowflake users to supercharge their data with the power of LLMs, generative AI, and Streamlit.

Today we’re excited to share an example using the new, native Streamlit chat interface — included in version 1.24.0 of the open-source Python library.

We’ll show how you can:

  • Build a robust, interactive LLM-powered chat app in fewer than 50 lines of code
  • Use prompt engineering to generate SQL from natural language
  • Explore your private Snowflake or Snowflake Marketplace data by connecting Streamlit to Snowpark

Why a chat interface?

Let’s start with an example business scenario.

You’re an analyst working at BankCo and you’ve been asked to analyze financial data and economic indicators to make recommendations for investment strategies. You might need to answer questions such as “Which states had the most bank merger and acquisition events over the past five years?” or “What parts of the country are most impacted by inflation?”

Each of these questions requires a complex SQL query and the creation of visualizations to share the results. As is often the case, stakeholders may request changes to the analysis or have further questions that need exploration. Each question can lead to more questions, requiring additional queries, code, and Slack messages. This familiar and frustrating cycle presents an opportunity to leverage LLMs.

This is where the magic happens.

By connecting our chatbot to an LLM such as GPT-3.5 and our data stored in Snowflake, we can:

  • Harness the power of LLMs to answer user questions phrased in natural human language with a reply that includes a SQL query
  • Use Snowpark for Python to execute the SQL queries generated by the LLM on data stored in Snowflake
  • Render those query results natively in our chatbot using Streamlit

Fundamentally, we have empowered our chatbot to answer questions about the data stored in Snowflake!

This eliminates the barrier preventing non-technical business users from getting valuable insights from Snowflake data. Now, anyone capable of asking a question can interact with and learn from their data, without writing a single SQL query.

In many ways, this interactive, SQL-savvy chatbot outperforms the traditional dashboard. Instead of tediously tweaking dashboard architecture and logic to address a business user’s one-off questions, that time can be spent curating a chatbot capable of answering a limitless number of questions.

Frosty: build an LLM chatbot in Streamlit on your Snowflake data

In this tutorial, we’ll build a chatbot named Frosty.

Frosty performs data exploration and answers questions by writing and executing SQL queries on Snowflake data. The app is built with Streamlit and Snowflake, can be integrated with the LLM of your choice, and works with data from the Snowflake Marketplace.

This is a high-level overview of the app’s main concepts and architecture. If you want to dig into the code in more detail, please check out the complete Snowflake Quickstart.

Diagram demonstrating the architecture of the app

Main concepts

Snowflake Marketplace

The Snowflake Marketplace provides users with access to a wide range of datasets from third-party data stewards, expanding the data available for transforming business processes and making decisions. Data providers can publish datasets and offer data analytics services to Snowflake customers. Customers can securely access shared datasets directly from their Snowflake accounts and receive automatic real-time updates.

This Quickstart uses the Cybersyn Financial & Economic Essentials dataset, which is free and available instantly from the Snowflake Marketplace.

Snowpark Python

Snowpark provides an intuitive API for querying and processing data stored in Snowflake, including datasets obtained from the Snowflake Marketplace. With Snowpark Python, developers can avoid writing verbose SQL and instead use Snowpark’s PySpark-like client-side API to interact with their data directly.

The Quickstart uses Snowpark Python (via Streamlit’s st.experimental_connection feature) to execute SQL queries generated by GPT-3.5 on the Cybersyn Financial & Economic Essentials dataset.

st.experimental_connection

Demo app for Streamlit’s st.experimental_connection feature

st.experimental_connection allows developers to connect their Streamlit apps to a data store or API in just four lines of code. This feature simplifies the connection process, which typically requires finding and installing external packages, securely managing and accessing credentials, determining the appropriate methods to access data in the desired format, and figuring out the best way to cache all of that work. Instead, developers can use st.experimental_connection to avoid navigating those complexities.

The Quickstart uses st.experimental_connection to connect to Snowflake via Snowpark Python.
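Behind the scenes, st.experimental_connection("snowpark") reads its credentials from Streamlit’s secrets. A minimal `.streamlit/secrets.toml` might look like the following sketch — the account, user, warehouse, and role values here are placeholders, not values from this tutorial:

```toml
[connections.snowpark]
account = "my_account_identifier"   # placeholder Snowflake account identifier
user = "my_user"                    # placeholder username
password = "********"
warehouse = "COMPUTE_WH"            # any warehouse your role can use
role = "MY_ROLE"                    # a role with access to the dataset
```

With this in place, the app can open the connection by name with no further configuration in code.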

Streamlit’s new chat UI

Streamlit recently released a new way to build a chat interface in version 1.24.0 of its open-source Python library. This feature enables developers to take user input through st.chat_input and display messages in the UI using st.chat_message. Streamlit's session state can be used to store chat history.

The Quickstart combines the chat interface and session state to create a chatbot capable of answering questions about data stored in Snowflake.
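The history-handling pattern is simple enough to sketch without a running Streamlit server. Here is a hypothetical, plain-Python stand-in for what st.session_state does for the chat (the names `session_state` and `handle_input` are illustrative, not Streamlit APIs):

```python
# Hypothetical stand-in for Streamlit's session state: chat history is a
# list of role/content dicts, and system messages are hidden when rendering.
session_state: dict = {}

def handle_input(prompt: str) -> list[dict]:
    """Append the user's message and return only the messages to render."""
    if "messages" not in session_state:
        # In the real app, this first entry is the generated system prompt.
        session_state["messages"] = [{"role": "system", "content": "system prompt"}]
    session_state["messages"].append({"role": "user", "content": prompt})
    # st.chat_message is only called for non-system roles.
    return [m for m in session_state["messages"] if m["role"] != "system"]
```

In the real app, st.chat_input supplies `prompt`, and each returned message is rendered inside a st.chat_message container.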

Prompt engineering

Our app includes a generated system prompt that orients the LLM, provides information on the table being queried, and specifies rules to ensure the output includes a SQL query that the app can extract and run. On startup, the app runs a few metadata queries to get the table context, which is then inserted into the system prompt.
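As a sketch of that idea, the context-gathering step boils down to string formatting: run a metadata query, then interpolate the results into the prompt template. The table name, column names, and helper functions below are illustrative placeholders, not the Quickstart’s actual prompts.py:

```python
# Illustrative sketch of building a system prompt from table metadata.
# TABLE_NAME and the column names are placeholders for this example.
TABLE_NAME = "MY_DB.MY_SCHEMA.MY_TABLE"

PROMPT_TEMPLATE = """You will be acting as an AI Snowflake SQL expert named Frosty.
<tableName>{table_name}</tableName>
<columns>{columns}</columns>
"""

def get_table_context(table_name: str, columns: list[str]) -> str:
    # In the real app, the column list comes from a metadata query such as
    # SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = ...
    return PROMPT_TEMPLATE.format(table_name=table_name, columns=", ".join(columns))

def get_system_prompt() -> str:
    # Placeholder column names for illustration.
    return get_table_context(TABLE_NAME, ["VARIABLE_NAME", "VALUE", "DATE"])
```

Because the context is generated at startup, pointing the app at a different table only requires changing the metadata query, not the prompt text.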

Here is the key portion of the prompt we’re using (view the full prompt here):

You will be acting as an AI Snowflake SQL expert named Frosty. Your goal is to give correct, executable SQL queries to users. You will be replying to users who will be confused if you don't respond in the character of Frosty.

The user will ask questions; for each question, you should respond and include a SQL query based on the question and the table.

<table_context>
{table_context}
</table_context>

Here are 6 critical rules for the interaction you must abide:
<rules>
1. You MUST MUST wrap the generated SQL queries within ```sql code markdown
2. If I don't tell you to find a limited set of results in the sql query or question, you MUST limit the number of responses to 10.
3. Text / string where clauses must be fuzzy match e.g ilike %keyword%
4. Make sure to generate a single Snowflake SQL code snippet, not multiple.
5. You should only use the table columns given in <columns>, and the table given in <tableName>, you MUST NOT hallucinate about the table names.
6. DO NOT put numerical at the very front of SQL variable.
</rules>

Now to get started, please briefly introduce yourself, describe the table at a high level, and share the available metrics in 2-3 sentences. Then provide 3 example questions using bullet points.

The chatbot app

frosty_app.py is the main Python file that contains the Streamlit app code for receiving input prompts from the user, storing and rendering chat history, calling the LLM, and parsing out and executing a SQL query from the response. The initial system prompt is imported from the prompt engineering file described above.

import openai
import re
import streamlit as st
from prompts import get_system_prompt

st.title("☃️ Frosty")

# Initialize the chat messages history
openai.api_key = st.secrets.OPENAI_API_KEY
if "messages" not in st.session_state:
    # system prompt includes table information, rules, and prompts the LLM to produce
    # a welcome message to the user.
    st.session_state.messages = [{"role": "system", "content": get_system_prompt()}]

# Prompt for user input and save
if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})

# display the existing chat messages
for message in st.session_state.messages:
    if message["role"] == "system":
        continue
    with st.chat_message(message["role"]):
        st.write(message["content"])
        if "results" in message:
            st.dataframe(message["results"])

# If last message is not from assistant, we need to generate a new response
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        response = ""
        resp_container = st.empty()
        for delta in openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": m["role"], "content": m["content"]} for m in st.session_state.messages],
            stream=True,
        ):
            response += delta.choices[0].delta.get("content", "")
            resp_container.markdown(response)

        message = {"role": "assistant", "content": response}
        # Parse the response for a SQL query and execute if available
        sql_match = re.search(r"```sql\n(.*)\n```", response, re.DOTALL)
        if sql_match:
            sql = sql_match.group(1)
            conn = st.experimental_connection("snowpark")
            message["results"] = conn.query(sql)
            st.dataframe(message["results"])
        st.session_state.messages.append(message)
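The regular expression that pulls the SQL out of the assistant’s reply can be exercised on its own. Because rule 1 of the system prompt forces the query into a ```sql fenced block, a single DOTALL search is enough (the `extract_sql` helper name is ours, for illustration):

```python
import re

def extract_sql(response):
    # Matches the first ```sql ... ``` fenced block, spanning newlines.
    match = re.search(r"```sql\n(.*)\n```", response, re.DOTALL)
    return match.group(1) if match else None

reply = "Sure! Here you go:\n```sql\nSELECT 1\n```\nAnything else?"
```

If the model replies without a fenced query, the app simply renders the text and skips execution.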

Where to go from here

With a little bit of prompt engineering, you can build an LLM-powered chat app using the tools built into Snowflake and Streamlit. But this is just the start of what chat interfaces can do for data exploration and question-answering.

Here are a few ideas for extending your app:

  • Update the app to run against your private data in Snowflake, or other relevant Snowflake Marketplace datasets. The table-specific logic in the app is specified at the top of prompts.py, so it’s easy to swap and start experimenting.
  • Add more capabilities to the app, such as using the LLM to choose from a set of available tables, summarizing the returned data, and writing Streamlit code to visualize the results. You could even use a library like LangChain to convert Frosty into an “Agent” with improved chain-of-thought reasoning and dynamic error handling.
  • Prepare to run the app in Streamlit in Snowflake (currently in Private Preview): The functionality shown here will soon be available in Streamlit in Snowflake, especially when paired with External Access (also in Private Preview) to simplify access to an external LLM.

Check out the Frosty session (ML103) from Snowflake Summit 2023 for more ideas and to learn about what’s coming soon from Snowflake!
