Leveraging AI agents in Snowflake for product analytics | Part 1

Imagine jumping through 100 hoops just to understand how your users engage with your product. As a product manager, getting a simple answer like “What percentage of users completed our onboarding flow and made a purchase last month?” can feel like an endless odyssey.

This blog introduces Houseware’s Product Manager co-pilot agent, which helps product managers understand behavioral data about their users in natural language. Below, we cover the logic of the agent’s operation and the rationale behind its infrastructure. This agent is also a proof-of-concept to test whether an agentic workflow can be used to solve this recurring analytics problem for product managers.

Part 2 of this blog will cover the technical details and the exact code so you can try it out yourself. So read ahead to understand the problem and watch our agent in action.

Understanding the problem

Modern analytics tools like Amplitude and Mixpanel are powerful enough to remove the dependency on a human to write complex SQL queries or craft the perfect visualization for analyzing data.

However, they still demand a certain level of technical proficiency. Understanding your data, and how to analyze it to reach correct answers quickly, is a skill that must be honed. Product managers need to understand events, dimensions, and schemas, and they often rely on engineers to instrument the right data and set up analytics tools. Sometimes you must stitch together data from a laundry list of disparate sources: your product analytics tool, data warehouse, marketing campaigns, and more.

By the time your data reaches its final shape, you may have new questions or find that the old ones are no longer relevant. The entire process depends heavily on the product manager's data skills, which delays the output and makes it less reliable.

This is where our PM Copilot comes in. It revamps how product managers interact with their data and helps them obtain the answers they seek.

Introducing PM Co-Pilot

The PM Copilot addresses these challenges by allowing PMs to interact with their data using natural language, eliminating the need for technical expertise. This AI agent, built on top of the Snowflake data warehouse and powered by LangChain, understands user queries in plain English and provides real-time insights with visualizations.

For example, in natural language, a PM can ask questions like, “Give me an overview of my add-to-cart and checkout page events.”

The agent then retrieves and processes data from Snowflake in real-time.

And ta-da! It gives you the visualized trends and the conversion funnel from the add-to-cart event to the checkout-page event.

Technical architecture of the PM Co-pilot

Simply put, here is what is happening:

  1. When a user asks a question in natural language, the agent breaks the statement down into steps using natural language processing.
  2. Breaking the prompt into atomic instructions helps the downstream SQL agent understand each step and execute it correctly.
  3. The instruction agent passes the necessary information to the DSL agent (shown in the diagram below), which generates the parameters required for the SQL agent: start date, end date, conversion window, aggregation granularity, and so on. In our case, the DSL is a JSON document that we translate to SQL and run against Snowflake to produce the output.
  4. These parameters are fed into a SQL template, ready to generate a funnel or trend analysis based on the user's query.
  5. A summarizer agent stitches the outputs together into a cohesive result.

The whole flow is broken down into multiple steps; each step is a graph node, a tool, or an agent. Keep reading for a detailed explanation of each step.
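
To make the orchestration concrete, here is a minimal LangGraph sketch of this flow. The node names, state fields, and stubbed return values are illustrative assumptions, not Houseware's production code:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class PipelineState(TypedDict, total=False):
    question: str       # the PM's natural-language question
    instructions: list  # atomic instructions from the instruction agent
    dsl: dict           # JSON DSL produced for the SQL layer
    results: list       # rows returned from Snowflake
    summary: str        # final stitched-together answer

def instruction_node(state: PipelineState) -> PipelineState:
    # Break the prompt into atomic instructions (LLM call elided here).
    return {"instructions": ["trend: add_to_cart",
                             "funnel: add_to_cart -> checkout_page"]}

def dsl_node(state: PipelineState) -> PipelineState:
    # Translate each instruction into a validated JSON DSL (LLM call elided).
    return {"dsl": {"type": "funnel", "events": ["add_to_cart", "checkout_page"]}}

def query_node(state: PipelineState) -> PipelineState:
    # Render the DSL into a SQL template and run it against Snowflake (elided).
    return {"results": [{"step": "add_to_cart", "users": 1200}]}

def summarizer_node(state: PipelineState) -> PipelineState:
    return {"summary": "Conversion from add_to_cart to checkout_page ..."}

graph = StateGraph(PipelineState)
graph.add_node("instruction", instruction_node)
graph.add_node("dsl", dsl_node)
graph.add_node("query", query_node)
graph.add_node("summarize", summarizer_node)
graph.set_entry_point("instruction")
graph.add_edge("instruction", "dsl")
graph.add_edge("dsl", "query")
graph.add_edge("query", "summarize")
graph.add_edge("summarize", END)

app = graph.compile()
print(app.invoke({"question": "Overview of cart-to-checkout conversion"}))
```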

Agent deep-dive

Conversation Agent

🛠️ What: This agent is responsible for conversing with the user. If the user intends to run an analytics query or get answers from Snowflake, it routes the request to the agentic workflow, where action can be taken.

🧰 How: This routing is done using LangChain's graph framework, LangGraph.

👍 Highlights: The conversation agent also has a short-term memory to remember the previous messages. This way, the copilot remembers what it did previously and has all the context for your follow-up questions.
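
A minimal sketch of this routing using LangGraph's conditional edges, with an in-memory checkpointer standing in for the short-term memory; the keyword intent check below is a stand-in for the LLM classification the real agent performs:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

class ChatState(TypedDict, total=False):
    question: str
    route: str
    answer: str

def conversation_node(state: ChatState) -> ChatState:
    # Real agent: an LLM classifies intent; a keyword check stands in here.
    wants_analytics = any(w in state["question"].lower()
                          for w in ("funnel", "trend", "conversion"))
    return {"route": "analytics" if wants_analytics else "chat"}

def analytics_node(state: ChatState) -> ChatState:
    return {"answer": "(hand off to the retrieval/instruction/DSL/query flow)"}

def chat_node(state: ChatState) -> ChatState:
    return {"answer": "(plain conversational reply)"}

graph = StateGraph(ChatState)
graph.add_node("conversation", conversation_node)
graph.add_node("analytics", analytics_node)
graph.add_node("chat", chat_node)
graph.set_entry_point("conversation")
graph.add_conditional_edges("conversation", lambda s: s["route"],
                            {"analytics": "analytics", "chat": "chat"})
graph.add_edge("analytics", END)
graph.add_edge("chat", END)

# The checkpointer keeps per-thread state, so follow-up questions in the
# same thread retain the context of previous messages.
app = graph.compile(checkpointer=MemorySaver())
app.invoke({"question": "Show me my checkout funnel"},
           config={"configurable": {"thread_id": "pm-session-1"}})
```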

Retrieval Agent

🛠️ What: Once the Conversation Agent understands that the user wants to execute an analytics query, it feeds the prompt to the Retrieval Agent.

🧰 How: This agent hits the vector database and does a similarity search to find the events and dimensions the user requires for the analytical query.
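
Here is a sketch of that similarity search, assuming a Pinecone index populated with event and dimension metadata and OpenAI embeddings; the index name, embedding model, and metadata fields are illustrative:

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_PINECONE_KEY")
index = pc.Index("event-metadata")  # hypothetical index of event/dimension docs

def find_matching_events(user_prompt: str, top_k: int = 5) -> list[dict]:
    # Embed the PM's prompt, then retrieve the closest event definitions.
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=user_prompt
    ).data[0].embedding
    result = index.query(vector=embedding, top_k=top_k, include_metadata=True)
    return [match.metadata for match in result.matches]

# "conversion from cart to checkout" should surface add_to_cart and
# checkout_page as the highest-matching events.
print(find_matching_events("overview of my conversion from cart to checkout"))
```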

Instruction Agent

🛠️ What: The Instruction Agent splits the prompt into the required events, dimensions, and visualization types to get the correct results.

🧰 How: For example, take the user query "Give me an overview of my conversion from cart to checkout." The Instruction Agent splits this query into three separate instructions:

  1. Trends for the add_to_cart event
  2. Trends for the checkout_page event
  3. Funnel from add_to_cart to checkout_page

👍 Highlight: Despite the user asking for an overview of conversion, our RAG system accurately identifies the event names ‘add_to_cart’ and ‘checkout_page’ as the highest-matching events.
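
One plausible way to implement this splitting step is to ask the LLM for structured output against a small schema; the schema and prompt wording below are assumptions, not Houseware's exact implementation:

```python
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class Instruction(BaseModel):
    analysis_type: str = Field(description="'trend' or 'funnel'")
    events: list[str] = Field(description="event names involved, in order")

class InstructionSet(BaseModel):
    instructions: list[Instruction]

llm = ChatOpenAI(model="gpt-4o").with_structured_output(InstructionSet)
result = llm.invoke(
    "Split this product-analytics request into atomic trend/funnel "
    "instructions. Known events: add_to_cart, checkout_page. Request: "
    "Give me an overview of my conversion from cart to checkout."
)
# Expected shape: a trend for add_to_cart, a trend for checkout_page,
# and a funnel from add_to_cart to checkout_page.
print(result.instructions)
```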

DSL Agent

🛠️ What: We do not want our copilot to run arbitrary queries against the database, so we created a domain-specific language (DSL) that our SQL agent understands. This step is optional.

🧰 How: The DSL is a JSON config that captures the exact details of the query, such as event name, aggregation granularity (daily, weekly, etc.), conversion window, time period, and several other parameters.

👍 Highlight: All of these parameters live in a Pydantic template, a JSON schema with validations that acts as the DSL and serves as the input for the Query Agent. Keep reading to learn more about the Query Agent.
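
Such a template could look like the following minimal Pydantic sketch; the field names and allowed values are illustrative assumptions:

```python
from datetime import date
from typing import Literal
from pydantic import BaseModel, Field

class AnalyticsDSL(BaseModel):
    query_type: Literal["funnel", "trend"]
    events: list[str] = Field(min_length=1)
    start_date: date
    end_date: date
    granularity: Literal["daily", "weekly", "monthly"] = "daily"
    conversion_window_days: int = Field(default=7, ge=1)

# Validation fails fast on malformed LLM output instead of reaching Snowflake.
dsl = AnalyticsDSL.model_validate_json(
    '{"query_type": "funnel", "events": ["add_to_cart", "checkout_page"],'
    ' "start_date": "2024-05-01", "end_date": "2024-05-31"}'
)
```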

Query Agent

🛠️ What: This agent takes the DSL JSON as input and assigns a query type (funnel or trend in the example above). It then routes the DSL to a parameterized SQL function.

🧰 How: It checks the validity of the JSON parameters before hitting the database. Once the final query runs on Snowflake, the result is returned and displayed in the UI.
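
A sketch of that routing and parameterization for the trend case, assuming a hypothetical analytics.events table and a Snowflake connector connection. The date part comes from a whitelist, so interpolating it is safe; every user-influenced value goes through bind parameters, never raw SQL:

```python
# Hypothetical trend template; funnel queries would follow the same pattern.
TREND_SQL = """
    SELECT DATE_TRUNC('{part}', event_timestamp) AS bucket,
           COUNT(DISTINCT user_id) AS users
    FROM analytics.events
    WHERE event_name = %(event)s
      AND event_timestamp BETWEEN %(start_date)s AND %(end_date)s
    GROUP BY 1
    ORDER BY 1
"""

GRANULARITY_PARTS = {"daily": "day", "weekly": "week", "monthly": "month"}

def run_trend(conn, dsl: dict) -> list[tuple]:
    # dsl is the validated JSON from the DSL Agent; conn is a
    # snowflake.connector connection.
    sql = TREND_SQL.format(part=GRANULARITY_PARTS[dsl["granularity"]])
    with conn.cursor() as cur:
        cur.execute(sql, {
            "event": dsl["events"][0],
            "start_date": dsl["start_date"],
            "end_date": dsl["end_date"],
        })
        return cur.fetchall()
```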

This orchestration layer is built using a combination of LangGraph and LangChain, which lets us chain different AI agents into different flows.

By baking analysis flows as AI agents directly inside your data warehouse (in this case, Snowflake), you get the best of both worlds: the ease of natural language for expressing analysis requirements combined with the computational scale of your warehouse.

Let’s understand the advantages of building these agents directly on Snowflake.

Why build AI agents directly on Snowflake

For most companies, the central repository for product data is their cloud data warehouse, like Snowflake. Having AI capabilities natively integrated with Snowflake allows these agents to tap into your unified data sources without copying/replicating datasets across different systems.

Snowflake also provides APIs that allow the programmatic execution of SQL queries and the creation of data visualizations.
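
For example, running a query programmatically with the Snowflake Python connector looks roughly like this; the credentials and the events table are placeholders:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    warehouse="ANALYTICS_WH",  # hypothetical warehouse
    database="ANALYTICS",      # hypothetical database
    schema="EVENTS",           # hypothetical schema
)
with conn.cursor() as cur:
    cur.execute(
        "SELECT event_name, COUNT(*) FROM events "
        "GROUP BY 1 ORDER BY 2 DESC LIMIT 10"
    )
    for event_name, n in cur.fetchall():
        print(event_name, n)
conn.close()
```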

With AI agents deployed as functions inside Snowflake, the agents can dynamically traverse your schemas and models to surface insights while having direct access to your cloud data warehouse's computational power.

Building your AI agent on Snowflake

Here is an overview of the steps to create your AI agent. For a quick reference on how to build an agent on Snowflake, refer to our other blog here.

  1. Set up your development environment: install the necessary libraries (LangChain, the Snowflake connector, Streamlit, etc.) and configure your Snowflake account and database.
  2. Connect to Snowflake.
  3. Implement LangChain for query handling: create a function to process natural-language queries, and use LangChain to decompose and execute them.
  4. Build the natural-language interface: design a simple UI for users to input queries (you can use Streamlit to create and deploy your app) and integrate the query handler to process and display results, as in the sketch below.
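
Tying those steps together, a minimal Streamlit sketch of the UI layer might look like this; run_copilot is hypothetical glue standing in for the compiled agent pipeline, and the stubbed dataframe is sample data:

```python
import pandas as pd
import streamlit as st

def run_copilot(question: str) -> tuple[str, pd.DataFrame]:
    # Placeholder: in the real app this invokes the compiled LangGraph
    # pipeline and returns the summarizer's text plus the query results.
    days = pd.date_range("2024-05-01", periods=7)
    df = pd.DataFrame({"users": [120, 135, 128, 150, 162, 158, 171]},
                      index=days)
    return "Sample add_to_cart trend (stubbed answer).", df

st.title("PM Copilot")
question = st.text_input("Ask a question about your product data")
if question:
    with st.spinner("Querying Snowflake..."):
        answer, trend = run_copilot(question)
    st.write(answer)
    st.line_chart(trend)
```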

Note: In part 2 of this blog, we will supplement this information with code snippets and a detailed how-to that will help you replicate this agent on your end.

Making your agent better

To further enhance your AI agent, consider adding advanced features such as:

Automated Metadata Conversion

In the current version of the agent, we manually created JSON with event and dimension metadata extracted from Snowflake using SQL queries. For a more robust solution, set up a periodic job to automatically convert the event and dimension metadata into a Retrieval-Augmented Generation (RAG) readable JSON format.
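
A sketch of such a job, assuming a hypothetical analytics.event_dictionary table that maps events to their dimensions; schedule it with cron, Airflow, or a Snowflake task, and re-embed the output into the vector store after each run:

```python
import json
import snowflake.connector

conn = snowflake.connector.connect(account="...", user="...", password="...")

with conn.cursor() as cur:
    # ARRAY_AGG collects each event's dimensions; the connector returns
    # Snowflake ARRAY values as JSON strings, hence json.loads below.
    cur.execute("""
        SELECT event_name, ARRAY_AGG(DISTINCT dimension_name) AS dims
        FROM analytics.event_dictionary
        GROUP BY event_name
    """)
    metadata = [
        {"event": name, "dimensions": json.loads(dims)}
        for name, dims in cur.fetchall()
    ]

# Write the RAG-readable JSON the retrieval layer embeds and searches.
with open("event_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```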

Leveraging Snowflake’s New Features

Utilize Snowflake’s embed text (private preview) or native vector data type (private preview) to integrate the entire RAG system directly into your Snowflake instance. This can be an alternative to external services like Pinecone, which we used in our experiment.
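
A sketch of what that could look like. The names below (EMBED_TEXT_768, the VECTOR type, VECTOR_COSINE_SIMILARITY) follow Snowflake's Cortex documentation, but verify availability in your account since these features were in preview at the time of writing; the tables are hypothetical:

```python
import snowflake.connector

conn = snowflake.connector.connect(account="...", user="...", password="...")
with conn.cursor() as cur:
    # Store an embedding alongside each event definition.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS event_embeddings AS
        SELECT event_name,
               SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m',
                                               event_description) AS embedding
        FROM analytics.event_dictionary
    """)
    # Similarity search entirely inside Snowflake, with no external
    # vector-store round-trip.
    cur.execute("""
        SELECT event_name,
               VECTOR_COSINE_SIMILARITY(
                   embedding,
                   SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', %s)
               ) AS score
        FROM event_embeddings
        ORDER BY score DESC
        LIMIT 5
    """, ("overview of my conversion from cart to checkout",))
    print(cur.fetchall())
```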

Integrating Additional Data Sources

Expand the agent’s capabilities by connecting external applications as tools. For example, add Slack as a tool to send visualizations to specific channels or integrate Intercom to get summaries of all support queries.
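
For instance, Slack can be wrapped as a LangChain tool the agent may call; the token and channel are placeholders, and the docstring is what the LLM routes on:

```python
from langchain_core.tools import tool
from slack_sdk import WebClient

slack = WebClient(token="xoxb-your-bot-token")  # placeholder Slack bot token

@tool
def post_to_slack(channel: str, message: str) -> str:
    """Post an analysis summary or chart link to a Slack channel."""
    slack.chat_postMessage(channel=channel, text=message)
    return f"Posted to {channel}"

# Add post_to_slack to the agent's tool list so the copilot can, for
# example, share a funnel summary in #product after answering.
```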

Advanced Visualizations

Use libraries like Plotly or D3.js for more complex and interactive visualizations to enhance data representation and user experience.
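
For example, an interactive funnel rendered with Plotly's built-in funnel trace, using made-up counts for the events from earlier:

```python
import plotly.graph_objects as go

fig = go.Figure(go.Funnel(
    y=["add_to_cart", "checkout_page"],
    x=[1200, 740],  # distinct users at each step (sample values)
    textinfo="value+percent initial",
))
fig.update_layout(title="Cart-to-checkout funnel, last 30 days")
fig.show()
```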

Enhanced Analytical Features

Add functionalities like creating cohorts, stickiness charts, and other advanced analytics features to provide more comprehensive insights for product managers.

We envisage these agents following complex flows, like preempting the need to analyze a campaign's impact on the funnel a week after its launch. The agent should then be able to trigger downstream agents to automatically:

  1. Find the user cohort that was part of the campaign.
  2. Calculate funnel metrics for that cohort.
  3. Compare funnel metrics against other cohorts.
  4. Optionally trigger a marketing campaign for users who dropped off, using other downstream applications like Braze, Marketo, etc.

The future of warehouse-native AI agents

In the future, we plan to add learning capabilities so that end users and AI engineers can selectively upvote/downvote outputs to continually improve individual agents’ robustness.

Imagine a future where AI assistants live natively inside your data warehouse, capable of surfacing insights simply by conversing with them! Data teams can streamline analytics workflows, and product managers can get answers just by asking questions. The possibilities of AI agents that operate directly on your core, source-of-truth data sources are immense.

The PM Copilot demonstrates the power and potential of AI agents in transforming how we interact with data. By leveraging tools like LangChain and Snowflake, developers can create intuitive and powerful solutions that democratize data analytics. We encourage you to experiment with these tools and share your creations with the community and with us.

Other resources

  1. Snowpark official help documentation
  2. Getting started with Langchain
  3. Use AI in seconds with Snowflake Cortex
  4. LangGraph documentation (LangChain)
  5. Build an LLM app using LangChain — Streamlit Docs


Divyansh Saini
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Co-founder & CEO at Houseware. Snowflake Startup Challenge Winner. Snowflake Data Superhero. I love building and breaking AI agents.