Intro to Snowflake Copilot: FAQs from AMA

This week I hosted an AMA session with Pieter Verhoeven, who leads the product for Snowflake AI. In the session we gave a short demo of Copilot, shared best practices for using Copilot and custom instructions, and offered a sneak peek at what is coming next.

What is Snowflake Copilot?

Snowflake Copilot is an LLM-powered SQL assistant that simplifies data analysis while maintaining robust data governance, and seamlessly integrates into your existing Snowflake workflow.

Snowflake Copilot is powered by a model fine-tuned by Snowflake that runs securely inside Snowflake Cortex, Snowflake’s intelligent, fully managed AI service. This approach means that your enterprise data and metadata always stay securely inside Snowflake. Snowflake Copilot also fully respects RBAC and provides suggestions based only on the datasets that you can access.

In case you missed it, here is a list of some of the great questions asked during the event.

Supported Use Cases

  • Explore your data. Ask open-ended questions about your data, and gain deeper insights — all without writing any SQL queries.
  • Generate SQL queries. Ask your data questions in plain English and Copilot will write SQL queries without you having to lift a finger.
  • Build complex queries. Refine suggested SQL queries conversationally by asking follow-up questions, allowing you to introduce more complexity.
  • Improve existing queries. Copilot can help explain and optimize your existing queries, resulting in cleaner, more efficient SQL queries.
  • Learn about Snowflake. Ask questions about Snowflake documentation and Copilot will help you find the relevant information and apply it to your dataset.
  • Customize responses. Customize how Copilot responds and give it access to more information by providing a set of custom instructions.

FAQs on Use Cases

Can Snowflake Copilot generate queries to answer questions about my data, and can it also help me craft more performant queries?

Yes, Snowflake Copilot can both generate queries to answer questions about your data and help craft more performant queries. It can suggest optimizations for your existing SQL queries to improve efficiency.

Can Snowflake Copilot optimize my SQL? I have some queries that take a while to run, but Query History does not show anything helpful.

Snowflake Copilot can recommend optimizations for SQL queries. If you have queries that are slow, you can ask Copilot for advice on making them more efficient.

How big are the datasets that it can read, process and analyze to generate the queries?

Snowflake Copilot leverages Universal Search to determine the most relevant tables and columns from each table when generating responses. This innovation allows Snowflake Copilot to deal with arbitrarily large datasets.

How does Snowflake Copilot develop an understanding of relations to different tables, schemas and databases?

Snowflake Copilot uses the names of your databases, schemas, tables, and columns, as well as the data types of your columns, to determine what data is available to query. It also leverages any comments you’ve added to tables and columns.
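Since comments are part of the metadata Copilot reads, one practical step is to script them. The following is a minimal sketch, not an official tool: it only builds standard `COMMENT ON` statements as strings that you could then run in a worksheet. The helper names and the table and column names are hypothetical.

```python
def sql_literal(text: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def comment_statements(table: str, table_comment: str,
                       column_comments: dict[str, str]) -> list[str]:
    """Build COMMENT statements for one table and its columns."""
    statements = [f"COMMENT ON TABLE {table} IS {sql_literal(table_comment)};"]
    for column, comment in column_comments.items():
        statements.append(
            f"COMMENT ON COLUMN {table}.{column} IS {sql_literal(comment)};"
        )
    return statements


# Hypothetical table and column names, for illustration only.
statements = comment_statements(
    "sales.public.orders",
    "One row per customer order",
    {
        "order_total": "Order value in USD, including tax",
        "cust_id": "Customer's unique ID",
    },
)
for statement in statements:
    print(statement)
```

Keeping the descriptions in one dictionary makes it easier to review them alongside the column names before applying them.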

Can Snowflake Copilot answer business questions that require querying and transforming data within a database and/or across databases?

Snowflake Copilot can assist with querying and transforming data within a database schema. For cross-database queries, you would need to create and use views that join data from different schemas and databases.

Is a data dictionary and/or catalog a required or recommended prerequisite or are there other alternatives to refining the context?

A data dictionary or catalog is not explicitly required but having well-named and structured data will enhance Snowflake Copilot’s effectiveness. Using meaningful names for databases, schemas, tables, and columns, and adding comments to tables and columns helps in generating more accurate responses.

Is there a way to support multiple semantic definitions for users from differing business domains?

Snowflake Copilot generates answers based on the database and schema you select when interacting with it. You can use this to create dedicated schemas that contain the data for each business domain.

FAQs on How It Works

Can you explain what kind of SQL statement optimizations are done in the backend?

Snowflake Copilot suggests improvements based on common performance best practices, such as indexing, query restructuring, or more efficient use of SQL functions. Given that Copilot is powered by an LLM optimized for text-to-SQL, it uses learned patterns from large datasets to suggest these optimizations, making the process ML-driven.

How will Snowflake Copilot work with Iceberg Tables?

Since Iceberg is an open table format that Snowflake supports, as long as the tables are accessible within the Snowflake environment, Snowflake Copilot will be able to answer questions and generate queries for them.

FAQs on Privacy Concerns

Is any of my data or metadata ever sent to external servers?

No, Snowflake Copilot runs securely inside Snowflake Cortex, ensuring that your enterprise data and metadata always stay securely inside Snowflake.

What safeguards are in place to not disclose proprietary information?

Snowflake Copilot fully respects Role-Based Access Control (RBAC) and provides suggestions based only on the datasets that the user can access, ensuring that proprietary information is not disclosed inappropriately.

Are customer data or questions being used for future LLM training?

Snowflake Copilot’s current foundation models (fine-tuned Code Llama and Mistral Large) were not trained using any Customer Data or Usage Data.

FAQs on Best Practices

Who is Snowflake Copilot specifically designed for?

Snowflake Copilot is designed for users looking to simplify data analysis within Snowflake, including those with varying levels of SQL expertise.

Are there any specific data modeling approaches that work best with Snowflake Copilot (e.g. star schema vs. one big table)?

Creating curated views with descriptive, easy-to-understand names and appropriate data types can significantly improve the quality of Snowflake Copilot’s responses. Defining commonly used metrics and capturing common and complex joins in views are recommended practices, whichever modeling approach sits underneath.

Can Snowflake Copilot handle complex joins?

Yes, Snowflake Copilot can handle complex joins. To give you more control, you can define a set of curated views. By defining joins within views, you can simplify the process of querying combined data from multiple tables, allowing Snowflake Copilot to more effectively generate and optimize SQL queries.
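To make the curated-view idea concrete, here is a small sketch that uses Python's built-in sqlite3 module as a local stand-in for Snowflake, so it runs anywhere. The table, column, and view names are hypothetical. The point is only that the join and the metric are defined once in the view, under descriptive names.

```python
import sqlite3

# In-memory SQLite database used purely as a local stand-in for Snowflake.
view_demo = sqlite3.connect(":memory:")
view_demo.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_total REAL);

    INSERT INTO customers VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'AMER');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 750.0), (12, 2, 100.0);

    -- Curated view: the join and the metrics live here, with readable names.
    CREATE VIEW customer_sales_summary AS
    SELECT c.customer_name,
           c.region,
           COUNT(o.order_id)  AS order_count,
           SUM(o.order_total) AS total_sales
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_name, c.region;
""")

# A question such as "total sales per customer" now needs no join at all.
summary_rows = view_demo.execute(
    "SELECT customer_name, order_count, total_sales "
    "FROM customer_sales_summary ORDER BY total_sales DESC"
).fetchall()
print(summary_rows)
```

With a view like this in the selected schema, a generated query only has to pick columns from one well-named object instead of reconstructing the join.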

How can I use Snowflake Copilot to generate specific SQL queries related to my own dataset?

To generate specific SQL queries using Snowflake Copilot, start by ensuring that your database and schema are correctly selected in your worksheet or notebook. You can then type your question in natural language into the Copilot message box. For example, if you need a query to find all sales over $500, you might type, “Show me all sales over $500.” Snowflake Copilot will then generate a SQL query based on your request. If specific values or filters are needed, be sure to include those details in your question.
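As an illustration, the prompt above might lead to a query shaped like the one in this sketch. The table and column names are hypothetical, the query Copilot actually returns depends on your schema, and sqlite3 is used only as a local stand-in so the example can run without a Snowflake account.

```python
import sqlite3

# Local stand-in database with a hypothetical sales table.
sales_demo = sqlite3.connect(":memory:")
sales_demo.execute("CREATE TABLE sales (sale_id INTEGER, customer TEXT, amount REAL)")
sales_demo.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "Acme", 120.0), (2, "Globex", 980.5), (3, "Initech", 500.0), (4, "Acme", 640.0)],
)

# One plausible translation of "Show me all sales over $500".
candidate_query = """
    SELECT sale_id, customer, amount
    FROM sales
    WHERE amount > 500
    ORDER BY amount DESC
"""
large_sales = sales_demo.execute(candidate_query).fetchall()
print(large_sales)
```

Note that "over $500" is read here as strictly greater than 500, so the sale of exactly 500 is excluded. If you mean "500 or more", say so in the prompt.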

Are cross-schema and cross-database joins supported?

Direct cross-schema and cross-database queries are not supported by Snowflake Copilot without additional setup. To facilitate this, users can create and use views that join data from different schemas and databases. This workaround allows Snowflake Copilot to handle queries that span multiple schemas and databases indirectly.
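One way to set up such a view is sketched below. The helper only builds a `CREATE OR REPLACE VIEW` statement as a string, using fully qualified database.schema.object names. The function and every database, schema, table, and column name are hypothetical and are not taken from Snowflake documentation.

```python
def cross_database_view_sql(view_name: str, left_table: str, right_table: str,
                            join_key: str, columns: list[str]) -> str:
    """Build a view definition that joins two fully qualified tables."""
    column_list = ",\n       ".join(columns)
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT {column_list}\n"
        f"FROM {left_table} AS l\n"
        f"JOIN {right_table} AS r\n"
        f"  ON l.{join_key} = r.{join_key};"
    )


# Hypothetical names: two source databases and one curated target schema.
view_sql = cross_database_view_sql(
    view_name="analytics.curated.orders_with_customers",
    left_table="sales_db.public.orders",
    right_table="crm_db.public.customers",
    join_key="customer_id",
    columns=["l.order_id", "l.order_total", "r.customer_name"],
)
print(view_sql)
```

After running the generated statement, you would select the schema that holds the view (here `analytics.curated`) in your worksheet so that Copilot can see it.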

What are some examples of common prompts?

Common prompts you might use with Snowflake Copilot include:

  • “Tell me about this dataset.”
  • “How many transactions occurred last month?”
  • “List the top 10 customers by sales volume.”
  • “Explain how this SQL query works” followed by the SQL query.
  • “Can you help improve this query?” followed by a SQL query that you think can be optimized.

FAQs on Access & Availability

Can organizations opt-out of the Snowflake Copilot function? Can organizations roll out Snowflake Copilot based on role?

Yes. To disable Snowflake Copilot for your organization, contact Snowflake Support or your Snowflake account representative. You will also be able to enable or disable it at the role level by leveraging the COPILOT_USER database role.
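A role-based rollout could look roughly like the sketch below, which only builds the grant statements as strings. It rests on two assumptions you should verify against the current Snowflake documentation before running anything: that the COPILOT_USER database role lives in the shared SNOWFLAKE database, and that it is granted to PUBLIC by default. The account role names are hypothetical.

```python
def copilot_access_statements(allowed_roles: list[str]) -> list[str]:
    """Build statements that limit Copilot access to the given account roles."""
    # Assumption: access is granted to PUBLIC by default, so it is revoked first.
    statements = ["REVOKE DATABASE ROLE SNOWFLAKE.COPILOT_USER FROM ROLE PUBLIC;"]
    for role in allowed_roles:
        statements.append(
            f"GRANT DATABASE ROLE SNOWFLAKE.COPILOT_USER TO ROLE {role};"
        )
    return statements


# Hypothetical account roles that should keep access to Copilot.
access_statements = copilot_access_statements(["ANALYST", "DATA_ENGINEER"])
for statement in access_statements:
    print(statement)
```

Reviewing the generated statements before running them with a suitably privileged role keeps the rollout explicit and easy to audit.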

When can we expect Snowflake Copilot to be rolled out across all US regions in Azure?

We are working on enabling Snowflake Copilot in all customer regions. It is advisable to keep an eye on official Snowflake updates or contact Snowflake directly for the most accurate information.

Where can I find Copilot within the Snowflake product experience?

Snowflake Copilot can be accessed within Snowsight. You can find it by creating a new worksheet or notebook, or opening an existing one and then selecting “Ask Copilot” in the lower-right corner. This will open the Snowflake Copilot panel on the right side of the worksheet or notebook.

How can it be enabled for the accounts?

Snowflake Copilot is ready to use with no additional setup required beyond ensuring that your database and schema are selected in your worksheet or notebook.

What are immediate things that are on the roadmap?

To drive customer satisfaction, we are investing in the following areas:

  • Improve current use cases and capabilities (e.g. text-to-SQL)
  • Support new use cases and capabilities (e.g. text-to-Python)
  • Expand to additional Snowsight surfaces (e.g. Marketplace Listings)

FAQs on Cost Considerations

How is Snowflake Copilot usage priced / charged?

Snowflake Copilot will be free until July 31, 2024. Details on pricing and billing that will take effect after this date are expected to be announced closer to that time.

Snowflake Developer Community

If you have more questions, or if you run into errors while building your project, feel free to reach out to the Snowflake developer community on Reddit or Stack Overflow, or comment on this post. I’d be happy to help.

Thanks for Reading!

If you like my work and want to support me…

  1. The BEST way to support me is by following me on Medium.
  2. For data engineering best practices, and Python tips for beginners, follow me on LinkedIn.
  3. Feel free to give claps so I know how helpful this post was for you.


Written by Vino Duraisamy

Developer Advocate @Snowflake❄️. Previously Data & Applied Machine Learning Engineer @Apple, Nike, NetApp | Spark, Snowflake, Hive, Python, SQL, AWS, Airflow
