Snowflake Unveils Their Own Notebook: 6 Game-Changing Features

Maeda Kentaro
Snowflake Engineering
5 min read · Jun 1, 2024

Conclusion

Use Snowflake Notebooks for data analysis on Snowflake

A single adorable polar bear studying with a notebook. Generated by DALL·E

A Notebook Feature Suddenly Appears in Snowflake

On May 25, 2024, a new feature called Notebooks suddenly appeared in Snowflake, quickly becoming a hot topic in the data engineering community.

This article explains what a Notebook is, why it is convenient, and what advantages Snowflake Notebooks offer.

What Is a Notebook?

First of all, "Notebook" is not the name of a Snowflake-specific feature. It is a general term for interactive programming environments, including Jupyter Notebook, Google Colab, and Hex.

The three common features of Notebooks are as follows.

Multi-language support: It supports multiple programming languages such as Python, SQL, and R.
Partial execution: You can select and execute code partially.
Various input/output formats: You can easily display images and tables. It also has various auxiliary functions for input/output, such as referring to variables created in other cells.

Figure 1: Executing code step-by-step in Jupyter Notebook
Figure 2: Displaying an image in Google Colab
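The third feature, referring to variables created in other cells, works the same way in any of these environments. A tiny generic illustration (not Snowflake-specific; the variable names are made up):

```python
# --- cell 1: load data (a plain list standing in for a query result) ---
sales = [120, 340, 95]

# --- cell 2: a later cell can reuse the variable defined in cell 1 ---
total = sum(sales)
print(total)  # 555
```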

Why Data Scientists Like Notebooks

Notebooks are especially popular among data scientists. This is because using Notebooks provides the following benefits.

1. Saves the trouble of switching languages: Data scientists often use a different programming language for each step, such as data acquisition, visualization, and training. With a Notebook, there is almost no need to switch between language runtimes.

2. Accelerates the PDCA cycle: The partial re-execution function accelerates the data scientist’s PDCA (plan-do-check-act) cycle. Data analysis and AI model development require trial and error on specific steps, but you don’t want to repeat time-consuming steps such as loading an AI model into memory. Partial re-execution lets you skip them.

3. Makes visualization easier: The various input/output functions of Notebooks let you check results quickly. Data scientists handle not only tabular data but also images, videos, and audio, and with Notebooks you can easily inspect data in all these formats.

Data Science on Snowflake Until Now

Traditionally, simple data analysis on Snowflake was done in Snowsight worksheets, which fall short of Notebooks in multi-language support and in inspecting inputs and outputs.
If you really wanted a Notebook, you had to self-host Jupyter Notebook or use an external SaaS, which companies sometimes prohibited due to security concerns around confidential data.

Snowflake Notebooks Make Data Analysis Easier

However, Snowflake Notebooks solve these problems. Snowflake Notebooks run on Snowflake warehouses.
Each cell can be written in SQL or Python. Of course, partial re-execution is supported, and images and tables can be displayed as well.

Figure 3. Notebooks can be written in Python or SQL

With the advent of Snowflake Notebooks, anyone can now securely use the basic functions of Notebooks.

6 Convenient Features of Snowflake Notebooks

Being able to use the basic functions of Notebooks securely is a big advantage in itself, but that’s not all.
Here are six areas where Snowflake Notebooks excel compared to other Notebook products.

1. Passing data between cells

Data can be passed from SQL cells to Python cells and from Python cells to SQL cells.

Figure 4. Storing SQL execution results in cell2

The notation "SQL as cell2" means that the data retrieved by the SQL is stored in a notebook-level variable called cell2.
This variable can be referenced from other cells in two ways.
To reference this variable inside SQL, use the Jinja-style reference like {{cell2}}.

SELECT * FROM {{cell2}}

To reference this variable inside Python, use it like a normal Snowpark DataFrame.

my_df = cell2.to_pandas()

In either notation, the variable cell2 is internally replaced, before execution, with a RESULT_SCAN expression that references the cached result of the earlier query.

SELECT * FROM TABLE(RESULT_SCAN('11111111-1111-1111-1111-111111111111'))
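To make the substitution concrete, here is a rough stdlib-only illustration of the rewriting step described above. This is an assumption about the mechanism for explanatory purposes, not Snowflake's actual implementation; the helper name and the query-ID mapping are hypothetical.

```python
import re

# Hypothetical sketch: rewrite {{cell_name}} references into RESULT_SCAN
# expressions over the cached result of the query that cell ran,
# identified by its query ID.
def expand_cell_refs(sql: str, query_ids: dict) -> str:
    def repl(match):
        cell_name = match.group(1)
        return f"TABLE(RESULT_SCAN('{query_ids[cell_name]}'))"
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", repl, sql)

sql = "SELECT * FROM {{cell2}}"
expanded = expand_cell_refs(
    sql, {"cell2": "11111111-1111-1111-1111-111111111111"}
)
print(expanded)
# SELECT * FROM TABLE(RESULT_SCAN('11111111-1111-1111-1111-111111111111'))
```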

2. Support for Jinja2 notation

SQL cells support Jinja2 notation.

Let’s consider writing the following simple SQL:

SELECT 'Jan' AS month, SUM(amount) AS total_sales FROM sales_jan
UNION ALL
SELECT 'Feb' AS month, SUM(amount) AS total_sales FROM sales_feb
UNION ALL
SELECT 'Mar' AS month, SUM(amount) AS total_sales FROM sales_mar
UNION ALL
SELECT 'Apr' AS month, SUM(amount) AS total_sales FROM sales_apr
UNION ALL
SELECT 'May' AS month, SUM(amount) AS total_sales FROM sales_may;

This can be written by combining a Python cell and a SQL cell as follows, because Jinja’s `for` and `if` can be used.

# Python cell
months = [
    ('Jan', 'jan'), ('Feb', 'feb'), ('Mar', 'mar'),
    ('Apr', 'apr'), ('May', 'may'), ('Jun', 'jun'),
    ('Jul', 'jul'), ('Aug', 'aug'), ('Sep', 'sep'),
    ('Oct', 'oct'), ('Nov', 'nov'), ('Dec', 'dec')
]

-- SQL cell
{% for month, abbrev in months %}
SELECT '{{ month }}' AS month, SUM(amount) AS total_sales FROM sales_{{ abbrev }}
{% if not loop.last %}UNION ALL{% endif %}
{% endfor %};
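If you want to check what the template expands to outside a notebook, the same UNION ALL query can be produced with plain Python string formatting. This stdlib-only sketch simply mirrors what the Jinja loop above generates; it is an illustration, not part of Snowflake's API.

```python
# Build the 12-way UNION ALL query that the Jinja template expands to.
months = [
    ('Jan', 'jan'), ('Feb', 'feb'), ('Mar', 'mar'),
    ('Apr', 'apr'), ('May', 'may'), ('Jun', 'jun'),
    ('Jul', 'jul'), ('Aug', 'aug'), ('Sep', 'sep'),
    ('Oct', 'oct'), ('Nov', 'nov'), ('Dec', 'dec'),
]
selects = [
    f"SELECT '{month}' AS month, SUM(amount) AS total_sales FROM sales_{abbrev}"
    for month, abbrev in months
]
# Join the per-month SELECTs with UNION ALL between them (not after the last).
query = "\nUNION ALL\n".join(selects) + ";"
print(query)
```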

3. Support for Streamlit component output

Snowflake Notebooks are integrated with Streamlit, so you can display any Streamlit component simply by writing it in a Python cell.

As an example, take a look at this dashboard visualizing the number of website visitors.
This is the output of a single cell in a Notebook.

Figure 5. Dashboard created in a Notebook visualizing the number of website visitors

4. Accepting input via Streamlit components

Input values can also be changed using Streamlit components.
For example, if you use `st.multiselect` to select customer segments, changing the selected values automatically re-executes the downstream cells and updates the graph.

Figure 6. Downstream cells are re-executed when input changes
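A cell along these lines could produce such an interactive filter. This is a minimal sketch assuming a Streamlit runtime; the column names and example data are hypothetical, and in a real notebook the DataFrame would come from a SQL cell.

```python
import streamlit as st
import pandas as pd

# Hypothetical example data; in a real notebook this would come from a SQL cell.
visitors = pd.DataFrame({
    "segment": ["Free", "Pro", "Enterprise", "Free", "Pro"],
    "month": ["Jan", "Jan", "Jan", "Feb", "Feb"],
    "count": [1200, 300, 50, 1350, 320],
})

# st.multiselect renders an input widget; changing the selection re-runs
# the downstream code, so the chart below updates automatically.
segments = st.multiselect("Customer segments",
                          ["Free", "Pro", "Enterprise"],
                          default=["Free", "Pro"])
filtered = visitors[visitors["segment"].isin(segments)]
st.bar_chart(filtered, x="month", y="count")
```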

5. Notebooks can be converted to tasks

An entire notebook can be deployed as a task. This is very useful for data transformation and machine learning model update tasks.

Figure 7. Dialog for converting a Notebook to a task

Let’s look at how this task is deployed internally.
When you fill in and submit the dialog, the following query is executed:

create or replace task "TestTask" 
warehouse=COMPUTE_WH
schedule='USING CRON 0 * * * * Asia/Tokyo'
as EXECUTE NOTEBOOK "DB"."PUBLIC"."TEST_NOTEBOOK" ( );

So the Notebook is executed every hour on the hour via the EXECUTE NOTEBOOK command.
The command appears to accept arguments, so injecting parameters from outside will likely become possible in the future.
The official documentation also introduces examples of calling EXECUTE NOTEBOOK from external schedulers such as Airflow.
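If you want to issue the same DDL programmatically, for example from a deployment script, the statement above is easy to parameterize. This stdlib-only sketch builds the DDL string; the helper function is hypothetical, not a Snowflake API, and executing it would require a Snowflake connection.

```python
# Hypothetical helper that composes the CREATE TASK DDL shown above,
# parameterizing the task name, warehouse, cron schedule, and notebook.
def notebook_task_ddl(task: str, warehouse: str, cron: str, notebook: str) -> str:
    return (
        f'create or replace task "{task}"\n'
        f"warehouse={warehouse}\n"
        f"schedule='USING CRON {cron}'\n"
        f"as EXECUTE NOTEBOOK {notebook} ( );"
    )

ddl = notebook_task_ddl(
    "TestTask", "COMPUTE_WH",
    "0 * * * * Asia/Tokyo",
    '"DB"."PUBLIC"."TEST_NOTEBOOK"',
)
print(ddl)
```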

6. Notebooks can be managed with Git

Notebooks can be version controlled using the Git integration feature.

Figure 8. Git integration feature of Notebooks

Summary

Snowflake is evolving into a data platform that is easy to use even for data scientists.


RAKUDEJI inc. CEO | SnowPro Advanced : Data Engineer❄️