Snowflake’s CREATE TABLE…USING TEMPLATE?

DALL-E: “Cartoon polar bears in the style of Takashi Murakami”

To stay up-to-date with the latest offerings from Snowflake, I make it a point to review its documentation regularly. I recently discovered a newer feature that caught my attention — the ability to use templates with the CREATE TABLE command. In this article, I will delve into a familiar concept with a new twist and provide some examples of how it can be beneficial. Let’s explore how to use “CREATE TABLE…USING TEMPLATE” with a quick example.

If you want to skip the wordiness, you can find all the code used in this demo at the end of the article under "All The Code!"

What does USING TEMPLATE mean?

The documentation states the following on the subject:

Creates a table with the column definitions derived from a set of staged files.

The TEMPLATE keyword expects to be followed by a query using the INFER_SCHEMA function in combination with staged files in the following formats: Apache Parquet, Apache Avro, ORC, JSON, and CSV.

As of the time of this writing, support for JSON and CSV files is currently in preview.

INFER_SCHEMA is a pretty neat table function, but I find that it’s not quite as common to find in the wild. It’s hard to believe it’s been over two years since this was introduced.

Using an expression that calls this function, the clause can detect and apply the column definitions found in the files to create tables.

Making It Work

For this example, I wanted to create a quick reproducible example entirely inside Snowflake. You can skip this portion if you already have a parquet file you would like to use in a stage.

I’ll elaborate on what this is doing in another article (I’ll add the link below once I release it), but for now, we’re making a CSV based on the classic Iris dataset with scikit-learn and loading it to a new internal stage. You can try this out with Snowflake’s Python Worksheets.

Python Worksheet code to create a parquet file and stage it.
Snowflake Python Worksheet options for enabling the former code to run.

Now that we have a parquet file in an internal stage let’s compare and contrast options for creating a table from this file.

Before TEMPLATE

I’ll walk through one way we could do it with a file format.

AS to CREATE TABLE.

I’m using WHERE 1 = 0 here to avoid writing any rows. My intention is only to create the “shell” of the table.

After TEMPLATE

Using the TEMPLATE keyword, we can create a table based directly on the file and avoid specifying the column names.

USING TEMPLATE to CREATE TABLE.

So… When should this be used?

Incorporating this feature can expedite the development process by simplifying the task of creating tables for data engineers directly from files in stages.

This solution may not apply to every scenario, and manual intervention may still be necessary, particularly concerning data types and column names. Nonetheless, this feature is a convenient shortcut worth considering.

There may be alternative methods for achieving this task. One possible approach would be to utilize a stored procedure and leverage other libraries to make inferences from the files. Some of these libraries may be already integrated into the development process.

Thank you for reading, and I hope you learned something new!

All The Code!

import io

from sklearn.datasets import load_iris
from snowflake.snowpark import Session

def main(session: Session):
session.sql("CREATE OR REPLACE STAGE IRIS_PARQUET;").collect()
iris_df = load_iris(as_frame=True).data
session.file.put_stream(
io.BytesIO(iris_df.to_parquet(index=False)),
"@IRIS_PARQUET/iris.parquet"

return "We just loaded the file into the stage."
CREATE OR REPLACE FILE FORMAT IRIS_PARQUET_FORMAT
TYPE = 'PARQUET';

-- The "old" way.
CREATE OR REPLACE TABLE IRIS
AS
SELECT $1:"sepal length (cm)" AS "sepal length (cm)",
$1:"sepal width (cm)" AS "sepal width (cm)",
$1:"petal length (cm)" AS "petal length (cm)",
$1:"petal width (cm)" AS "petal width (cm)"
FROM @IRIS_PARQUET
(FILE_FORMAT => IRIS_PARQUET_FORMAT)
WHERE 1 = 0;

-- The "new" way.
CREATE OR REPLACE TABLE IRIS
USING TEMPLATE (
SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*)) WITHIN GROUP (ORDER BY ORDER_ID)
FROM TABLE
(
INFER_SCHEMA(
LOCATION=>'@iris_parquet',
FILE_FORMAT=>'iris_parquet_format'
)
)
);

--

--

Tyler White
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

I constantly seek new ways to learn, improve, and share my knowledge and experiences. Solutions Architect @ Snowflake.