Getting Started with Snowflake Table Schema Evolution
A hands-on lab for beginners to configure and test schema evolution in Snowflake.
Introduction
Ingestion and transformation pipelines change constantly. As you add data or your data providers enrich their offerings, the input file structure changes.
Pausing production data pipelines to alter objects is usually not an option as it may impact downstream consumers of the data.
In Snowflake, table structures can automatically adapt to new data and input formats, supporting actions like adding columns and removing NOT NULL constraints.
In this hands-on lab, we explore Table Schema Evolution using a banking dataset generated with FrostyGen, a configurable dummy data generator.
- First, we’ll load a CSV containing 1,000 records with 7 columns (`transaction_id`, `transaction_amount`, `transaction_date`, `category`, `status`, `sender_id`, `receiver_id`).
- Once the first batch of records is in our table, we’ll spice things up by loading a new file with 2 extra columns: `risk_score` and `payment_node`.
Let’s see how the table schema adapts!
Getting Started
This hands-on lab is designed for beginners.
Step 0: Pre-Reqs
- Snowflake Account: if you don’t have one yet, sign up for a free trial Snowflake account. Schema Evolution is available in all Snowflake editions.
- Script Import: copy and paste `hol_schema_table_evolution.sql` from the GitHub Repository into your Snowsight Worksheet.
- Guided Steps: the SQL script will guide you through the next steps.
Step 1: Setup
Run SQL commands to create the following objects (a sketch of these commands follows the list):
- Database: HOL_DB
- Schema: HOL_SCHEMA_EVOLUTION
- Stage: BANK_TRANSACTION_FILE_STAGE
- Virtual Warehouse: HOL_WH
- File Format: HOL_BANK_TRANSACTION_CSV_FORMAT
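For reference, here is a minimal sketch of what the setup commands might look like. The object names come from the lab; the specific options (warehouse size, auto-suspend, quoting) are assumptions, so defer to the script in the repository:

```sql
-- Database, schema, and session context for the lab
CREATE DATABASE IF NOT EXISTS HOL_DB;
CREATE SCHEMA IF NOT EXISTS HOL_DB.HOL_SCHEMA_EVOLUTION;
USE SCHEMA HOL_DB.HOL_SCHEMA_EVOLUTION;

-- Internal stage that will hold the CSV files
CREATE STAGE IF NOT EXISTS BANK_TRANSACTION_FILE_STAGE;

-- A small warehouse is plenty for 1,000-row files (size/suspend are assumptions)
CREATE WAREHOUSE IF NOT EXISTS HOL_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60;

-- CSV file format; PARSE_HEADER is required later by INFER_SCHEMA
CREATE FILE FORMAT IF NOT EXISTS HOL_BANK_TRANSACTION_CSV_FORMAT
  TYPE = CSV
  PARSE_HEADER = TRUE
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';
```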
Step 2: Data Preparation
- Download the CSV files (`bank_transaction_1.csv` and `bank_transaction_2.csv`) from the GitHub repository.
- Upload the files to the staging area (`BANK_TRANSACTION_FILE_STAGE`), either via the `PUT` command (sketched below) or the Snowsight UI.
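If you go the `PUT` route, it has to run from a client such as SnowSQL rather than a Snowsight worksheet; the local paths below are placeholders:

```sql
-- Upload both files to the internal stage (local paths are examples)
PUT file:///tmp/bank_transaction_1.csv @BANK_TRANSACTION_FILE_STAGE AUTO_COMPRESS = FALSE;
PUT file:///tmp/bank_transaction_2.csv @BANK_TRANSACTION_FILE_STAGE AUTO_COMPRESS = FALSE;

-- Confirm both files landed on the stage
LIST @BANK_TRANSACTION_FILE_STAGE;
```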
Step 3: Initial Table Setup
- Create the table `BANK_TRANSACTION_STAGING` using the `INFER_SCHEMA` function, based on the structure of `bank_transaction_1.csv`.
- Verify the table definition using the `DESCRIBE` command.
- Load records from `bank_transaction_1.csv` using the `COPY` command.
- Check the loaded records using a `SELECT` statement (see the sketch after this list).
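Put together, the step might look like the sketch below. It follows the documented `INFER_SCHEMA` / `USING TEMPLATE` pattern with the lab’s object names; the repository script may differ in the details:

```sql
-- Build the table from the structure inferred from the first file
CREATE TABLE BANK_TRANSACTION_STAGING
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION => '@BANK_TRANSACTION_FILE_STAGE/bank_transaction_1.csv',
        FILE_FORMAT => 'HOL_BANK_TRANSACTION_CSV_FORMAT'
      )
    )
  );

-- Verify the inferred definition: 7 columns at this point
DESCRIBE TABLE BANK_TRANSACTION_STAGING;

-- Load the first batch, matching file headers to column names
COPY INTO BANK_TRANSACTION_STAGING
  FROM @BANK_TRANSACTION_FILE_STAGE/bank_transaction_1.csv
  FILE_FORMAT = (FORMAT_NAME = 'HOL_BANK_TRANSACTION_CSV_FORMAT')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Spot-check the loaded rows
SELECT * FROM BANK_TRANSACTION_STAGING LIMIT 10;
```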
Step 4: Schema Evolution
- Enable schema evolution on the existing table, `BANK_TRANSACTION_STAGING`.
- Load records from `bank_transaction_2.csv` using the `COPY` command, which introduces two additional columns (`risk_score` and `payment_node`).
- Re-run the `DESCRIBE` command to view the modified table structure.
- Check the newly loaded records using a `SELECT` statement (a sketch follows this list).
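A sketch of the evolution step, under the same assumptions as above: `ENABLE_SCHEMA_EVOLUTION` is the documented table property, and `MATCH_BY_COLUMN_NAME` is what lets `COPY` map, and add, columns by header name:

```sql
-- Allow COPY to add columns it finds in incoming files
ALTER TABLE BANK_TRANSACTION_STAGING SET ENABLE_SCHEMA_EVOLUTION = TRUE;

-- Load the second file, which carries risk_score and payment_node
COPY INTO BANK_TRANSACTION_STAGING
  FROM @BANK_TRANSACTION_FILE_STAGE/bank_transaction_2.csv
  FILE_FORMAT = (FORMAT_NAME = 'HOL_BANK_TRANSACTION_CSV_FORMAT')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- The definition should now show 9 columns
DESCRIBE TABLE BANK_TRANSACTION_STAGING;

-- Rows from the first file show NULL in the two new columns
SELECT TRANSACTION_ID, RISK_SCORE, PAYMENT_NODE
FROM BANK_TRANSACTION_STAGING
LIMIT 10;
```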
Conclusions
That’s it! You just witnessed smooth table schema evolution in action!
Schema evolution works on its own, but it pairs well with complementary functions such as `INFER_SCHEMA`: used together, they let you create a table from a file’s structure and then keep that table in step as the structure changes, which makes managing evolving data far less painful.
At the time of writing, please note that Schema Evolution and `INFER_SCHEMA` (for JSON and CSV formats) are in Public Preview.
For additional details, refer to the official Snowflake documentation.
If you have any questions or suggestions, please feel free to reach out to me on LinkedIn.
Big thanks to Marianne Voogt (Sales Engineer @Snowflake) for reviewing this Hands-On Lab before publication!
Resources:
- HOL GitHub Repository
- Do you want to know more about Schema Detection and Schema Evolution? Read this article by Xin Huang: Data Loading Schematization with Snowflake