Getting Started with Snowflake Table Schema Evolution
A hands-on lab for beginners to configure and test schema evolution in Snowflake.
Introduction
Ingestion and transformation pipelines change constantly. As you add data or your data providers enrich their offerings, the input file structure changes.
Pausing production data pipelines to alter objects is usually not an option as it may impact downstream consumers of the data.
In Snowflake, table structures can automatically adapt to new data and input formats, supporting actions like adding columns and removing NOT NULL constraints.
In this hands-on lab, we explore Table Schema Evolution using a banking dataset generated with FrostyGen, a configurable dummy data generator.
- First, we’ll load a CSV containing 1,000 records with 7 columns (`transaction_id`, `transaction_amount`, `transaction_date`, `category`, `status`, `sender_id`, `receiver_id`).
- Once the first batch of records is in our table, we’ll spice things up by loading a new file with 2 extra columns: `risk_score` and `payment_node`.
Let’s see how the table schema adapts!
Getting Started
This hands-on lab is designed for beginners.
Step 0: Pre-Reqs
- Snowflake Account: if you don’t have one yet, sign up for a free trial Snowflake account. Schema Evolution is available in all Snowflake editions.
- Script Import: copy and paste `hol_schema_table_evolution.sql` from the GitHub Repository into your Snowsight Worksheet.
- Guided Steps: the SQL script will guide you through the next steps.
Step 1: Setup
Run SQL commands to create the following objects (a sketch of these commands follows the list):
- Database: HOL_DB
- Schema: HOL_SCHEMA_EVOLUTION
- Stage: BANK_TRANSACTION_FILE_STAGE
- Virtual Warehouse: HOL_WH
- File Format: HOL_BANK_TRANSACTION_CSV_FORMAT
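For reference, here is a minimal sketch of what the setup commands might look like. The object names come from the lab; the specific options (warehouse size, auto-suspend, quoting) are assumptions, so defer to the script in the repository:

```sql
-- Database, schema, and session context for the lab
CREATE DATABASE IF NOT EXISTS HOL_DB;
CREATE SCHEMA IF NOT EXISTS HOL_DB.HOL_SCHEMA_EVOLUTION;
USE SCHEMA HOL_DB.HOL_SCHEMA_EVOLUTION;

-- Internal stage that will hold the CSV files
CREATE STAGE IF NOT EXISTS BANK_TRANSACTION_FILE_STAGE;

-- A small warehouse is plenty for 1,000-row files (size/suspend are assumptions)
CREATE WAREHOUSE IF NOT EXISTS HOL_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60;

-- CSV file format; PARSE_HEADER is required later by INFER_SCHEMA
CREATE FILE FORMAT IF NOT EXISTS HOL_BANK_TRANSACTION_CSV_FORMAT
  TYPE = CSV
  PARSE_HEADER = TRUE
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';
```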
Step 2: Data Preparation
- Download the CSV files (`bank_transaction_1.csv` and `bank_transaction_2.csv`) from the GitHub repository.
- Upload the files to the staging area (`BANK_TRANSACTION_FILE_STAGE`), either via the `PUT` command (sketched below) or the Snowsight UI.
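If you go the `PUT` route, it has to run from a client such as SnowSQL rather than a Snowsight worksheet; the local paths below are placeholders:

```sql
-- Upload both files to the internal stage (local paths are examples)
PUT file:///tmp/bank_transaction_1.csv @BANK_TRANSACTION_FILE_STAGE AUTO_COMPRESS = FALSE;
PUT file:///tmp/bank_transaction_2.csv @BANK_TRANSACTION_FILE_STAGE AUTO_COMPRESS = FALSE;

-- Confirm both files landed on the stage
LIST @BANK_TRANSACTION_FILE_STAGE;
```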
Step 3: Initial Table Setup
- Create the table `BANK_TRANSACTION_STAGING` using the `INFER_SCHEMA` function, based on the structure of `bank_transaction_1.csv`.
- Verify the table definition using the `DESCRIBE` command.
- Load records from `bank_transaction_1.csv` using the `COPY` command.
- Check the loaded records using a `SELECT` statement (see the sketch after this list).
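Put together, the step might look like the sketch below. It follows the documented `INFER_SCHEMA` / `USING TEMPLATE` pattern with the lab’s object names; the repository script may differ in the details:

```sql
-- Build the table from the structure inferred from the first file
CREATE TABLE BANK_TRANSACTION_STAGING
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION => '@BANK_TRANSACTION_FILE_STAGE/bank_transaction_1.csv',
        FILE_FORMAT => 'HOL_BANK_TRANSACTION_CSV_FORMAT'
      )
    )
  );

-- Verify the inferred definition: 7 columns at this point
DESCRIBE TABLE BANK_TRANSACTION_STAGING;

-- Load the first batch, matching file headers to column names
COPY INTO BANK_TRANSACTION_STAGING
  FROM @BANK_TRANSACTION_FILE_STAGE/bank_transaction_1.csv
  FILE_FORMAT = (FORMAT_NAME = 'HOL_BANK_TRANSACTION_CSV_FORMAT')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Spot-check the loaded rows
SELECT * FROM BANK_TRANSACTION_STAGING LIMIT 10;
```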
Step 4: Schema Evolution
- Enable schema evolution on the existing table, `BANK_TRANSACTION_STAGING`.
- Load records from `bank_transaction_2.csv` using the `COPY` command, which introduces two additional columns (`risk_score` and `payment_node`).
- Re-run the `DESCRIBE` command to view the modified table structure.
- Check the newly loaded records using a `SELECT` statement (a sketch follows this list).
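A sketch of the evolution step, under the same assumptions as above: `ENABLE_SCHEMA_EVOLUTION` is the documented table property, and `MATCH_BY_COLUMN_NAME` is what lets `COPY` map, and add, columns by header name:

```sql
-- Allow COPY to add columns it finds in incoming files
ALTER TABLE BANK_TRANSACTION_STAGING SET ENABLE_SCHEMA_EVOLUTION = TRUE;

-- Load the second file, which carries risk_score and payment_node
COPY INTO BANK_TRANSACTION_STAGING
  FROM @BANK_TRANSACTION_FILE_STAGE/bank_transaction_2.csv
  FILE_FORMAT = (FORMAT_NAME = 'HOL_BANK_TRANSACTION_CSV_FORMAT')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- The definition should now show 9 columns
DESCRIBE TABLE BANK_TRANSACTION_STAGING;

-- Rows from the first file show NULL in the two new columns
SELECT TRANSACTION_ID, RISK_SCORE, PAYMENT_NODE
FROM BANK_TRANSACTION_STAGING
LIMIT 10;
```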
Conclusions
That’s it! You just witnessed smooth table schema evolution in action!
Schema evolution works on its own, but it pairs well with complementary functions such as `INFER_SCHEMA`: used together, they let you create a table from a file’s structure and then keep that table in step as the structure changes, which makes managing evolving data far less painful.
At the time of writing, please note that Schema Evolution and `INFER_SCHEMA` (for JSON and CSV formats) are in Public Preview.
For additional details, refer to the official Snowflake documentation.
If you have any questions or suggestions, please feel free to reach out to me on LinkedIn.
Big thanks to Marianne Voogt (Sales Engineer @Snowflake) for reviewing this Hands-On Lab before publication!
Resources:
- HOL GitHub Repository
- Do you want to know more about Schema Detection and Schema Evolution? Read this article by Xin Huang: Data Loading Schematization with Snowflake