Snowflake New DataOps Related Features (2024)

How to manage Snowflake resources with Snowflake’s Git Integration, EXECUTE IMMEDIATE FROM, Snowflake CLI, and CREATE OR ALTER TABLE


In this post, I will provide background on DevOps and explore several new and upcoming Snowflake features, some still in preview, that promise to contribute significantly to our DevOps practices.

Background — What is DevOps?

DevOps is a set of practices to shorten the development life cycle, deliver software quickly, and continuously improve the software development and delivery processes.
When these practices are applied to managing data infrastructure, the term “DataOps” is often used, interchangeably with “Data DevOps.”

The main principles and practices of DevOps include the following:

1. Collaboration: Improving collaboration between development and operations teams.

2. Automation: Implementing automation for various aspects of the software development life cycle, including building, testing, deployment, and infrastructure provisioning.

3. Continuous Integration (CI): Integrating code changes frequently into a shared repository, where automated builds and tests are triggered.

4. Continuous Deployment (CD): Automating the deployment process to push code changes to production or staging environments as soon as they pass the necessary tests.

5. Infrastructure as Code (IaC): Managing and provisioning infrastructure through code (for example, with Terraform). See my previous blog post on utilizing Terraform to create Snowflake objects.

6. Monitoring and Feedback: Implementing monitoring tools to gather insights into the performance and health of applications and infrastructure.

7. Version Control: Using version control systems like Git to manage and track changes to the source code, enabling collaboration and maintaining a history of changes.

Data Change Management (DCM) refers to the set of processes, practices, and policies employed to manage and control changes made to data within an organization’s information systems. It involves systematically handling modifications, updates, additions, or deletions of data to ensure data integrity, consistency, and accuracy throughout its lifecycle.

Imperative vs. Declarative

Imperative is about specifying the exact steps to achieve a goal.
For example:

-- Version 1.00 date 2 Feb
-- Create sales table with two columns: product_name and sales_amount
CREATE TABLE sales (
  product_name VARCHAR(20),
  sales_amount DECIMAL(10, 2)
);

-- Version 1.01 date 3 Feb
-- Alter the length of the "product_name" column to 100 characters
ALTER TABLE sales
  ALTER COLUMN product_name SET DATA TYPE VARCHAR(100);

-- Version 1.02 date 4 Feb
-- Rename the "product_name" column to "name"
ALTER TABLE sales
  RENAME COLUMN product_name TO name;
Declarative is about describing the desired outcome, letting the tool figure out how to achieve it.
For example:

-- Most updated version only
-- Create sales table with two columns: name and sales_amount
CREATE TABLE sales (
  name VARCHAR(100),
  sales_amount DECIMAL(10, 2)
);

The imperative code can quickly become challenging to comprehend and manage. It introduces potential errors, as it’s not always clear which step you’re currently at or from which step you should execute the latest changes. In contrast, the declarative approach provides a more streamlined experience: you view the most up-to-date code, execute it, and the tool identifies the differences and determines the commands needed to reach the desired state. This enhances clarity and minimizes the risk of unintended mistakes in the deployment process.

To delve deeper into the integration of DevOps principles with Snowflake using Terraform, refer to my additional article:
https://medium.com/infostrux-solutions/utilizing-terraform-to-create-snowflake-resources-by-code-dac4d232c70d

Snowflake's New DataOps-Related Features

Git Integration

The Git integration empowers you to seamlessly interact with files in your Git repositories directly from within the Snowflake environment.

Currently, this feature is in private preview and supports read-only access; write functionality is not yet available.

-- Store GitHub credentials in a secret
CREATE OR REPLACE SECRET EYLON_GITHUB_SECRET
  TYPE = password
  USERNAME = 'github_user_name'
  PASSWORD = 'github_access_token';

-- Allow Snowflake to reach the GitHub API
CREATE OR REPLACE API INTEGRATION EYLON_GITHUB_API_INTEGRATION
  API_PROVIDER = git_https_api
  API_ALLOWED_PREFIXES = ('https://github.com/your_username')
  ALLOWED_AUTHENTICATION_SECRETS = (EYLON_GITHUB_SECRET)
  ENABLED = TRUE;

-- Clone the repository into Snowflake
CREATE OR REPLACE GIT REPOSITORY EYLON_REPOSITORY
  API_INTEGRATION = EYLON_GITHUB_API_INTEGRATION
  GIT_CREDENTIALS = EYLON_GITHUB_SECRET
  ORIGIN = 'https://github.com/your_username/your_repository';

GRANT READ ON GIT REPOSITORY EYLON_REPOSITORY TO ROLE DEV_ROLE;
USE ROLE DEV_ROLE;

-- Use git commands
LIST @EYLON_REPOSITORY/branches/main;
SHOW GIT BRANCHES IN EYLON_REPOSITORY;
ALTER GIT REPOSITORY EYLON_REPOSITORY FETCH;
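
Files in the repository stage behave much like regular stage files. As one possible sketch, they could be copied into a named internal stage with COPY FILES (the stage name my_stage below is a hypothetical assumption; adjust names to your environment):

-- Copy files from the repository's main branch into an existing stage
COPY FILES INTO @my_stage
  FROM @EYLON_REPOSITORY/branches/main/;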

I find this feature incredibly valuable, especially when there’s a requirement to share code among data engineers. For instance, I may have written scripts that I want data analysts to execute, but I prefer not to burden them with the complexities of installing Git and VS Code on their machines. This option allows them to stay within the Snowflake environment, simplifying the process and enabling them to accomplish more without additional software overhead.

EXECUTE IMMEDIATE FROM

EXECUTE IMMEDIATE FROM executes the SQL statements specified in a file in a stage. The file can contain SQL statements or Snowflake Scripting blocks. The statements must be syntactically correct SQL statements.

Code Syntax

CREATE OR REPLACE TABLE my_inventory(
  sku VARCHAR,
  price NUMBER
);

-- A relative path is resolved against the location of the
-- file containing the EXECUTE IMMEDIATE FROM statement
EXECUTE IMMEDIATE FROM './insert-inventory.sql';

SELECT sku, price
FROM my_inventory
ORDER BY price DESC;

In conjunction with Git integration, you now have the capability to retrieve code files directly from Git and execute them. While alternative methods may exist for more comprehensive CI/CD workflows, this feature proves particularly useful for training, testing, rapid PoCs, or small projects. The streamlined flow, controlled entirely within Snowflake, eliminates the need for external services such as GitHub Actions and for running scripts from an external host.
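
Combining the two features might look like the following sketch, which runs a script straight from the repository stage created earlier (the branch and file path below are hypothetical; adjust them to your repository layout):

-- Pull the latest commits, then execute a script from the repository
ALTER GIT REPOSITORY EYLON_REPOSITORY FETCH;
EXECUTE IMMEDIATE FROM @EYLON_REPOSITORY/branches/main/scripts/insert-inventory.sql;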

Read more in the Snowflake documentation.

Snowflake CLI

SnowSQL is the command line client for connecting to Snowflake to execute SQL queries and perform all DDL and DML operations, including loading data into and unloading data from database tables.

This tool has been available for an extended period, yet it seems underutilized in the realm of DevOps, particularly for creating and monitoring Snowflake’s resources.

# Set up the connection according to the Snowflake documentation

# Connect to Snowflake
snowsql -a <account_name> -u <user_name>

# Use a database
USE <db_name>;

# Use a schema
USE SCHEMA <schema_name>;

# Query
SELECT * FROM <table_name> LIMIT 3;

The full list of commands:

user#> !help

+------------+-------------------------------------------+-------------+--------------------------------------------------------------------------------------------+
| Command | Use | Aliases | Description |
|------------+-------------------------------------------+-------------+--------------------------------------------------------------------------------------------|
| !abort | !abort <query id> | | Abort a query |
| !connect | !connect <connection_name> | | Create a new connection |
| !define | !define <variable>=<value> | | Define a variable as the given value |
| !edit | !edit <query> | | Opens up a text editor. Useful for writing longer queries. Defaults to last query |
| !exit | !exit | !disconnect | Drop the current connection |
| !help | !help | !helps, !h | Show the client help. |
| !options | !options | !opts | Show all options and their values |
| !pause | !pause | | Pauses running queries. |
| !print | !print <message> | | Print given text |
| !queries | !queries help, <filter>=<value>, <filter> | | Lists queries matching the specified filters. Write <!queries> help for a list of filters. |
| !quit | !quit | !q | Drop all connections and quit SnowSQL |
| !rehash | !rehash | | Refresh autocompletion |
| !result | !result <query id> | | See the result of a query |
| !set | !set <option>=<value> | | Set an option to the given value |
| !source | !source <filename>, <url> | !load | Execute given sql file |
| !spool | !spool <filename>, off | | Turn on or off writing results to file |
| !system | !system <system command> | | Run a system command in the shell |
| !variables | !variables | !vars | Show all variables and their values |
+------------+-------------------------------------------+-------------+--------------------------------------------------------------------------------------------+

Read more in the Snowflake documentation.

CREATE OR ALTER TABLE

Creates a new table in the current/specified schema or alters an existing one.
This command proves highly beneficial in scenarios involving slowly evolving schemas, especially when there is uncertainty about the current state of a table. The advantage lies in avoiding the need to recreate the entire table or to specify the exact ALTER statements; Snowflake figures out whether changes are required and which ones to apply.

-- Initial state: a single-column table
CREATE OR REPLACE TABLE my_inventory(
  sku VARCHAR
);

-- Desired state: Snowflake adds the "price" column
-- instead of recreating the table
CREATE OR ALTER TABLE my_inventory(
  sku VARCHAR,
  price NUMBER
);

At the time of writing, this command is still in preview.

This is an excellent illustration of Snowflake’s enhancement to its existing API, facilitating a more declarative approach to table creation. Prior to this command, options were limited to either dropping and recreating the table or employing imperative commands for altering the existing table. With this improvement, Snowflake intelligently identifies necessary changes to the table and executes them seamlessly.
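
To illustrate, a later schema change is simply another CREATE OR ALTER statement describing the new desired state (the quantity column below is a hypothetical addition, not part of the original example):

-- Evolve the schema again: Snowflake computes and applies the difference,
-- here adding the "quantity" column to the existing table
CREATE OR ALTER TABLE my_inventory(
  sku VARCHAR,
  price NUMBER,
  quantity NUMBER
);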

Summary

In this blog post, we explored the basics of DataOps and examined some of Snowflake's new features that can help you manage your Snowflake infrastructure, including Git Integration, EXECUTE IMMEDIATE FROM, Snowflake CLI, and CREATE OR ALTER TABLE.

To stay updated on more Snowflake-related posts, follow me at my Medium profile: Eylon’s Snowflake Articles.

I’m Eylon Steiner, Engineering Manager for Infostrux Solutions. You can follow me on LinkedIn.

Learn more

Subscribe to Infostrux Medium Blog at https://blog.infostrux.com for the most interesting Data Engineering and Snowflake news. Follow Infostrux’s open-source efforts through GitHub.
