DevOps in Snowflake: How Git and Database Change Management enable a file-based object lifecycle

Opinions expressed in this post are solely my own and do not represent the views or opinions of my employer (Snowflake).

Git has been a staple of software engineering for many years. It enables groups of developers to effectively collaborate and move back and forth between different versions of a software project. Databases share many similarities with software projects: A group of people are collaborating on them and their schemas evolve due to changing business needs. The evolution of databases is traditionally done using imperative ALTER commands. The problem with ALTERs is that they depend on the current state of the database. A set of ALTER commands used to migrate a schema from version A to C might not work if the schema had been changed to version B in the meantime. Similarly, ALTER commands cannot easily be inverted. Rolling back from version C to version A requires a separately crafted set of ALTER commands.

Wouldn’t it be nice if you could specify the target state for your database object and Snowflake would alter your object to fit this target state independently of the object’s current state? And wouldn’t it be even better if you could version the target state descriptions as they evolve over time? Wouldn’t it be nice if you could use industry-standard Git to re-apply older versions if needed?

Git, File Executions and Target State Definitions

Snowflake has recently introduced a number of exciting DevOps capabilities to its ever-growing feature set as limited previews:

  • The Git integration allows you to interact with files in your Git repositories right from within Snowflake.
  • Database Change Management is a set of features that enables you to follow the “configuration as code” pattern when setting up or updating your Snowflake objects. Declarative target state definitions bring the Snowflake objects in your dev/test/prod deployments to a consistent state without incurring any data loss in the process.

Together these DevOps features enable you to evolve the state of your databases through files versioned in a Git repository under your control, without ever having to write an ALTER statement. Let’s work through a little example of what a file-based object lifecycle could look like.

Prerequisites

This article demonstrates several Snowflake features currently under preview. The EXECUTE IMMEDIATE FROM command is available to all accounts as part of a public preview. The Git integration, the CREATE OR ALTER command and Jinja templating are currently only available to selected accounts for their respective previews. Contact your account manager if you don’t want to wait until their previews are extended to a global audience.

Connecting a Git Repository to Snowflake

The first step to a file-based object lifecycle with Git is to connect an externally hosted Git repository to your Snowflake account. This section will walk you through that process. Snowflake supports most common Git providers, but for this example we will use a private repository hosted on GitHub. This repository will hold the SQL scripts we will use in the example.

An access token is needed to allow Snowflake to read the files in the private repository. We will use a fine-grained personal access token to adhere to the need-to-know principle.

You can generate a fine-grained personal access token from the developer settings found in the settings of your GitHub account. Make sure to limit the access token to the repository you want to use for your SQL scripts and to allow read-only access to the repository contents.

Make sure to grant access to the repository contents when creating the fine-grained access token on GitHub

Next, we have to store the newly generated token in our Snowflake account by creating a SECRET object. Make sure to replace the placeholders denoted by <>.

CREATE OR REPLACE SECRET git_secret
TYPE = password
USERNAME = '<github username>'
PASSWORD = '<personal access token>';

This SECRET can then be used to create an API INTEGRATION that allows Snowflake to use GitHub’s API. Note that by default only the ACCOUNTADMIN role can create new API INTEGRATIONS.

CREATE OR REPLACE API INTEGRATION git_api_integration
API_PROVIDER = git_https_api
API_ALLOWED_PREFIXES = ('https://github.com/<github username>')
ALLOWED_AUTHENTICATION_SECRETS = (git_secret)
ENABLED = TRUE;

Finally, the SECRET and the API INTEGRATION can be used to create a Git repository stage in your Snowflake account. The URL required for the ORIGIN parameter is the same URL you would use to clone the repository to your local machine.

CREATE OR REPLACE GIT REPOSITORY git_repo
API_INTEGRATION = git_api_integration
GIT_CREDENTIALS = git_secret
ORIGIN = 'https://github.com/<github username>/<repository name>';

To verify it worked you can list the contents in the repository at a given commit, branch or tag. Note that Snowflake’s Git integration is currently read-only. For now you cannot create new commits or branches from within Snowflake.

LIST @git_repo/commits/<commit hash>;
LIST @git_repo/branches/<branch name>;
LIST @git_repo/tags/<tag name>;
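Beyond listing files, you can also enumerate the branches and tags Snowflake knows about for the repository. These commands are part of the Git integration preview:

SHOW GIT BRANCHES IN git_repo;
SHOW GIT TAGS IN git_repo;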

That’s it! We have now established a connection from our Snowflake account to our repository on GitHub. You can use this Git repository just like any other stage type for data ingestion. But most importantly, you can craft your SQL scripts locally with your favorite tools and well-established Git workflows before running them in Snowflake. Let’s explore that aspect further!
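As a quick illustration of the data ingestion aspect, a CSV file committed to the repository could be loaded like a file on any other stage. The file path and table name below are hypothetical:

-- Hypothetical example: assumes a file "data/sample_customers.csv"
-- exists on the main branch of the connected repository.
CREATE OR REPLACE TABLE sample_customers (id NUMBER, name VARCHAR);
COPY INTO sample_customers
  FROM @git_repo/branches/main/data/sample_customers.csv
  FILE_FORMAT = (TYPE = CSV);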

Running templated SQL scripts from Git

In this section we will explore how the EXECUTE IMMEDIATE FROM command enables you to run templated SQL scripts right from the Git repository we just connected to our Snowflake account.

Let’s draft a set of SQL scripts that create a table and insert sample data. Thanks to the Git integration we can draft these scripts locally in our favorite IDE before committing them to Git, creating a pull request and getting our changes reviewed before finally merging them into the repository’s main branch.

-- Create file "create_customer_table.sql" and commit it to your git repo
-- The file should have the following contents:
#!jinja
CREATE TABLE customer (
id number not null,
first_name varchar,
last_name varchar
);
GRANT OWNERSHIP ON TABLE customer TO ROLE {{owner}};
EXECUTE IMMEDIATE FROM 'insert_customers.sql';

-- Create file "insert_customers.sql" and commit it to your git repo
-- The file should have the following contents:
INSERT INTO customer VALUES
(1, 'Jane', 'Doe'),
(2, 'Max', 'Mustermann'),
(3, 'Jean', 'Dupont')
;

The first script creates the new table while the second script inserts sample data into this table. In order to insert the sample data right after we are done with the table creation, we use EXECUTE IMMEDIATE FROM in the first script so that we can execute both scripts with just one command. The EXECUTE IMMEDIATE FROM command can be used to execute SQL scripts located on internal, external or in this case Git stages. In this example we are able to use a relative file reference as both scripts will be pushed to the same Git repository.

Due to business requirements the role owning the table might be different from the role used to create the table. However, when crafting the script we don’t yet know which role will own the table. To account for this we add a GRANT OWNERSHIP statement with a template variable in place of the role name. This allows us to choose the owning role when executing the script. As we want to use Jinja templating, the #!jinja directive needs to be added at the start of the file.

The next step after creating the two files is committing and pushing them to a remote branch of our repository on GitHub. Going forward I will assume that the files have been pushed to the main branch of the repository. Now that the remote repository on GitHub has been changed we need to fetch the new commit in Snowflake:

ALTER GIT REPOSITORY git_repo FETCH;

Finally, it is time to apply the scripts to our Snowflake environment by executing create_customer_table.sql; its nested EXECUTE IMMEDIATE FROM takes care of running insert_customers.sql. Just like for the earlier LIST commands, we can specify to use the file at a specific commit, branch or tag. We need to add the USING clause to pass a value for the owner template variable we used in our templated create_customer_table.sql file. Here we choose the SYSADMIN role to be granted ownership of the new table.

EXECUTE IMMEDIATE FROM 
'@git_repo/branches/main/create_customer_table.sql'
USING (owner => 'SYSADMIN');

Snowflake will now read and execute the two SQL scripts. Eventually, the execution will complete and the EXECUTE IMMEDIATE FROM will report the result of the last statement — in this case the INSERT — as its own result signaling a successful execution.

+-------------------------+
| number of rows inserted |
|-------------------------|
|                       3 |
+-------------------------+

We have just seen how the EXECUTE IMMEDIATE FROM command enables the execution of individual SQL scripts right from a Git repository. This makes development best practices like versioning or reviews much more seamless. Note that executing a statement from a file requires the same privileges as executing the statement directly from a Snowsight worksheet.

Target State Evolution

Next we explore how the new CREATE OR ALTER command enables us to make changes to database objects without having to craft a stateful sequence of ALTER commands. All we need to do is specify a new target state and CREATE OR ALTER will apply all alterations that are necessary to reach this target state.

Let’s continue with our previous example.

Due to evolving business needs the existing columns of the customer table — id, first_name and last_name — are no longer sufficient. To address different legislation across countries we need to add a country_code column to the customer table. And while we are at it let’s also turn the id column into a primary key. For this we will create a new file containing a CREATE OR ALTER and commit it to our Git repository.

-- Create file "customer.sql" and commit it to your git repo
-- The file should have the following contents:
CREATE OR ALTER TABLE customer (
id number primary key,
first_name varchar,
last_name varchar,
country_code varchar
);

CREATE OR ALTER is especially powerful in combination with Git and EXECUTE IMMEDIATE FROM. Git allows us to record the evolution of the target state over time while EXECUTE IMMEDIATE FROM enables us to apply new target states or re-apply an older version should a rollback be required.
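For instance, rolling back to an earlier target state is just another file execution. Assuming an older version of customer.sql was tagged v1.0 (an illustrative tag name), re-applying it could look like this:

-- Re-apply the customer table definition as of an older, illustrative tag.
-- A commit hash or branch name works the same way.
ALTER GIT REPOSITORY git_repo FETCH;
EXECUTE IMMEDIATE FROM '@git_repo/tags/v1.0/customer.sql';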

To apply the changes, we need to:

  1. Fetch the changes from our Git repository
  2. Execute customer.sql with EXECUTE IMMEDIATE FROM

ALTER GIT REPOSITORY git_repo FETCH;
EXECUTE IMMEDIATE FROM @git_repo/branches/main/customer.sql;

The CREATE OR ALTER looks very similar to CREATE OR REPLACE. Both commands would bring the table to the target state. But there is a key difference: With CREATE OR ALTER we do not lose any data:

SELECT * FROM customer;
+----+------------+------------+--------------+
| ID | FIRST_NAME | LAST_NAME  | COUNTRY_CODE |
|----+------------+------------+--------------|
|  1 | Jane       | Doe        | NULL         |
|  2 | Max        | Mustermann | NULL         |
|  3 | Jean       | Dupont     | NULL         |
+----+------------+------------+--------------+

Evolving your object definitions is as easy as that. No ALTER commands needed, no data lost! The entire lifecycle of the customer table can now be managed from this file. Simply update the file, commit and fetch the changes, and apply the updates to your Snowflake account.

Please note that the preview for the CREATE OR ALTER command currently only covers TABLE and TASK objects. For now, ALTER commands still have to be used to evolve other object types.
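To give an idea of what the TASK support looks like, here is a minimal sketch of a declarative task definition. The task name, warehouse name and task body are assumptions for illustration; changing, say, the schedule later only requires editing the file and re-applying it:

-- Illustrative sketch: the task name, warehouse and body are made up.
CREATE OR ALTER TASK cleanup_customers
  WAREHOUSE = my_wh
  SCHEDULE = '60 MINUTE'
AS
  DELETE FROM customer WHERE country_code IS NULL;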

Conclusion and next steps

We have seen how Git repositories can be integrated with Snowflake and used to version target state definitions for Snowflake objects. Furthermore, we have seen how the EXECUTE IMMEDIATE FROM command can be used to apply the files containing these target state definitions to Snowflake.

Going forward the features demonstrated here will empower you to securely evolve your Snowflake databases in a declarative manner. You will never again have to go through the process of manually crafting an error-prone and stateful upgrade script with a bunch of ALTER commands.

Ready to supercharge your database management with Git, templated file executions and target states? Here are some actions you can take now:

  1. Experiment: Try the file-based object lifecycle in Snowflake using the steps outlined in this article.
  2. Learn More: Dive deeper into the new EXECUTE IMMEDIATE FROM feature by exploring the official documentation.
  3. Join the Previews: Reach out to your account manager to get access to the Git integration, Jinja templating and CREATE OR ALTER.
  4. Share your Insights: If you’ve implemented these practices in your work, we’d love to hear about your experiences. Share your thoughts, tips, or questions in the comments below, on our community forum or contact me on LinkedIn.
  5. Stay Informed: Don’t miss out on future articles and updates. Follow our Medium publications for more tech insights.

Thanks for reading and have fun integrating these features into your workflow!
