#Tech in 5 — Snowflake Object Management in a CI/CD Pipeline

Managing Snowflake Objects using Python and Jinja in the CI/CD Pipeline of your Choice!

Published in

Hashmap, an NTT DATA Company

8 min readDec 18, 2019

It’s beginning to look a lot like Snowflake!

The holiday season is upon us, and if you are from the northern hemisphere, this means one thing: Snow! I loved skiing at this time of year. Unfortunately, Texas doesn’t get mass snow dumps, so I need to get my fix from somewhere else this year!

While I wait for the freak snowstorm of the next millennium, let’s talk about the hottest cloud data warehouse of 2019, Snowflake.

Brief Technical Overview

Snowflake is the leading cloud-native data warehouse providing accelerated business outcomes with unparalleled scaling, processing, and data storage all packaged together in a consumption-based model. Hashmap already has many stories about Snowflake and associated best practices — here are a few links that some of my colleagues have written.

Here’s Your Day 1 and 2 Checklist for Snowflake Adoption

by Venkatesh Sekar

medium.com

Securely Using Snowflake’s Python Connector within an Azure Function

Why and How to use Key Vault

medium.com

5 Things I Wish I Knew Before Learning Snowflake

Hashmap has been focusing on delivering the simplest, most cost-efficient data solutions for the better part of a…

medium.com

What I want to discuss today in greater detail is how to build an Object Manager for Snowflake that can be used as a part of any CI/CD pipeline. As of this writing, there is currently no way to programmatically manage Snowflake Objects across environments (think Dev, Test, and Prod) today.

Many customers who approach us have asked:

"There has got to be a better way to manage Snowflake objects across environments!"

Personally, I couldn’t agree more. A data engineer shouldn’t have to write, rewrite, and run SQL Statements manually across environments. While the worksheet feature is great and turns data warehousing into a Databricks-esque experience, it is a waste of data engineering time. (business folks read $$$). Manual creation of objects is so 1995 — most businesses cannot afford the downtime, nor the expense required to manage this process manually.

Hence the evolution of CI/CD as a process to handle the mundane. This cycle of development encourages building code once and then letting the pipeline can control all of the tedious tasks that formerly had to be managed by people. Testing, Building, Deploying. In this case, we would want to remove the tedious task of rewriting the SQL scripts to manage Snowflake across several environments and then shove the resulting software into a CI/CD pipeline that can be managed from a manifest file.

Deeper Dive — How to Create an Object Manager + CI/CD Pipeline for Automation!

So how would you do this? Here is a list of the steps we need to take your tedious, slow object management into a fully functioning pipeline:

Create templates for your SQL statements
Propagate templates based on a manifest file
Run the SQL generated through a cursor object
Verify Objects were created correctly across environments

There are many ways you could complete these four tasks. In today’s example, I will be using Jinja, a python package that builds templates. A yaml file acts as our manifest file, and python reads the SQL and executes through the native Python — Snowflake Connector. Finally, I will be confirming that the role was created in Snowflake once it runs.

Step 1: Jinja — Template SQL — a swiss-army knife for CI/CD warrior

Jinja is a python package used to create templates. This is extremely powerful as we can not only package standard templates, but use our own templates to generate SQL files. In the template below, I am creating the compute warehouse, creating a role in an environment, and assigning the role to the SYSADMIN level. Then I am creating the database and schema in Snowflake. Next, I am granting the User Role (I just created usage on) to all of the objects I just created and all future objects generated within the schema.

from jinja2 import Templatesnowsql_replicate = Template (    """
    USE ROLE SYSADMIN;
    USE WAREHOUSE {{ENV}}_{{LANDING_WAREHOUSE}};
    CREATE DATABASE {{ENV}}_{{LANDING_DATABASE}};
    
    USE ROLE SYSADMIN;
    GRANT USAGE ON DATABASE {{ENV}}_{{LANDING_DATABASE}} TO ROLE {{ENV}}_{{LANDING_ROLE}};
    
    USE ROLE SYSADMIN;
    USE DATABASE {{ENV}}_{{LANDING_DATABASE}};
    USE WAREHOUSE {{ENV}}_{{LANDING_WAREHOUSE}};
    CREATE SCHEMA {{LANDING_SCHEMA}};
    CREATE SCHEMA {{LANDING_SCHEMA}}_AR;
    CREATE SCHEMA {{LANDING_SCHEMA}}_ERR;
    SHOW SCHEMAS;
    
    USE DATABASE {{ENV}}_{{LANDING_DATABASE}};
    GRANT USAGE ON SCHEMA PUBLIC TO ROLE {{ENV}}_{{LANDING_ROLE}};
    GRANT ALL ON SCHEMA PUBLIC TO ROLE {{ENV}}_{{LANDING_ROLE}};
    
    GRANT ALL ON SCHEMA {{LANDING_SCHEMA}} TO ROLE {{ENV}}_{{LANDING_ROLE}};
    GRANT ALL ON SCHEMA {{LANDING_SCHEMA}}_AR TO ROLE {{ENV}}_{{LANDING_OLE}};
    
    GRANT USAGE ON DATABASE {{ENV}}_{{SF_DW_DATABASE}} TO ROLE {{ENV}}_{{SF_DW_ROLE}};
    
    USE DATABASE {{ENV}}_{{SF_DW_DATABASE}};
    GRANT ALL ON SCHEMA PUBLIC TO ROLE {{ENV}}_{{SF_DW_ROLE}};
    
    GRANT ALL ON SCHEMA {{SF_DW_SCHEMA}} TO ROLE {{ENV}}_{{SF_DW_ROLE}};
    GRANT ALL ON SCHEMA {{SF_DW_SCHEMA}}_AR TO ROLE {{ENV}}_{{SF_DW_ROLE}};
    GRANT ALL ON SCHEMA {{SF_DW_SCHEMA}}_ERR TO ROLE {{ENV}}_{{SF_DW_ROLE}};
    
    USE ROLE SYSADMIN;
    USE DATABASE {{ENV}}_{{LANDING_DATABASE}};
    SHOW GRANTS ON SCHEMA PUBLIC;
    SHOW GRANTS ON SCHEMA {{LANDING_SCHEMA}};
    SHOW GRANTS ON SCHEMA {{LANDING_SCHEMA}}_AR; 
    SHOW GRANTS ON SCHEMA {{LANDING_SCHEMA}}_ERR;
    """

As you can see, all variables here are defined by the {{object_name}} model. This is important as this is how the user defines what can and can’t be injected into the Jinja template via the manifest file.

Step 2: Yaml — Easier than JSON for Manifest Files

I chose to use a yaml file as my manifest file as I find it easier to parse than JSON. I know I will get some disagreement from the community on whether or not this is true. This is just my personal preference. Now with that out of the way, this yaml file contains all of the information for populating the Jinja Templates across multiple environments.

Step 3: Render unto Caesar…your SQL

Once you’ve configured the manifest file, the next step in the pipeline is to render the SQL files. I created an Artificer class to render the templates using python. This python class creates the SQL files based on the Jinja Template and the manifest using the render template function. This allows us to reuse this class in the CI/CD pipeline to render as many files as we want. In this example, I am showing how to render one file; however, you could dynamically render based on the templates chosen/specified in a subfolder.

import yamlfrom abominable import snowsql_replicate_templateclass Sf_Artificer:    def __init__(self):        self.config_path = "mypath"
        self.output_path = "another file path"        with open(self.config_path) as y:
            self.configs = yaml.safe_load(y)
    def render(self, **kwargs):        """
        :param kwargs: takes an enviornment.
        :return: a rendered version of the template.
        """        if kwargs['ENV'] in self.configs:
            info_dict = self.configs[kwargs['ENV']]  # This grabs the dictionary associated with the env.
        else:
            raise BaseException("The ENV specified is not found in the yaml file")         self.output_replicate = snowsql_replicate_template.snowsql_replicate.render(ENV=kwargs['ENV'],
                                    LANDING_WAREHOUSE=info_dict['LANDING_WAREHOUSE'],
                                    LANDING_DATABASE=info_dict['LANDING_DATABASE'],
                                    LANDING_ROLE=info_dict['LANDING_ROLE'],
                                    LANDING_SCHEMA=info_dict['LANDING_SCHEMA'],
                                    SF_DW_DATABASE=info_dict['SF_DW_DATABASE'],
                                    SF_DW_ROLE=info_dict['SF_DW_ROLE'],
                                    SF_DW_SCHEMA=info_dict['SF_DW_SCHEMA'])
    def generate_snowsql_files(self, **kwargs):
        with open(self.output_path + "/" + "replicate_template" + "_" + (kwargs['ENV']).lower() + ".sql", 'w') as p:
            p.write(self.output_replicate)
        p.close()

Execute the SnowSQL!

After the files are created, they are ready to run against any Snowflake environment. Now that we have those files, I am sure you are wondering: Why wouldn’t you use python to run the SQL through the native python connector? This is precisely what we will do! I created an example Inquisitor Class that allows the SQL to run through the connector. This should be simple enough? I created a Class that reads the Jinja generated SQL file and runs the statements using an iterator.

from abominable import connectionclass SF_Inquisitor:    # No one expects the Spanish Inquisition!     def __init__(self, **kwargs):        # These are for logging purposes.
        self.result = []
        self.verify = []  # keeps track of logs from
        self.snow = []        # Sets up connection to SF once! 
        self.connect = connection.UserConnect()
        self.cursor = self.connect.snow_connect()        # Need to pass a sql file that the program can read and parse the sql commands out into a list.
        # Then pass into th self.snow_sql or make it apart of the argument. This will be passed in the main.py    def cardinal_snow(self, **kwargs):        """
        'We are the ones who will be asking the questions!' - Cardinal Biggles
        method takes a list of snowsql statements and creates a role given the information provided to kwargs. 
        this means that it can take in any user info and given the role, will call a snowflake
        connection and execute the list of snowsql statements. 
        
        params: kwargs, takes in given kwargs from the stream and assigns appropirate variables
        returns: snowflake execution result. 
        """        with open(kwargs['file']) as snowsql:               # Reads and cleans up a bunch of the files, cleans out bad data.
            snow_list = snowsql.readlines()
            cleaned_snow = list(map(lambda s: s.strip(), snow_list))
            snow_full = list(filter(None, cleaned_snow))
            snowsql.close()        print(snow_full)        self.snow += [self.cursor.execute(sql) for sql in snow_full]

NB: I encountered a few hiccups when trying to run multi-line Jinja generated SQL Statements. The Snowflake Python Connector as of writing this does not accept multi-line SQL statements. Therefore, there is some clean up required on the developer end to format this so the connector can understand:

When reading the SQL files in python, I had to strip newline characters and empty items in the list of SQL Statements. You can see the code that handles this in the with open statement.
Once formatted into a list, I used an iterator to run the list of SQL statements sequentially (note that using multi-threading I’d likely experience an increase in performance). This is handled by the final line of the method in Inquisitor.

Once run, you can check your updated Snowflake environment, and the changes will be reflected once the cursor has executed the SQL statements. The last step here is to package up what we have developed and push it into your favorite CI/CD pipeline of your choice. After deploying in a pipeline, you can have the commit of the manifest file/changes to the manifest file be the trigger to start your pipeline and manage the objects based on the Jinja Templates you have defined.

Conclusion

In this ‘Tech in 5’ we covered how to manage Snowflake objects using Python and Jinja. The output could be packaged up and managed/deployed in a CI/CD pipeline of your choice!

The only downside here is that for each object/role in the database you would have to generate your own Jinja-Python. However, I’m sure you can see the potential in truly automating your Snowflake Objects by placing this pattern in your favorite CI/CD pipeline.

Someday, I would like to build an Open Source tool that could make Snowflake object management easy for data engineers everywhere…who knows maybe it’s around the corner.

At this stage, I am just starting to build this, so feedback is more than welcome and Remember to Always Be Coding! I will update with the Github page soon!

Feel free to share on other channels and be sure and keep up with all new content from Hashmap here.

Kieran Healey is a Cloud and Data Engineer with Hashmap providing Data, Cloud, IoT, and AI/ML solutions and consulting expertise across industries with a group of innovative technologists and domain experts accelerating high-value business outcomes for our customers.

#Tech in 5 — Snowflake Object Management in a CI/CD Pipeline

Managing Snowflake Objects using Python and Jinja in the CI/CD Pipeline of your Choice!

Brief Technical Overview

Here’s Your Day 1 and 2 Checklist for Snowflake Adoption

by Venkatesh Sekar

Securely Using Snowflake’s Python Connector within an Azure Function

Why and How to use Key Vault

5 Things I Wish I Knew Before Learning Snowflake

Hashmap has been focusing on delivering the simplest, most cost-efficient data solutions for the better part of a…

Deeper Dive — How to Create an Object Manager + CI/CD Pipeline for Automation!

Step 1: Jinja — Template SQL — a swiss-army knife for CI/CD warrior

Step 2: Yaml — Easier than JSON for Manifest Files

Step 3: Render unto Caesar…your SQL

Execute the SnowSQL!

Conclusion

Written by Kieran Healey