How to Set Up a CI/CD Pipeline for Snowflake Using Sqitch and Jenkins

Prem Dubey
Hashmap, an NTT DATA Company
7 min read · Jun 24, 2019

Change management with databases is painful, so should you continue without it? Can you work without it? Of course you can, but that will become even more painful, especially if a team is working on the same database with no version management or source control. Things will get very messy, very quickly.

Ideally, databases should always be under source control, and the process for deploying changes to a database should be consistent, testable, and reproducible, with a single, authoritative source for the schema. Version control also makes deploying the database to various stages (dev, test, prod) much easier. There are various ways and tools to achieve this.

While reviewing options and alternatives for a tool to set up a CI/CD pipeline for Snowflake (we get this request from our Snowflake customers on a regular basis), Sqitch hit my radar. This post discusses my experience setting up the pipeline with GitHub as source control, Jenkins as the build server, and a Sqitch Docker image as the agent.

So What Are the CI/CD Pipeline Components?

Sqitch is a database change management application that currently supports Snowflake's Cloud Data Warehouse as well as a range of other databases, including PostgreSQL 8.4+, SQLite 3.7.11+, MySQL 5.0+, Oracle 10g+, Firebird 2.0+, Vertica 6.0+, and Exasol 6.0+.

The Sqitch team just rolled out v1.0.0, and they have a very promising roadmap for the future.

Here are links to other key components of the architecture:

  • Jenkins is an open source automation tool written in Java with plugins built for CI/CD purposes.
  • Snowflake is a data warehouse built natively for the cloud, providing instant elasticity, per-second pricing, and secure data sharing with multi-region and multi-cloud options.

It is highly recommended that you go through the tutorial provided by the Sqitch community, which explains how to use Sqitch to manage database changes on Snowflake.

Architectural solution flow

Prerequisites

To follow this post, you will need a Jenkins server that can schedule executors via Docker (for a quick test, you can just pull a Docker image of Jenkins).

The executor used in this demonstration is a Docker container with Sqitch, the Snowflake ODBC driver, and the SnowSQL client. You can use the hashmapinc/sqitch:jenkins Docker image or build your own from the docker-sqitch repository. You will also need a Git client and a Snowflake account; if you are new to Snowflake, you can get a 30-day free trial here.

Getting Started

  • You will need to create a Snowflake user with enough permissions to execute the tasks that we are going to deploy through the pipeline.
  • Log in to your Snowflake account.
  • Go to Accounts -> Users -> Create.
  • Give the user sufficient permissions to execute the required tasks.
  • For more info on user management in Snowflake, refer to Snowflake's User Management and Access Control documentation.
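If you prefer to script this step rather than use the web UI, a rough sketch with the SnowSQL client is shown below. The user, role, and warehouse names here are illustrative placeholders, not values prescribed by Snowflake or this pipeline:

```shell
# Sketch only: sqitch_user, sqitch_role, and the admin account are placeholders.
# Run as a user with sufficient rights (e.g. SECURITYADMIN); SnowSQL will
# prompt for the admin password or read it from SNOWSQL_PWD.
snowsql -a hashmap -u admin_user -q "
CREATE USER IF NOT EXISTS sqitch_user
  PASSWORD = '<choose-a-strong-password>'
  DEFAULT_ROLE = sqitch_role
  DEFAULT_WAREHOUSE = sqitch_wh;
GRANT ROLE sqitch_role TO USER sqitch_user;
GRANT USAGE ON WAREHOUSE sqitch_wh TO ROLE sqitch_role;
GRANT CREATE SCHEMA ON DATABASE flipr TO ROLE sqitch_role;
"
```

Note that Sqitch creates its own registry schema on first deploy, so the role needs CREATE SCHEMA on the target database.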

Creating the pipeline

  • Store the credentials: Since the Jenkinsfile is part of the code, we do not hard-code the credentials; we store them in the Jenkins Credentials Manager. Go to Jenkins -> Credentials.
  • Add the webhook: If you want your builds to auto-trigger on push, go ahead and add a webhook in your GitHub repo for your Jenkins server.
  • Create a repo for your Sqitch SQL scripts and add it to your Jenkins installation. The Blue Ocean plugin makes adding new repositories very convenient.
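If you are starting the Sqitch project from scratch, initializing it and pushing it to GitHub looks roughly like this. The project name flipr matches the database used later in this post; the remote URL is a placeholder for your own repository:

```shell
# Create a new Sqitch project targeting the Snowflake engine.
# This generates sqitch.conf, sqitch.plan, and the deploy/revert/verify dirs.
sqitch init flipr --engine snowflake

# Put the project under source control and push it to your own repo.
git init
git add sqitch.conf sqitch.plan deploy/ revert/ verify/
git commit -m "Initial Sqitch project"
git remote add origin https://github.com/<your-org>/<your-repo>.git
git push -u origin master
```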

Jenkinsfile

  • We are going to use a Jenkinsfile to define our pipeline as code. Let's take a look at the Jenkinsfile.

Now, let’s take a look at what’s happening here…

pipeline {
    options {
        timeout(time: 1, unit: 'HOURS')
    }
    agent {
        docker {
            image 'hashmapinc/sqitch:jenkins'
            args "-u root -v /var/run/docker.sock:/var/run/docker.sock --entrypoint=''"
        }
    }

We are providing a timeout option of one hour. We are also specifying the Docker container that will act as the agent for running the job, along with the runtime arguments for the container. The Sqitch base Docker image has an ENTRYPOINT of sqitch --help, so we override it with --entrypoint='' to keep the container from terminating.

stages {
    stage('Moving .snowsql to workspace and replacing snowsql in /bin') {
        steps {
            sh '''
                rm /bin/snowsql
                mv /var/snowsql /bin/
                mv /var/.snowsql ./
            '''
        }
    }

In the first stage, we move the .snowsql directory from /var to our workspace and replace /bin/snowsql with /var/snowsql.

Sqitch requires a connection string to connect to Snowflake, and this connection string should contain:

  • the username variable $USERNAME
  • the password variable $PASSWORD
  • the Snowflake account address: hashmap.snowflakecomputing.com
  • the driver name: Driver=Snowflake
  • the database name: flipr
  • the warehouse: warehouse=sqitch_wh
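These pieces combine into a single Sqitch target URI. As a quick sketch (using echo and placeholder credentials so you can inspect the string; in the pipeline, USERNAME and PASSWORD come from the Jenkins credential bindings at run time):

```shell
# Assemble the Sqitch target URI from its parts.
# demo_user/demo_pass are placeholders for the Jenkins-injected credentials.
USERNAME='demo_user'
PASSWORD='demo_pass'
ACCOUNT='hashmap.snowflakecomputing.com'
DATABASE='flipr'
TARGET="db:snowflake://$USERNAME:$PASSWORD@$ACCOUNT/$DATABASE?Driver=Snowflake;warehouse=sqitch_wh"
echo "$TARGET"
# prints db:snowflake://demo_user:demo_pass@hashmap.snowflakecomputing.com/flipr?Driver=Snowflake;warehouse=sqitch_wh
```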

Now, for the second stage of our pipeline, we deploy the changes to our target database, using the credentials from the Jenkins credential store.

stage('Deploy changes') {
    steps {
        withCredentials(bindings: [usernamePassword(credentialsId: 'snowflake_creds', usernameVariable: 'USERNAME', passwordVariable: 'PASSWORD')]) {
            sh '''
                sqitch deploy "db:snowflake://$USERNAME:$PASSWORD@hashmap.snowflakecomputing.com/flipr?Driver=Snowflake;warehouse=sqitch_wh"
            '''
        }
    }
}

In the last stage, we verify our changes.

stage('Verify changes') {
    steps {
        withCredentials(bindings: [usernamePassword(credentialsId: 'snowflake_creds', usernameVariable: 'USERNAME', passwordVariable: 'PASSWORD')]) {
            sh '''
                sqitch verify "db:snowflake://$USERNAME:$PASSWORD@hashmap.snowflakecomputing.com/flipr?Driver=Snowflake;warehouse=sqitch_wh"
            '''
        }
    }
}

After our pipeline executes, successfully or otherwise, we reset permissions on the workspace directory to 777 so that Jenkins can clean up before the next job (the container runs as root, so files it creates would otherwise not be writable by the Jenkins user).

post {
    always {
        sh 'chmod -R 777 .'
    }
}

Go ahead and add the Jenkinsfile to your GitHub repo. Commit your changes and push to GitHub.

For information on how to add changes via Sqitch, visit Introduction to Sqitch on Snowflake. It's cleaner to keep one object per file when writing Sqitch SQL scripts. Once you push the changes, the webhook added in GitHub will send a POST message to Jenkins, and the pipeline will be triggered.
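As a sketch of that workflow, each Sqitch change generates matching deploy, revert, and verify scripts, which keeps one object per file as suggested above. The change name and dependency below are illustrative examples, not part of the flipr tutorial:

```shell
# 'sqitch add' appends to sqitch.plan and creates three script stubs.
# 'hashtags' and its dependency on 'users' are example names.
sqitch add hashtags --requires users -n 'Adds table for hashtags'

# Edit the generated stubs, then commit and push to trigger the pipeline:
#   deploy/hashtags.sql  -- CREATE TABLE ...
#   revert/hashtags.sql  -- DROP TABLE ...
#   verify/hashtags.sql  -- a statement that fails if the table is absent
git add sqitch.plan deploy/ revert/ verify/
git commit -m "Add hashtags table"
git push
```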

Our Result

Upon successful completion of the pipeline, you should see an output like this:

[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Deploy changes)
[Pipeline] withCredentials
[Pipeline] {
[Pipeline] sh
[owflake-sqitch-ci-cd_master-4KRIUCFJ5X7PGMBERRN6PYWQF2S5EEPCMF6ULWY3K4N5SP2RPD5A] Running shell script
+ sqitch deploy db:snowflake://****:****@hashmap.snowflakecomputing.com/flipr?Driver=Snowflake;warehouse=sqitch_wh
Adding registry tables to db:snowflake://****:@hashmap.snowflakecomputing.com/flipr?Driver=Snowflake;warehouse=sqitch_wh
Deploying changes to db:snowflake://****:@hashmap.snowflakecomputing.com/flipr?Driver=Snowflake;warehouse=sqitch_wh
+ appschema .. ok
+ users ...... ok
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Verify changes)
[Pipeline] withCredentials
[Pipeline] {
[Pipeline] sh
[owflake-sqitch-ci-cd_master-4KRIUCFJ5X7PGMBERRN6PYWQF2S5EEPCMF6ULWY3K4N5SP2RPD5A] Running shell script
+ sqitch verify db:snowflake://****:****@hashmap.snowflakecomputing.com/flipr?Driver=Snowflake;warehouse=sqitch_wh
Verifying db:snowflake://****:@hashmap.snowflakecomputing.com/flipr?Driver=Snowflake;warehouse=sqitch_wh
* appschema .. ok
* users ...... ok
Verify successful
[Pipeline] }
[Pipeline] // withCredentials

As you can see, both the deployment and verification were successful. To revert your changes, you can either change the deploy command in the Jenkinsfile to revert, or add a new step to your Sqitch plan that drops the change, which lets you continue without editing the pipeline.
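For reference, reverting from the command line looks roughly like this; the target URI is the same one used by the deploy stage, and @HEAD^ is Sqitch's syntax for "one change before the latest":

```shell
# Revert the most recent change; Sqitch runs the matching revert/ script.
# -y skips the confirmation prompt, which matters in a non-interactive build.
sqitch revert --to @HEAD^ -y "db:snowflake://$USERNAME:$PASSWORD@hashmap.snowflakecomputing.com/flipr?Driver=Snowflake;warehouse=sqitch_wh"
```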

Deploying to Production

Once you are sure of the changes and want to deploy to production, just change the connection string to target your production database and you are good to go.

stage('Deploy changes to Production') {
    steps {
        withCredentials(bindings: [usernamePassword(credentialsId: 'snowflake_creds', usernameVariable: 'USERNAME', passwordVariable: 'PASSWORD')]) {
            sh '''
                sqitch deploy "db:snowflake://$USERNAME:$PASSWORD@hashmap.snowflakecomputing.com/flipr_prod?Driver=Snowflake;warehouse=sqitch_wh"
            '''
        }
    }
}

TL;DR

Sqitch makes it convenient to deploy changes to Snowflake. Implemented in a CI/CD workflow, it automates the entire process and gives you full reproducibility.

CI/CD for Snowflake can be achieved using Sqitch, with the Sqitch Docker image for Snowflake as the execution environment. The same principle applies if you use AWS CodePipeline or Azure DevOps instead of Jenkins, though you might need to adjust the image for your environment.

With Sqitch being under active development, we can expect more features and an even better experience in the future.


Feel free to share on other channels, and be sure to keep up with all new content from Hashmap.

Prem Dubey is a Site Reliability Engineer (connect with him on LinkedIn) at Hashmap working across industries with a group of innovative technologists and domain experts accelerating high value business outcomes for our customers, partners, and the community.
