Building a CI/CD pipeline for Snowflake resource management using Github Actions + Terraform

Tatsuya Koreeda
Snowflake Engineering
6 min readApr 12, 2024

For Snowflake users, are you working on Terraforming Snowflake? Our company is currently Terraforming various resources in Snowflake.

Terraform is an open-source Infrastructure as Code (IaC) tool developed by HashiCorp. It is used for configuring, deploying, and managing infrastructure changes. There is a Terraform provider developed by volunteers specifically for Snowflake, allowing you to manage Snowflake using Terraform.
https://github.com/Snowflake-Labs/terraform-provider-snowflake

You can Terraform the following resources in Snowflake:
- Database
- Schema
- User
- Role
- Warehouse
- Table
- View

On the Terraform documentation, it is mentioned that you can manage other resources as well, so it would be good to check it out:
[Link to Snowflake Terraform Provider Documentation](https://registry.terraform.io/providers/Snowflake-Labs/snowflake/latest/docs)

Imagine a CI/CD pipeline in Snowflake

Additionally, for Snowflake Terraforming, official hands-on guides are available. By using them, you can set up authentication to Snowflake on your local PC and edit resources by running `terraform apply`.
[Link to Snowflake Terraforming Hands-On Guide](https://quickstarts.snowflake.com/guide/terraforming_snowflake/index.html?index=..%2F..index#0)

However, when it comes to Terraform, it is common to integrate it into CI/CD mechanisms for version control, team productivity improvement, security and auditing, and consistency in procedures, rather than local operations. Managing Snowflake resources manually can not only be time-consuming and laborious but also pose a risk of human errors.

In this article, we will explain how to build a CI/CD pipeline to automate resource management by leveraging Github Actions.

Local Resource Modification Testing

Firstly, it follows a simple flow of granting the user authentication keys and then executing Terraform.

Create an RSA key for authenticating to Snowflake:

$ cd ~/.ssh
$ openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out snowflake_tf_snow_key.p8 -nocrypt
$ openssl rsa -in snowflake_tf_snow_key.p8 -pubout -out snowflake_tf_snow_key.pub

Create a user specifying the RSA_PUBLIC_KEY:

CREATE USER "tf-snow" RSA_PUBLIC_KEY='RSA_PUBLIC_KEY_HERE' DEFAULT_ROLE=PUBLIC MUST_CHANGE_PASSWORD=FALSE;
GRANT ROLE SYSADMIN TO USER "tf-snow";
GRANT ROLE SECURITYADMIN TO USER "tf-snow";

Add Snowflake account information to your local PC:

$ export SNOWFLAKE_USER="tf-snow"
$ export SNOWFLAKE_PRIVATE_KEY_PATH="~/.ssh/snowflake_tf_snow_key.p8"
$ export SNOWFLAKE_ACCOUNT="YOUR_ACCOUNT_LOCATOR"
$ export SNOWFLAKE_REGION="YOUR_REGION_HERE"

Describe the resources you want to add or modify in main.tf:

terraform {
required_providers {
snowflake = {
source = "Snowflake-Labs/snowflake"
version = "~> 0.68"
}
}
}
provider "snowflake" {
role = "SYSADMIN"
}
resource "snowflake_database" "db" {
name = "TF_DEMO"
}
resource "snowflake_warehouse" "warehouse" {
name = "TF_DEMO"
warehouse_size = "large"
auto_suspend = 60
}

Execute Terraform:

terraform init
terraform plan
terraform apply

With this, the local modification of Snowflake resources is complete. Please access the Snowflake UI to confirm that the DB TF_DEMO has been created.

The content of the hands-on session is directly useful for testing resource modifications locally. It seems common to first ensure that Snowflake resources can be modified locally before proceeding with resource modifications via Github Actions. Therefore, for those considering Snowflake Terraform management, mastering the hands-on session is essential.

Setting up Github Actions

Let’s proceed with setting up Github Actions. This time, we will configure a workflow named “Terraform Plan and Apply on Main Branch Push” to run when a push is made to the main branch.

Setting up Github Secrets

GitHub Secrets is a mechanism for securely storing and managing confidential information within a GitHub repository. To authenticate with Snowflake, we will environment variables for account information and password, and register them in Github Secrets.

Set the values for SNOWFLAKE_ACCOUNT and SNOWFLAKE_PASSWORD.

By the way, if you forgot what to set for SNOWFLAKE_ACCOUNT, you can retrieve it with the following SQL query:

SELECT current_account() as YOUR_ACCOUNT_LOCATOR;

Creating main.yml

I am creating a main.yml file in the .github/workflows folder hierarchy. Below is an explanation of the code.

name: Terraform Plan and Apply on Main Branch Push
on:
push:
branches:
- main
jobs:
terraform:
name: 'Terraform Plan'
runs-on: ubuntu-latest
defaults:
run:
working-directory: terraform/snowflake
env:
SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
steps:
- name: Checkout Repository
uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v1
with:
terraform_version: 1.0.11
- name: Terraform Initialize
run: terraform init
- name: Terraform Validate
run: terraform validate
- name: Terraform Plan
run: terraform plan
- name: Terraform Apply
run: terraform apply -auto-approve

With this setup, you should be able to make changes to Snowflake resources by opening a PR to the main branch and merging it.

What are Snowflake Resources Managed with Terraform?

In our company, we are managing the following Snowflake resources with Terraform (some are still in progress). I will list only the major ones.

- Databases
- Schemas
- Stages
- Roles
- Users
- Warehouses
- Snowpipes
- Streams

etc…

You might wonder why tables and views are not managed… Considering that business users may make changes to the data mart layer, we were concerned that introducing Terraform management could inhibit a data-driven culture. Additionally, as we are in the process of adopting dbt, we thought it would be better to delegate responsibilities there. Finding the optimal solution for managing Snowflake resources with Terraform in each company based on its development structure and use cases is necessary.

Terraform State Management for Snowflake Resources

When managing Snowflake with Terraform, you can also manage the state. Proper Terraform state management is crucial for maintaining resource consistency and enhancing team collaboration and should be appropriately configured.

Failure to manage Terraform state properly can lead to various issues and risks:

Inconsistency: If the state does not match the real-time infrastructure, Terraform may cause unexpected changes or destruction as it doesn’t know what is actually deployed.

Conflicts: When multiple developers or operators run Terraform simultaneously without proper state locking, overrides or conflicts in the state can occur.

Data Loss: If the local state file gets corrupted or lost, current infrastructure state information may be lost, making restoration difficult.

Security Risks: The state file may contain sensitive information (passwords, access keys, endpoints), and improper management can pose security risks.

Human Errors: Human errors such as manual state editing can lead to unexpected problems or infrastructure destruction.

Collaboration Issues: Without consistent and visible state, communication and collaboration among teams become challenging.

Decreased Efficiency: Improper state management may require manual checks and adjustments before applying changes, reducing deployment process efficiency.

In essence, improper Terraform state management can negatively impact infrastructure stability, security, and operational efficiency. Proper state management is particularly important for large projects or environments involving multiple individuals (though it may not be as critical for solo projects).

By the way, we have set up a remote state management environment using backend S3 and DynamoDB.

Distinguishing AWS Resources from Snowflake Resources

Up to this point, you should be able to make actual changes to Snowflake resources through Terraform apply triggered by GitHub Actions. However, some might also be managing cloud resources like AWS within the same repository using GitHub Actions + Terraform. For example, if you have created a pipeline from DMS to S3 to Snowflake, you may want to manage the DMS to S3 part as AWS resources with Terraform.

In such cases, you need to separate the GitHub Actions settings for AWS and Snowflake. In our company, we manage AWS and Snowflake directories separately within the same repository.

├── .github
| └── workflow
| └── aws.yml
| └── snowflake.yml
├── terraform
| └── aws
| └── snowflake

As shown above, we create aws.yml and snowflake.yml in .github/workflows and maintain the respective Terraform files in aws and snowflake directories under the terraform directory.

In Conclusion

How was it? I feel like the most challenging part is passing authentication from GitHub to Snowflake (in general, authentication can be a tricky task not only in this case). However, having the latest Snowflake resource configuration managed with Terraform is beneficial from both a personalization and maintenance perspective. I encourage your company to try building a CI/CD pipeline for managing Snowflake resources using GitHub Actions + Terraform!

--

--

Tatsuya Koreeda
Snowflake Engineering

CREATIVE SURVEY Inc. Data Engineer - Snowflake Japan WEST UG Leader & Snowflake Squad 2024 - Sharing insights on the use of Snowflake in life sciences🧬