Snowflake: Integration with Google Cloud Storage

Naseer Ahmad
Contino Engineering
5 min readJul 27, 2022

--

In this blog post, we will learn how Snowflake storage integration can be used to read data from and write data to the Google Cloud Storage bucket.

Snowflake is one of the most popular cloud-agnostic Software-as-a-Service (SaaS) based cloud data warehouses that bring simplicity without compromising features. Snowflake is a multi-purpose cloud data warehouse that can be used for data lakes, operational data stores, data warehouses, and data marts. Snowflake enables data storage, processing, and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings. Its automatic vertical and horizontal scalability and decoupled Compute and Storage architecture help balance performance and operational cost.

Google Cloud Storage is the object storage service offered by Google Cloud. It’s designed for at least 99.999999999% annual durability or 11 nines, which means that even with one billion objects, you would likely go a hundred years without losing a single one. Moreover, it provides out-of-the-box features such as object versioning and fine-grain permissions that can make development hassle-free and help reduce operational costs.

Before deep-diving into the Snowflake integration with Google Cloud Storage, let’s understand the integration flow with GCP. Also, let’s familiarize ourselves with the terminologies used throughout the integration steps.

Integration Flow

Integration objects help connect external storage such as GCP Cloud Storage with Snowflake. GCP IAM web service can be used to provide fine-grained access to external storage, as shown in Figure 1. Also, many external stage objects can reference different buckets and paths and use the same storage integration for authentication.

Figure 1 — Snowflake Integration Flow with Google Cloud Storage

Terminologies Used

  • Google Cloud Storage: A storage container to store objects in the cloud
  • IAM: GCP Identity and Access Management (IAM) is a web-based service that controls access to GCP resources.
  • IAM Policy: Mechanism to define and enforce permissions on identities such as users, groups, and roles
  • Roles: Set of permissions that can be delegated to users
  • Users: An entity that can be used to interact with GCP
  • Storage Integration: Snowflake object stores generated IAM users and a set of allowed or blocked storage locations.
  • Snowflake External Stage: Used to store metadata about external data files such as Google Cloud Storage bucket URL and file format.

Integration Steps

  1. Create a Cloud Storage Bucket
  • Login into a GCP account → View all products
  • Select StorageCloud Storage
Figure 2 — GCS Cloud Storage
  • Click CREATE BUCKET
Figure 3 — Create Bucket
  • Enter Name your bucket → Click CREATE
Figure 4 — Specifying Bucket Properties

2. Create a Cloud Storage Integration in Snowflake

Cloud storage integration holds external storage metadata such as storage provider, role (generated through IAM service), and object storage (GCS Bucket) location. Creating this external storage means you don’t need to provide credentials when creating stages or loading data. For accessing Cloud Storage buckets, Snowflake creates a service account that can be granted permissions to access the bucket(s) that store your data files. Below is the code command to create an integration;

Figure 5 — Create GCS Storage Integration

3. Retrieve the Cloud Storage Service Account

After creating storage integration in Step 2, run the following command to retrieve the Cloud Storage service account for your Snowflake account.

Figure 6 — Snowflake Generated GCP Storage Account

4. Grant the Service Account Permissions to Access Bucket Objects

To read data from and write data in the Cloud Storage bucket, Let’s configure IAM access permissions for Snowflake in GCP by following the below steps;

  • Create a Custom IAM Role
    - Log into the GCP Console as a project editor → IAM & adminRoles → Create Role
Figure 7 — Create Role in GCP

- Enter the Title and Description of the role → ADD PERMISSIONS

Figure 8 — Specifying Role Properties

- Filter the permission list, and assign the following permissions to the role → ADD

Figure 9 — Add Permissions

- Click CREATE

Figure 10 — Create Role with Required Permissions
  • Assigning the Customer Role to the Cloud Storage Service Account
    - Go to Cloud Storage → Browser → Select a bucket to configure access → SHOW INFO PANEL in the upper-right corner
    - Add New principals → Role
Figure 11 — Assigning Custom Role to Bucket

5. Create an External Stage

Finally, we will create an external stage to hold the cloud storage integration metadata such as storage integration name, Cloud Storage bucket URL, and file format. However, before creating the external stage, we need to create a file format in Snowflake so that it can be referenced in the External Stage creation. Here’s the code snippet to create a file format in Snowflake;

Figure 12 — File Format Creation in Snowflake

To verify file format, run the below command;

Next, let’s create the External Stage by using the following code snippet;

Figure 13 — Stage Creation in Snowflake

Finally, we can run the following command to read the data from the GCS bucket, as shown below;

Figure 14 — List Objects from GCS Bucket

Briefly, this post outlined the steps to successfully integrate Snowflake with Google Cloud Storage (GCS) with all the required code snippets. I have also included screenshots of each step, which will help people understand and compare their outputs.

Learning never exhausts the mind — Leonardo da Vinci

--

--