Snowflake: Integration with Azure

Naseer Ahmad
Contino Engineering
5 min read · Aug 8, 2022


Suppose your company uses Microsoft Azure Blob Storage for data storage and wants to use Snowflake to read data from and write data to that storage. This blog post guides you through making that happen. Before we jump in, let’s understand what Snowflake and Azure Blob Storage are.

What is Snowflake?

Snowflake is one of the most popular cloud-agnostic Software-as-a-Service (SaaS) cloud data warehouses, bringing simplicity without compromising on features. Snowflake is a multi-purpose cloud data warehouse that can be used for data lakes, operational data stores, data warehouses, and data marts. It enables data storage, processing, and analytics solutions that are faster, easier to use, and far more flexible than traditional offerings. In addition, its automatic scaling up and down and its decoupled compute and storage architecture help balance performance and operational cost. Snowflake is a member of the AWS Partner Network (APN) and is available in AWS Marketplace.

What is Azure Blob Storage?

Azure Blob (Binary Large Object) Storage is Microsoft’s cloud object storage service for storing enormous amounts of unstructured data, such as images, text files, and videos, at massive scale, with optimal performance on later access. In addition, Blob Storage is the underlying storage for Azure data lake analytics solutions and for the managed disks attached to Azure virtual machines. For more information on Azure Blob Storage, please visit the Microsoft Documentation.

Before deep-diving into the Snowflake integration with Azure Blob Storage, let’s familiarize ourselves with the terminology used throughout the integration steps.

Terminology Used

  • Storage account: Contains all of your Azure Storage data objects, such as blobs, file shares, tables, queues, and disks.
  • Containers: Used to organize a set of blobs, just like directories in a file system.
  • Identity and Access Management (IAM): Used to assign roles and permissions on storage accounts.
  • Tenant ID: The globally unique identifier (GUID) for your Microsoft Azure account.
  • Cloud Storage Integration: A Snowflake object that stores a generated identity for your cloud provider (a service principal on Azure) and a set of allowed or blocked storage locations.
  • Snowflake External Stage: Stores metadata about external data files, such as the storage location URL and the file format.

Now that we are familiar with the terminology, let’s create the blob storage that Snowflake will later read data from.

Steps to Create a Storage Account

  1. Log in to the Azure Portal → Storage accounts
Figure 1 — Select Storage accounts

2. Select Create under Storage account

Figure 2 — Select Create under Storage Accounts

3. On the Create a storage account page, under:

  • Project details → select the Subscription → create a Resource Group
  • Instance details → enter the Storage account name → select the Region
  • Leave the rest of the options as is → Review + create → Create
Figure 3 — Create a Storage Account

4. Click Containers

Figure 4 — Select Containers

5. Click on + Container → Enter Name → Leave Public access level as default → Create

Figure 5 — Create a Container

6. Click on the newly created Container → Upload → File → Browse to the file → Click Upload

After creating the storage account and the container and uploading files to Azure Blob Storage, let’s go through the steps to integrate Azure Blob Storage with Snowflake.

Integration Steps

  1. Create a Cloud Storage Integration

A storage integration stores external storage metadata such as the storage provider, the Azure tenant ID, and the Azure Blob Storage locations. Cloud provider administrators in your organization grant permissions on the storage locations to the service principal that Snowflake generates. This option prevents users from having to supply credentials when creating stages or loading data. Below is the code snippet to create an integration:

CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = AZURE
  ENABLED = TRUE
  -- Tenant ID of your Azure Active Directory (see the note below)
  AZURE_TENANT_ID = '<tenant_id>'
  -- Location(s) the integration is allowed to read from and write to
  STORAGE_ALLOWED_LOCATIONS = ('azure://<account>.blob.core.windows.net/');

Note: To find the tenant_id, log in to the Azure Portal → Azure Active Directory → Properties → copy the Tenant ID

Note: To find the storage location URL → Storage accounts → select the Storage account name → Containers → select the Container name → Properties

Figure 6 — Create Azure Storage Integration
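To make the pattern concrete, here is a minimal filled-in sketch; the integration name azure_int, the tenant ID, and the storage account and container names are hypothetical placeholders, not values from this walkthrough:

CREATE STORAGE INTEGRATION azure_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = AZURE
  ENABLED = TRUE
  -- Hypothetical tenant ID; copy yours from Azure Active Directory → Properties
  AZURE_TENANT_ID = 'a123b4c5-1234-123a-a12b-1a23b45678c9'
  -- Hypothetical account and container; restricts the integration to one location
  STORAGE_ALLOWED_LOCATIONS = ('azure://mystorageaccount.blob.core.windows.net/mycontainer/');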

2. Grant Snowflake Access to the Storage Location

To get the Azure consent URL, let’s run the code snippet below:

DESC INTEGRATION <integration_name>;
Figure 7 — Snowflake Generated Azure Account Properties
  • Copy the AZURE_CONSENT_URL value and open it in a browser
  • Click the Accept button
Figure 8 — Azure Consent Acceptance

Note: The Microsoft permissions request page redirects to the Snowflake corporate site (snowflake.com).

  • Navigate to Azure Services → Storage accounts → click the name of the storage account created in the Create Storage Account step
  • Click Access Control (IAM)
Figure 9 — Storage accounts Access Control (IAM)
  • Click Add role assignment → search for and select the Storage Blob Data Contributor role → add the AZURE_MULTI_TENANT_APP_NAME identity as a member → Review + assign
Figure 10 — Add role Assignment

Note: This identity appears in the AZURE_MULTI_TENANT_APP_NAME property of the DESC INTEGRATION <integration_name>; output. Search for the string before the underscore in the AZURE_MULTI_TENANT_APP_NAME value.
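If you want to pull just the two properties needed for this step out of the DESC output, here is a small sketch using Snowflake’s RESULT_SCAN over the last query; the integration name azure_int is a hypothetical placeholder:

DESC INTEGRATION azure_int;
-- Filter the DESC output down to the consent URL and the multi-tenant app name
SELECT "property", "property_value"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "property" IN ('AZURE_CONSENT_URL', 'AZURE_MULTI_TENANT_APP_NAME');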

3. Create an External Stage

In this last step, we will create an external stage that references the storage integration created in step 1. Before proceeding with the external stage creation, let’s create a file format in Snowflake to be referenced during the external stage creation. Here’s the code snippet to create a file format in Snowflake:

CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = '|'
  SKIP_HEADER = 1
  NULL_IF = ('NULL', 'null')
  EMPTY_FIELD_AS_NULL = TRUE
  COMPRESSION = GZIP;
Figure 11 — File Format Creation in Snowflake

To verify the file format, run the command below:

DESC FILE FORMAT <file_format_name>;

After creating the file format, let’s continue with the external stage creation using the code snippet below:

CREATE STAGE azure_blob_int
  STORAGE_INTEGRATION = <integration_name>
  -- The container URL must be a quoted string, e.g. 'azure://<account>.blob.core.windows.net/<container>'
  URL = '<blob_storage_url>'
  FILE_FORMAT = <file_format_name>;
Figure 12 — Stage Creation in Snowflake
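As a concrete, hedged example, assuming the hypothetical integration azure_int from earlier and the my_csv_format file format created above, the stage definition might look like this:

CREATE OR REPLACE STAGE azure_blob_int
  STORAGE_INTEGRATION = azure_int
  -- Hypothetical storage account and container; use your own container's URL
  URL = 'azure://mystorageaccount.blob.core.windows.net/mycontainer'
  FILE_FORMAT = my_csv_format;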

Now, let’s run the following command to verify that Snowflake can read from Azure Blob Storage by listing the staged files:

LIST @<stage_name>;
Figure 13 — List Object(s) from Azure Blob Storage
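Beyond listing files, you can read the staged data itself. Below is a minimal sketch; the column positions and the table my_table are hypothetical and must match your CSV layout:

-- Peek at the first rows of the staged files, referencing the file format by name
SELECT t.$1, t.$2, t.$3
FROM @azure_blob_int (FILE_FORMAT => 'my_csv_format') t
LIMIT 10;

-- Load the files into a (hypothetical) table; the stage's file format applies
CREATE OR REPLACE TABLE my_table (col1 STRING, col2 STRING, col3 STRING);
COPY INTO my_table FROM @azure_blob_int;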

In summary, this post walked you through Azure’s storage account creation process and the Snowflake integration steps to read data from Azure Blob Storage, with all the required code snippets. The screenshots of each step will help you understand and compare your outputs along the way.

Learning never exhausts the mind — Leonardo da Vinci
