How to Put the Azure Data Factory (ADF) Snowflake Connector to Use

Mohini Kalamkar
Hashmap, an NTT DATA Company
7 min read · Dec 3, 2020

Enterprises have their data stored in various locations, from in-house databases to SaaS platforms. To get insights into this data, you extract and load it from these various sources into a data warehouse or data lake. To build what can sometimes be fairly complex ETL/ELT data pipelines, enterprises prefer modern tools like Matillion, Azure Data Factory, Fivetran, Talend, Stitch, and many more. Depending on your architecture and data requirements, you might choose one or multiple ETL/ELT tools for your use case.

Recently, I had the opportunity to explore both Matillion and Azure Data Factory (ADF) to load data to Snowflake’s Data Cloud. Most of this post will be focused on my experience with ADF, but I’ll touch on Matillion briefly as well.

Before we get into the data integration tooling discussion, if you are considering a cloud data platform in any of the 3 major public clouds, I’d recommend checking out Snowflake. Snowflake continues to impress with an impactful and cost-performant service that checks all the key boxes for our team at Hashmap, and it’s a SQL-first solution that you can get up and running with very quickly.

Quick Thoughts on Matillion

I found that Matillion has an added level of maturity compared to ADF with respect to integrating sources with Snowflake. In particular:

  • A great UI
  • An Alter Warehouse component to scale a warehouse up and down on a job-by-job basis
  • A Create File Format component (creates a named file format that can be used for bulk loading data into, and unloading data out of, Snowflake tables)

The Microsoft ADF team is still working to add some of these features to the ADF connector for Snowflake.
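For context, here is roughly the Snowflake SQL that those two Matillion components issue behind the scenes. This is only a sketch: the warehouse name, sizes, and file format name are illustrative, not anything Matillion generates by default.

-- Scale a warehouse up for a heavy load job, then back down afterward
ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'LARGE';
-- ... run the load ...
ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'XSMALL';

-- Create a named file format for bulk loading/unloading CSV data
CREATE OR REPLACE FILE FORMAT csv_load_format
TYPE = CSV
FIELD_DELIMITER = ','
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"';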

The ADF Snowflake Connector

Let’s jump into the ADF Snowflake connector. I’ll cover several topics as listed below:

  1. Using the ADF Snowflake connector — Implementation of SCD type 1.
  2. ADF Snowflake connector capabilities.
  3. ADF Snowflake connector limitations (as of this writing).

Azure Data Factory (ADF) is Azure’s cloud service that allows you to create data-driven workflows for orchestrating and automating data movement and data transformation.

ADF has a range of built-in connectors that allow integration with various data sources and targets, and in June 2020, the Microsoft ADF engineering team added support for Snowflake as a target data store.

A quick note that I’ve also had the need to integrate and connect Azure Functions with Snowflake, but I’ll save that discussion for another post.

To start exploring the ADF Snowflake connector, let’s see how SCD type 1 can be implemented with it.

Getting Started

For this implementation, the following Azure services are used: Azure Blob storage and Azure Data Factory (linked services, datasets, and data flows).

Prerequisites:

  • ADF linked services for Azure blob storage and your Snowflake account are created.
  • A Planes table is created in Snowflake with the below schema:
CREATE TABLE Planes (
ICAO nvarchar(200),
IATA nvarchar(200),
MAKER nvarchar(200),
DESCRIPTION nvarchar(200)
);
1. First, let's upload a CSV file to Azure Blob storage.
CSV file

2. In ADF, create a (source) dataset for the blob file.

Dataset for blob file

3. Create a (sink) dataset for Snowflake.

Dataset for Snowflake

4. Now, let’s create an ADF Dataflow with three components: Source, AlterRow, and Sink.

As shown below, select the source dataset created in Step 2.

Dataflow source setting

5. Dataflow source setting continued:

Dataflow source setting

6. Now let’s preview source data.

The screenshot below shows the data preview for the source.

Dataflow source data preview

7. Now that the source is ready, let’s add an Alter Row transformation. Alter Row marks records for insert, update, upsert, or delete based on the conditions you specify. A condition as simple as true(), for example, marks every incoming row.

I have specified the condition for Upsert.

Dataflow AlterRow

8. Dataflow sink settings.

As shown below, point the sink at the Snowflake dataset created in Step 3 (which uses the Snowflake linked service).

Dataflow Sink

9. Dataflow sink settings (continued).

Select the Snowflake schema, table name, and update method.

Select the primary key of the Snowflake table in ‘Key Columns’. Based on ‘Key Columns’, records will be inserted, updated, upserted, or deleted in the sink table.

It’s good practice to check ‘Data Preview’ before moving to the next task in Dataflow.

Dataflow Sink
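Conceptually, with ICAO as the key column and the update method allowing upserts, what the connector does for you is equivalent to a Snowflake MERGE keyed on ICAO. Here is a minimal sketch of that equivalent statement, assuming the incoming CSV rows had been landed in a hypothetical staging table named Planes_staging:

-- Illustrative only: the logical equivalent of the connector's upsert on key column ICAO
MERGE INTO Planes t
USING Planes_staging s
ON t.ICAO = s.ICAO
WHEN MATCHED THEN UPDATE SET
IATA = s.IATA,
MAKER = s.MAKER,
DESCRIPTION = s.DESCRIPTION
WHEN NOT MATCHED THEN
INSERT (ICAO, IATA, MAKER, DESCRIPTION)
VALUES (s.ICAO, s.IATA, s.MAKER, s.DESCRIPTION);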

10. Add the Dataflow to a pipeline and click on Debug.

Import dataflow to pipeline

11. Now, let’s verify the Planes table in Snowflake.

All records are inserted into the Snowflake table.

Snowflake table — Planes
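If you’d rather confirm from a Snowflake worksheet than from the ADF data preview, a quick ad hoc query against the table is enough (this is just a verification step, not part of the pipeline):

-- Ad hoc verification of the loaded rows
SELECT ICAO, IATA, MAKER, DESCRIPTION
FROM Planes
ORDER BY ICAO;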

12. Next, let's update the CSV file.

I updated the A318 record and inserted a new record, B713.

Updated CSV file

13. Debug the pipeline and check the Planes table in Snowflake.

As the screenshot below shows, you have an updated record for A318.

Updated record in Planes table

14. New record B713 is inserted into the Planes table.

Planes table — new record inserted from the CSV file

As shown in steps 13 and 14, updated and new records are available in the Snowflake table.

So without writing any code, it’s pretty easy and straightforward to achieve SCD type 1 using the ADF Snowflake connector.

ADF Snowflake Connector Capabilities

When implementing the ADF connector for Snowflake, know that (as of this writing) you have support for the following activity types: the Copy activity (Snowflake as source and sink), Mapping Data Flows (Snowflake as source and sink), and the Lookup activity.

For more details, please refer to this document: https://docs.microsoft.com/en-us/azure/data-factory/connector-snowflake

So, the connector has some good capabilities; however, there are a few limitations to be aware of as well. Let’s look into those.

ADF Snowflake Connector Limitations

  1. While creating a linked service for a Snowflake account, SSO integration (the externalbrowser authenticator) is not yet supported.
  2. The ADF Snowflake connector doesn’t support executing a Snowflake stored procedure. A workaround is to call stored procedures from Azure Functions instead.
  3. The target Snowflake table must already exist before you load data into it.
  4. Direct copy from/to Snowflake is only supported for the Parquet, delimited text, and JSON formats (for JSON, copying from Snowflake also requires that the table or query result have a single column of type VARIANT, OBJECT, or ARRAY). When your source/sink data store or format is not natively compatible with the Snowflake COPY command, you need to enable the built-in staged copy through an interim Azure Blob storage instance, which adds extra cost to your Azure bill. A rough sketch of the kind of COPY statement the direct path relies on follows this list.
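To make “natively compatible with the Snowflake COPY command” concrete, the direct-copy path essentially has ADF drive a Snowflake COPY INTO statement pointed at your Blob container. A minimal sketch, where the storage account, container, path, and SAS token are all placeholders:

-- Illustrative only: a direct bulk load from Azure Blob storage into the target table
COPY INTO Planes
FROM 'azure://<storage-account>.blob.core.windows.net/<container>/planes/'
CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1);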

Final Thoughts

The ADF Snowflake connector is making strides toward connecting native Microsoft tooling to Snowflake and implementing patterns like SCD type 1 more easily. As with every data integration tool, it’s always recommended to design, architect, and deploy a final solution based on the use cases, patterns, SLAs, skill sets, and data consumption requirements that your individual situation dictates.

Ready to Accelerate Your Digital Transformation?

At Hashmap, we work with our clients to build better, together.

If you are considering moving data and analytics products and applications to the cloud or if you would like help and guidance and a few best practices in delivering higher value outcomes in your existing cloud program, then please contact us.

Hashmap offers a range of enablement workshops and assessment services, cloud modernization and migration services, and consulting service packages as part of our Cloud (and Snowflake) service offerings. We would be glad to work through your specific requirements.

Other Tools and Content You Might Like

Feel free to share on other channels, and be sure to keep up with all new content from Hashmap here. To listen in on a casual conversation about all things data engineering and the cloud, check out Hashmap’s podcast, Hashmap on Tap, on Spotify, Apple, Google, and other popular streaming apps.

Mohini Kalamkar is a Cloud and Data Engineering Consultant with Hashmap providing Data, Cloud, IoT, and AI/ML solutions and consulting expertise across industries with a group of innovative technologists and domain experts accelerating high-value business outcomes for our customers.
