Snowflake Azure Storage Integration

How I learned to stop generating SAS tokens for every single external stage and learned to love Storage Integration!

Kieran Healey
Hashmap, an NTT DATA Company
4 min read · Jan 30, 2020


Secure safely without SAS tokens

We have all been guilty of generating SAS tokens and using them to access cloud resources. I am especially guilty of this in my personal projects, when I just want things to “work”. However, that sort of thing would never fly in a production environment. Imagine a team of 20 developers, each wanting access to a different resource within a blob container. If each of them wanted to create an External Stage, each would have to generate a SAS token, and then the problems start. Who generated what? What if someone saves a SAS token locally? What if, god forbid, one gets uploaded to a public-facing GitHub repo?

Here at Hashmap, we work with clients every day who need better practices not only for handling their development but also for controlling their security. When we see an anti-pattern develop, we try to correct it and educate on better practices. Data Engineering is about learning together.

Snowflake Storage Integration — Why?

So how does the Storage Integration fix our issue? In a nutshell, a Storage Integration is a configurable object that lives inside Snowflake. Once configured, you can use it to create an External Stage without having to input a SAS token every time. Instead, you simply reference the Storage Integration, and all of your secrets stay locked away behind it. Here is how you would create a storage integration inside Snowflake:

create storage integration azure_int
  type = external_stage
  storage_provider = azure
  enabled = true
  azure_tenant_id = '<tenant_id>'
  storage_allowed_locations = ('azure://myaccount.blob.core.windows.net/mycontainer/path1/', 'azure://myaccount.blob.core.windows.net/mycontainer/path2/')
  storage_blocked_locations = ('azure://myaccount.blob.core.windows.net/mycontainer/path2/');

Here you can see I am setting the kind of integration with the type parameter; the storage_provider here is Azure. We must also provide the tenant ID of the Azure Active Directory tenant that holds the storage accounts Snowflake will access. Finally, we specify which parts of the blob store the integration is allowed to access, and we can block certain paths from being accessed as well. This allows for greater control over the locations in which an external stage can be created.

Once this is run, Snowflake will create a service principal in your Azure tenant, to which you can give reader access on the storage account. After this is done, all you have to do is grant usage on the integration to other roles in Snowflake.
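If you want to see the details Snowflake generated for this handshake, describing the integration will list them; azure_int is the integration created above:

-- Shows the integration's properties, including the consent URL to open in a
-- browser and the multi-tenant app name of the service principal that needs
-- reader access on the storage account.
desc storage integration azure_int;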

NB: This must be done as the ACCOUNTADMIN role (or a role explicitly granted the CREATE INTEGRATION privilege), as by default that is the only role allowed to create storage integrations and grant usage on them to other roles within Snowflake.

GRANT USAGE ON INTEGRATION <object_name> TO ROLE <role_name>;
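For example, assuming a hypothetical data_engineer role that your developers use to build stages:

-- data_engineer is a placeholder; substitute the role your team actually uses.
grant usage on integration azure_int to role data_engineer;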

Now we can finally create the external stage; here is the syntax for creating one using the storage integration:

CREATE [ OR REPLACE ] [ TEMPORARY ] STAGE [ IF NOT EXISTS ] <external_stage_name>
  URL = 'azure://<account>.blob.core.windows.net/<container>[/<path>/]'
  STORAGE_INTEGRATION = <integration_name>
  [ FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' | TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] } ) ]
  [ COPY_OPTIONS = ( copyOptions ) ]
  [ COMMENT = '<string_literal>' ]
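As a concrete sketch, reusing the placeholder account, container, and integration from above (the stage name my_azure_stage is just an example), creating and sanity-checking a stage might look like this:

create or replace stage my_azure_stage
  url = 'azure://myaccount.blob.core.windows.net/mycontainer/path1/'
  storage_integration = azure_int
  file_format = (type = csv);

-- List the files the stage can see: no SAS token required anywhere.
list @my_azure_stage;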

Start Hitting the Easy Button with Snowflake Storage Integration!

That’s it, folks! The advantages of creating an external stage with a storage integration in Snowflake are hard to overstate. Developers are no longer responsible for generating a SAS token for each external stage they create.

Ready To Accelerate Your Digital Transformation?

At Hashmap, we work with our clients to build better, together.

If you’d like additional assistance in this area, Hashmap offers a range of enablement workshops and consulting service packages, and would be glad to work through your specifics in this area.

How does Snowflake compare to other data platforms? Our technical experts have implemented over 250 cloud/data projects in the last 3 years and conducted unbiased, detailed analyses across 34 business and technical dimensions, ranking each cloud data platform.


Kieran Healey is a Cloud and Data Engineer with Hashmap providing Data, Cloud, IoT, and AI/ML solutions and consulting expertise across industries with a group of innovative technologists and domain experts accelerating high-value business outcomes for our customers.
