Connecting to external stage in Snowflake — AWS S3 bucket

Introduction

ksshravan
2 min readApr 7, 2023

Stage in Snowflake is a place where file is temporarily stored (and data transformed into tabular form by using info regarding the file’s format) before it is loaded into a database table. While Snowflake has internal stages, many a times, external stages are very useful. Some reasons to use external stage over an internal stage include:

  • External stages are easier to manage because of their intuitive user interface whereas for internal stages user need to use Snowsql CLI and run multiple commands like PUT and LIST
  • Auto archival is easily possible in external stage, whereas for an internal stage, we need to write a stored procedure to do the same
  • Auditing data is much easier in an external stage (give example)

Steps Involved

With this brief overview of how external stages are useful, lets now see the steps to connect to an AWS S3 external stage in Snowflake

1. Create a free AWS account
2. Create an S3 bucket
- Give bucket name
- Choose AWS region
3. Once bucket created, select bucket and select on Create folder. In this example, we create folder named csv to store csv files
4. Upload csv files using Upload button in created folder
5. Search for IAM
6. Under Access management, click on Roles
7. Click on Create Role, select AWS Account as Entity type, and choose This account as AWS account, and click on Next
8. Select following permission policies that will be attached to this role — AmazonS3FullAccess, click Next
9. Give role name for exmaple asw_s3_snowflake_intg, and give any description. Then click on Create Role
10. You would see a new role created, in this example, aws_s3_snowflake_intg. Click on this newly created role
11. Copy ARN number, which is available at top (for example arn:aws:iam:999999999:role/<role_name>)
12. Now we have to create a Storage Integration object in Snowflake. To do that go to Snowflake and run the following code. This does one way integration. While Snowflake knows about AWS, AWS is yet to know about Snowflake

create or replace storage integration s3_int
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = S3
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::555064756008:role/aws_s3_snowflake_intg'
STORAGE_ALLOWED_LOCATIONS = ('s3://bucket-name/path/', 's3://bucket-name/path/')
COMMENT = 'Integration with aws s3 buckets' ;

13. Enter DESC integration s3_int in Snoflake. Using this we can see properties of integration object. Over there, we have to copy property_value of AWS_IAM_USER_ARN (looks like arn:aws:iam::09030299999:user/hpq20000)

14. Go to AWS and open the role. Click on Edit Trust pPolicy- For ‘AWS’ key paste the ARN value given by storage integration object

15. Now go to Snowflake and create the external stage

CREATE STAGE my_s3_stage
STORAGE_INTEGRATION = s3_int
URL = 's3://bucket1/path1/'
FILE_FORMAT = my_csv_format;

You can now load data from this stage into a table and start working on it

If you want to see a video version of this, visit the first link in the references

Reference

[1] https://www.youtube.com/watch?v=e6HjXTZYQWQ — Snowflake AWS Integration

[2] https://cloudyard.in/2022/01/snowflake-internal-stage-and-external-stage

[3] https://docs.snowflake.com/en/user-guide/data-load-s3-config-storage-integration

--

--