Migrating historical and real-time data from AWS S3 to Snowflake

Dipan Saha
5 min read · Apr 3, 2023

Snowflake is a cloud-based data warehouse that offers a variety of features for data storage, transformation, and analysis. One of its benefits is that it integrates easily with AWS services such as Amazon S3. This integration makes it possible to migrate data from S3 to Snowflake using:

  • The COPY command, for one-time historical data migration
  • Snowpipe, for real-time data ingestion

In this blog, we will follow the steps below to migrate data from AWS S3 to Snowflake.

  • Create a Snowflake account (new users can create a free account and try it out for 30 days)

Load existing data (one-time migration)

  • Create an S3 bucket, then create a historical-data folder where you want to place the existing historical data, and upload a few files to it.
  • Create a table on Snowflake to store the S3 data
  • Define a Snowflake external stage that points to this new S3 bucket
  • Use the COPY command to copy the existing S3 files to your new Snowflake table

Load real-time data

  • Create another folder inside your S3 bucket where you want to ingest real-time data. Alternatively, you can create a separate S3 bucket.
  • Define a Snowflake external stage that points to this S3 location
  • Define a Snowpipe
  • Create an event notification on the bucket to publish a message to SQS as soon as a new file arrives
  • Upload a file and see if Snowpipe is able to copy the data in real time

Create a Free Account on Snowflake

One of the benefits of Snowflake is that you can try it for free for up to 30 days.

Benefits of Creating a Free Account on Snowflake

There are several benefits to creating a free account on Snowflake.

  • First, it lets you try out Snowflake before committing to a paid subscription, so you can see whether it is the right data warehouse for your business.
  • Second, a free account gives you access to all of the features of Snowflake, including the ability to store and process data, run queries, and create dashboards.
  • Third, it is a great way to learn the platform: you can use the trial to explore Snowflake and get comfortable with how it works.

Provide your information on the Snowflake signup page to create a free Snowflake account.

Next, you will get an email to activate your account. Once you do so, create a user ID and password to log in.

Load historical data to Snowflake

Create an S3 bucket & upload a few files (historical data)

The sample CSV files contain six columns: First Name, Last Name, Address, City, State, and Zip.

Create a table on Snowflake to store the S3 data

Create a new worksheet and execute a CREATE TABLE query for the destination table.
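A minimal sketch of the table definition could look like the following. The table name SOURCE_DATA matches the name the Snowpipe references later in this post; the column names and VARCHAR types are assumptions based on the six CSV fields.

CREATE OR REPLACE TABLE SOURCE_DATA (
    FIRST_NAME VARCHAR,
    LAST_NAME  VARCHAR,
    ADDRESS    VARCHAR,
    CITY       VARCHAR,
    STATE      VARCHAR,
    ZIP        VARCHAR   -- kept as text to preserve leading zeroes
);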

Define a Snowflake external stage that points to this new S3 location

Acquire an AWS access key ID and secret access key from your AWS account.

Execute the following query to define a stage, providing your AWS access key ID and secret access key:

CREATE OR REPLACE STAGE "my_s3_stage"
URL = 's3://neopiu/historical-data/'
CREDENTIALS=(AWS_KEY_ID='<AWS_KEY>' AWS_SECRET_KEY='<AWS_SECRET>')
FILE_FORMAT = (TYPE = 'CSV');
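As a quick sanity check, you can list the files the stage can see. (The stage name is quoted here because it was created with a quoted, lowercase identifier.)

-- Should return the historical files uploaded earlier
LIST @"my_s3_stage";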

Use the COPY command to copy the historical S3 files to your new Snowflake table
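A sketch of the bulk load, assuming the SOURCE_DATA table and the stage defined above (if your CSV files have a header row, add SKIP_HEADER = 1 to the file format):

-- One-time bulk load of the historical files into the destination table
COPY INTO SOURCE_DATA
FROM @"my_s3_stage"
FILE_FORMAT = (TYPE = 'CSV')
ON_ERROR = 'CONTINUE';

COPY keeps load metadata on the target table, so re-running the statement will not reload files that have already been ingested.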

Load real-time data

Create another folder inside your S3 bucket where you want to ingest real-time data

Define a Snowflake external stage that points to this S3 location

CREATE OR REPLACE STAGE my_s3_realtime_stage
URL = 's3://neopiu/realtime-data/'
CREDENTIALS=(AWS_KEY_ID='<AWS_KEY>' AWS_SECRET_KEY='<AWS_SECRET>')
FILE_FORMAT = (TYPE = 'CSV');

Define a Snowpipe

CREATE OR REPLACE PIPE snowpipe_daily AUTO_INGEST=TRUE AS
COPY INTO SOURCE_DATA
FROM @my_s3_realtime_stage
FILE_FORMAT = (TYPE = 'CSV')
ON_ERROR = 'CONTINUE';
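AUTO_INGEST only triggers on files that arrive after the event notification is in place. If files are already sitting in the real-time folder, you can optionally ask the pipe to load them once:

-- Optionally load any files already present in the stage
ALTER PIPE snowpipe_daily REFRESH;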

Use the following query to display the notification channel (an SQS ARN) for the Snowpipe, and take a note of it.
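One way to do this is with SHOW PIPES; the notification_channel column in its output holds the SQS ARN that S3 will publish to.

-- The notification_channel column contains the SQS ARN Snowflake listens on
SHOW PIPES;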

Create an event notification on the bucket to publish messages to SQS as soon as new files arrive

Under the destination section, select SQS queue and provide the notification channel ARN from the previous step as the queue ARN.

Your real-time data ingestion setup is now complete.

Upload a file and see if Snowpipe is able to copy the data in real time

Upload a couple of files to the realtime-data folder. Ensure that their structure matches the destination table schema.

The data should flow to your destination table within a couple of minutes.
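Assuming the table and pipe names used above, a couple of checks you can run from a worksheet:

-- Confirm that new rows are arriving in the destination table
SELECT COUNT(*) FROM SOURCE_DATA;

-- Check the pipe status (executionState, pendingFileCount, etc.)
SELECT SYSTEM$PIPE_STATUS('snowpipe_daily');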

Note: The lag is determined by a number of factors, including the size of the data set, the frequency of ingestion, and the performance of the Snowflake and S3 infrastructure.

Conclusion

In this blog post, we walked through the steps involved in using the COPY command and a Snowpipe to migrate data from S3 to Snowflake. We also covered the prerequisites for the COPY command, as well as the steps for creating a Snowflake table, an S3 bucket, an external stage, and a Snowpipe. We hope you found this post helpful.


Dipan Saha

Cloud Architect (Certified GCP Professional Architect & Snowflake Core Pro)