Snowflake — Loading & Unloading of Data (Part 1)

Divyansh Saxena
7 min read · Feb 14, 2022


Today we will discuss the different types of stages in Snowflake and the data loading and unloading process with AWS, Microsoft Azure, Google Cloud Storage, and Snowflake's own internal storage. Since the topic of loading and unloading in Snowflake is vast, we will split this section into 2 parts.


Executive Summary

  1. Stages In Snowflake
  2. External Vs Internal Stage
  3. Types of Snowflake Internal Stages
  4. Data Loading With Internal Stage
  5. Data Unloading With Internal Stage

Snowflake Stages

The staging area in Snowflake is a blob storage area where you place all your raw files before loading them into the Snowflake database. A Snowflake stage is not the same as a staging layer in a traditional data warehouse.

Snowflake provides internal stage areas as well as external stage areas. External stage areas include Microsoft Azure Blob Storage, Amazon S3, and Google Cloud Storage.

External Stage Area Available in Snowflake

In the external stage area, we store our raw files in the blob storage provided by Microsoft, Google, or Amazon. To make a connection between the external staging area and Snowflake, we use a stage object. However, the stage object alone is not secure, since it only describes where and how to load data from the external staging area; to secure the connection, we use an integration object.

After this, we also require a file format object, which describes the properties of the files. All of these objects are used by the COPY command to load data.
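For example, on AWS the three objects and the COPY command might look like the following sketch (all object names, the S3 URL, and the IAM role ARN are illustrative placeholders, not from the walkthrough):

```sql
-- Integration object: secures access to the external location
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_snowflake_role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/raw/');

-- File format object: describes the file properties
CREATE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

-- Stage object: ties the location, integration, and format together
CREATE STAGE my_s3_stage
  URL = 's3://my-bucket/raw/'
  STORAGE_INTEGRATION = s3_int
  FILE_FORMAT = my_csv_format;

-- COPY uses all of the above to load the data
COPY INTO my_table FROM @my_s3_stage;
```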

Data Loading Process from External Stage

In the internal stage area, the process remains almost the same; the only difference is that the integration object is not required. This is because the staging area is provided by Snowflake itself, so the stage object is sufficient to provide a secure connection.

Data Loading Process From Internal Stage

Types of Internal Stages

We have 3 types of internal stages in Snowflake.

  1. User Stage
  2. Table Stage
  3. Named Stage

User Stages — Each user is allocated a stage by default for storing files. This is the best-suited option if your files will be accessed by only one user but are required in multiple tables. The user requires the appropriate access to load data from its stage into those tables.

Table Stages — Each table in Snowflake is allocated a default stage for storing files. This is best suited if your files need to be accessed by multiple users but are copied into only one table. Unlike named stages, table stages can't be altered or dropped. A table stage also does not support an attached file format, so we need to define the file format explicitly in the COPY INTO <table> command.

Table stages also do not support transforming data while loading it.

Named Stages — Named stages provide the greatest degree of flexibility for data loading. Because they are database objects, the security/access rules that apply to the database apply to them as well. Users with appropriate privileges on a named stage can load data into any table within the database.
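The three stage types are referenced with different prefixes in SQL. A quick sketch (the table and stage names are illustrative):

```sql
LIST @~;                      -- user stage of the current user
LIST @%my_table;              -- table stage belonging to my_table

CREATE STAGE my_named_stage;  -- a named internal stage is a database object
LIST @my_named_stage;
```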

Data Loading with Snowflake Stage

With Internal Stage

Using the internal table stage, we will see how to load data into Snowflake. For that, we first need to install SnowSQL.

SnowSQL is the next-generation command-line client for connecting to Snowflake to execute SQL queries and perform all DDL and DML operations, including loading data into and unloading data out of database tables.

You can install SnowSQL directly from your dashboard by navigating to Help → Download. The reference image below shows the path to download SnowSQL. Once you have downloaded it, open your terminal and use the snowsql command to connect to your Snowflake account.

Terminal> snowsql -a <snowflake_account_name>

Once we are logged into the Snowflake account using the SnowSQL command line, we can use the PUT command to load the raw files into our staging areas. The PUT command uploads (i.e. stages) data files from a local directory/folder on a client machine to a Snowflake stage.

I used the following syntax to load the files residing in my local system to the database table stage.

PUT file://<path_of_file>/<file_name> @DatabaseName.Schema.%TableName

After loading the data into the stage, I ran a SELECT query on my table; the table was still empty, which is expected because we haven't yet loaded the data into the table itself. I then ran the LIST @%EMPL_BASIC_1 command to see all the files residing inside my table stage (EMPL_BASIC_1).

To load the data from the table staging area into the table itself, we use the COPY command, which makes it easy to copy staged data into tables. I have added a snapshot of all the commands I ran in the image below.
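Putting the steps together, the table-stage load might look like this sketch (the table name EMPL_BASIC_1 comes from the walkthrough; the local file path and the file format options are illustrative):

```sql
-- From SnowSQL: stage the local file into the table stage
PUT file:///tmp/empl_basic.csv @%EMPL_BASIC_1;

-- Verify the staged file
LIST @%EMPL_BASIC_1;

-- Load from the table stage into the table; the file format
-- must be given inline, since table stages cannot hold one
COPY INTO EMPL_BASIC_1
  FROM @%EMPL_BASIC_1
  FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1);
```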

Command Line Executions for Data Loading in Table Stage

Not only this: once you have loaded the files to the stage, you can also use the Web UI to load the data into tables.

Web UI to Load Data From Table Stage

When I tried using the stage of a different table, Snowflake did not allow it, since a table stage can only be used by its own table.

Error While Accessing Different Table Stage

In the above example, you can also see that I used positional placeholders (t.$1, t.$2, and so on) while loading data from the stage into specific table columns. This helps us easily map the columns of the raw file to the table columns.
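A sketch of that column mapping (the target column names are illustrative):

```sql
-- $1 and $2 refer to the first and second columns of the staged file
COPY INTO EMPL_BASIC_1 (emp_id, emp_name)
  FROM (SELECT t.$1, t.$2 FROM @%EMPL_BASIC_1 t)
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```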

Additionally, when you load multiple files to the stage and, for debugging purposes, want to see which file and row number each record came from, you can use the metadata column values.

Example of Metadata Parameters in COPY Stage Process

If you don't want to keep the full stage path returned by the METADATA$FILENAME column and only want the file name, you can use Snowflake's SPLIT_PART() function.

SPLIT_PART() Function in Snowflake
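A sketch combining the metadata columns with SPLIT_PART() to keep only the file name (the negative part index takes the last '/'-separated segment of the stage path):

```sql
SELECT
  SPLIT_PART(METADATA$FILENAME, '/', -1) AS file_name,    -- last path segment only
  METADATA$FILE_ROW_NUMBER               AS row_in_file,  -- row number within that file
  t.$1, t.$2
FROM @%EMPL_BASIC_1 t;
```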

The COPY command does not reprocess an already-loaded file, as it keeps an MD5 hash record of the files it loads.

COPY Command Keeping Track By MD5 Hash Records
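If you do need to reload a file that COPY has already recorded in its load history, the COPY command supports FORCE = TRUE, which skips that check. A sketch (note this will load the file again and can create duplicate rows):

```sql
COPY INTO EMPL_BASIC_1
  FROM @%EMPL_BASIC_1
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  FORCE = TRUE;  -- ignore the load-history check and reload the staged files
```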

Data Unloading with Snowflake Database

With Internal Stage

The data unloading process is much the same as the data loading process, in reverse. We use the COPY command to unload data from the table into its stage. Once the data is in the stage, we can download it to our local machine using the GET command.

Let's take the example of unloading data to the local machine. I ran two COPY commands to generate two raw files: one with the entire table data, and the other with only two columns of the table. After running those queries, I checked the table stage area and both files had been generated.

COPY Command From Table to Table Staging Area
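The two unload commands might look like this sketch (the file name prefixes and column names are illustrative):

```sql
-- Unload the entire table into its own table stage
COPY INTO @%EMPL_BASIC_1/full_ FROM EMPL_BASIC_1;

-- Unload only two columns of the table
COPY INTO @%EMPL_BASIC_1/partial_
  FROM (SELECT emp_id, emp_name FROM EMPL_BASIC_1);
```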

I then downloaded those files to my local system using the GET command on the SnowSQL command line.

A download of Raw File from Stage Area to Local System
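A sketch of that GET command, run from SnowSQL (the local directory is illustrative):

```sql
-- Download everything in the table stage to a local folder
GET @%EMPL_BASIC_1 file:///tmp/downloads/;
```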

We can use OVERWRITE = TRUE in the COPY statement to overwrite a file that already exists in the table stage.

Looking to know more about the syllabus and guidance on the Snowflake Snow-Pro Certification? Visit my article "Roadmap To Complete Snow-Pro Certificate".

Are you new to Snowflake or planning to learn more about Snowflake architecture and its basics? Read the article below to learn about Snowflake architecture, its caching layers, the concept of clustering, and virtual warehouses in just 10 minutes.

Want a quick guide to performance tuning and database characteristics? Follow the blog below!

About me

I am Divyansh Saxena, a Cloud Data Engineer working at IBM. I am a data enthusiast with experience across multiple cloud platforms, including Microsoft Azure, Amazon Web Services, Snowflake, and Google Cloud.

If you like the blog, please share your views and visit https://beacons.ai/data.dude to know more about me ;)

Follow my Medium Channel for regular updates on similar topics.
