Snowflake Cliff Notes: Internal/External Stage

(This is one of a series of quick read articles outlining some functionality in Snowflake.)

Zakary LeBlanc
6 min read · Apr 25, 2023

When starting out, it can be easy to overlook how central Stages are to getting data into and out of Snowflake. A Stage in Snowflake is a landing area that stores files to be used to load data into or unload data from Snowflake tables.

If you are interested in learning more about how to move data from files into a Snowflake table or vice versa, take a look at this article I recently published discussing the COPY INTO command.

Stages come in two flavors — internal and external. Let’s take a closer look.

Internal and External Stages

An internal stage is a stage hosted by Snowflake. Snowflake supports three types of internal stages: User, Table, and Named Stages. By default, each user and table is allocated an internal stage. You do not need to create these stages, as they are always available when needed, but each has its own benefits and drawbacks. The following information comes from Snowflake’s documentation.

A User Stage can be accessed using the @~ shorthand. For example, you can use LIST @~; to display all files in the stage. However, these stages do not support setting File Format options, so format options must be specified in the COPY INTO command itself when loading or unloading. A User Stage is not appropriate if multiple users require access to the same files, or if the current user does not have INSERT privileges on the target tables.
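For instance, loading a staged file from a User Stage means supplying the format options inline with the COPY command. A minimal sketch (my_table and the file name are illustrative):

```sql
-- Load from the current user's stage; format options must travel with the COPY
COPY INTO my_table
  FROM @~/mydata.csv.gz
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```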

Each table has a stage that shares the same name, and like User Stages, these cannot be dropped. A Table Stage can be accessed using @%my_table. For example, you can use LIST @%my_table; to list all files in the table stage. Table stages do not support transformations when loading data (using a SELECT statement as the source for the COPY command). This Table Stage is not a separate object, but is rather implicitly tied to the table itself. This option would not be appropriate if you need to copy data from the files into multiple tables.
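A minimal sketch of loading from a table stage (my_table is an illustrative name) — note that the files can only land in that one table, and no transformation SELECT is allowed:

```sql
-- Load files from the table's own stage into the table
COPY INTO my_table
  FROM @%my_table
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```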

Lastly, a Named Stage is a database object that provides the greatest degree of flexibility for data loading. Users must be granted access to use it, along with the appropriate privileges on the target tables where the data will be loaded. This type of stage requires only a single SQL statement to provision, using the simplest form of the CREATE statement from the Snowflake documentation:

CREATE STAGE MY_INTERNAL_STAGE;

And that’s all you need to get started.
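Because a Named Stage is a securable object, access is granted like any other object. A sketch, assuming a hypothetical role LOADER_ROLE and an illustrative target table my_table:

```sql
-- Let the role read from and write to the stage
GRANT USAGE ON STAGE MY_INTERNAL_STAGE TO ROLE LOADER_ROLE;

-- The role also needs INSERT on any table it will load data into
GRANT INSERT ON TABLE my_table TO ROLE LOADER_ROLE;
```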

Another option is to use Snowflake’s web UI to create the Stage.

So far, we have only discussed internal stages, but Snowflake also supports integrations with storage containers from Amazon, Azure, and GCP. These integrations come in the form of external stages.

To create an external stage in Snowflake, you need to use the same CREATE STAGE command as for internal stages, but you also need to provide the URL of the external stage and any necessary credentials. You can store the connection credentials in a Storage Integration and reference them when creating the stage.
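A Storage Integration is itself created with a single statement. The sketch below assumes an AWS IAM role; the integration name, role ARN, and bucket path are illustrative:

```sql
-- Holds the cloud credentials so they never appear in stage definitions
CREATE STORAGE INTEGRATION myint
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/my_snowflake_role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://load/files/');
```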

Setting up an external stage requires an additional step of provisioning the container with the cloud provider before creating the external stage in Snowflake. I highly recommend creating your storage container within the same cloud provider and region as your Snowflake account to avoid expensive egress fees when loading or unloading data.

Here are some examples of CREATE STAGE statements for external stages:

-- Amazon S3
CREATE STAGE my_ext_stage
URL='s3://load/files/'
STORAGE_INTEGRATION = myint;

CREATE STAGE my_ext_stage1
URL='s3://load/files/'
CREDENTIALS=(AWS_KEY_ID='1a2b3c' AWS_SECRET_KEY='4x5y6z');

-- Azure
CREATE STAGE my_ext_stage
URL='azure://myaccount.blob.core.windows.net/load/files/'
STORAGE_INTEGRATION = myint;

CREATE STAGE mystage
URL='azure://myaccount.blob.core.windows.net/mycontainer/files/'
CREDENTIALS=(AZURE_SAS_TOKEN='?sv=2016-05-31&ss=b&srt=sco&sp=rwdl&se=2018-06-27T10:05:50Z&st=2017-06-27T02:05:50Z&spr=https,http&sig=bgqQwoXwxzuD2GJfagRg7VOS8hzNr3QLT7rhS8OFRLQ%3D');


-- Google Cloud Storage
CREATE STAGE my_ext_stage
URL='gcs://load/files/'
STORAGE_INTEGRATION = myint;

Note: The above were taken straight from Snowflake’s documentation. None of the keys above are valid; they are for illustrative purposes only.

Putting Files On the Stage

After preparing our preferred Stage and file for import, the next step is to transfer the file from your local machine to the internal stage. There are a couple of ways to do this, but I find the easiest method is to download the SnowSQL CLI; Snowflake’s documentation has instructions for downloading and installing it. Alternatively, you can use the Snowflake Connector for Python, but I will focus on SnowSQL for this article.

After installing SnowSQL, I recommend updating your connection profile in the SnowSQL config file (a file named config) located in the following directory:

Linux/macOS: ~/.snowsql/

Windows: %USERPROFILE%\.snowsql\

Open the file and update the connections section to specify the default connection parameters. Alternatively, you can add a named connection profile by adding the following:

[connections.my_example_connection]
accountname = myorganization-myaccount
username = jsmith
password = xxxxxxxxxxxxxxxxxxxx
dbname = mydb
schemaname = public
warehousename = mywh

With this specified, you can run the following command to push your file up to your internal stage:

snowsql -c my_example_connection -q "PUT 'file://C:\temp\data\mydata.csv' @MY_INTERNAL_STAGE AUTO_COMPRESS=TRUE;"

After running this code you can verify that the file is in the stage by using the LIST (or LS) command within Snowflake.

LS @MY_INTERNAL_STAGE;

As a side note, you can also specify folders within the Internal Stage by appending the file path to the stage name like so:

PUT 'file://C:\temp\data\mydata.csv' @MY_INTERNAL_STAGE/DATA_IMPORTS/ AUTO_COMPRESS = TRUE;

This can be useful when you have multiple files destined for the same target table (alternatively, you could place them in the table’s stage). Once the files are in the stage, you can load them into the target table whenever you choose.
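A subsequent COPY can then target just that folder, picking up every file under the prefix. A sketch (my_table is an illustrative target):

```sql
-- Load all files under the DATA_IMPORTS/ prefix in one command
COPY INTO my_table
  FROM @MY_INTERNAL_STAGE/DATA_IMPORTS/
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```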

Other Uses For Stages

We have covered the most common uses for stages, but this is only the tip of the iceberg. Snowflake also allows you to query a file or set of files that match a regex pattern.

For example:

SELECT t1.$1, t1.$2, t1.$3
FROM @MY_INTERNAL_STAGE (FILE_FORMAT => 'DEV_STAGE.INTAKE.CSV', PATTERN => '.*data.csv') t1;

This can be very useful for checking the contents of files before loading them or after unloading them. Similarly, it is possible to create an EXTERNAL TABLE, which is a table object that points to a file or set of files in the stage. The difference between querying the files directly and using an external table is that the external table exists as an object in the database. Other users can also query this table.
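As a sketch (the table name, column mappings, and file layout are illustrative), an external table over CSV files in an external stage might look like this — each column is an expression over the VALUE variant that Snowflake exposes for each staged row:

```sql
-- External table over CSV files sitting in an external stage
CREATE EXTERNAL TABLE my_ext_table (
    id   NUMBER  AS (VALUE:c1::NUMBER),
    name VARCHAR AS (VALUE:c2::VARCHAR)
  )
  LOCATION = @my_ext_stage/data/
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```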

Additionally, new files can be added to the stage directory that the external table is pointed towards, and they can then be incorporated into the external table with a simple refresh command. External stages open up interesting possibilities, as other programs can interact with the same files.
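The refresh itself is a single statement (auto-refresh via cloud event notifications is also possible; my_ext_table is the illustrative name from above):

```sql
-- Re-scan the stage location and register any new or removed files
ALTER EXTERNAL TABLE my_ext_table REFRESH;
```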

There are some limitations with external tables and querying data in files. The primary limitation is that this access is read-only. Snowflake has a new feature in the works called Iceberg Tables (using Apache Iceberg technology). An Iceberg table is similar to an external table in that the data is housed in files on your stage. The difference is that these tables will be full-fledged Snowflake tables as well. The data for the table will be stored in Parquet format on your external stage, but the Iceberg functionality allows for DML commands to be used to insert, update, and delete data as well. This feature is currently in private preview, but hopefully, it will be released for general preview access soon. Check out this video from Snowflake if you want to dive deeper into it.

Conclusion

Stages are a central part of any data pipeline in Snowflake. This article has briefly examined how to create a Stage, push files into it, and query the data within it, whether internal or external. With the inclusion of Iceberg tables coming in the near future, Snowflake is greatly increasing the utility of Stages. These advances provide us with new design opportunities and challenges when building out our Snowflake environments. I am looking forward to it!

Thank you for reading! Please leave some feedback and follow if you enjoy talking about Snowflake or SQL topics.



Bloom Analytics founder and chief data architect. Enthusiastic about modern data platforms, especially Snowflake, and passionate about writing good SQL.