Getting Your CSV Data into Snowflake

How to Create Databases, Tables, Stages and Load Data from Local Storage into Snowflake Using the SnowSQL CLI Tool

Harpreet Sahota
Pachyderm Community Blog
6 min read · Jun 28, 2022


In order to get your data into Snowflake, you’re going to need to figure out how to do a few things.

I’m Harpreet, a data scientist and machine learning practitioner who is a bit late to hop on the Snowflake hype train. In this post I’ll show you how to get your CSV data into a table in Snowflake. These are the lessons I learned while working on an integration with Pachyderm and Snowflake.

By the end of this post you’ll be able to use SnowSQL (Snowflake’s CLI client) to upload a CSV file from your local machine into Snowflake in three simple steps:

  • Define a table
  • Stage files
  • Copy from the stage into your table

Let’s get into it.


Prerequisites

You can safely skip these steps if you’ve already set up a Snowflake account and know how to log in through SnowSQL. Go ahead and scroll down to Step 1: Create a Table using SnowSQL.

The assumption here is that you have the following:

  • A Snowflake account
  • The SnowSQL CLI installed on your machine

If you don’t have these set up, follow the Snowflake documentation for each. As soon as you’ve created your Snowflake account you’ll be brought to a sign-in page that looks something like this:

Notice the URL in your browser. It should read something like:

https://vl05740.ca-central-1.aws.snowflakecomputing.com/<a bunch of other stuff>

The vl05740.ca-central-1.aws part of that URL is your account identifier. Copy and paste this into a text editor. You’ll need it to log into your account through SnowSQL.

Here’s how you do that:

Logging into your Snowflake account with SnowSQL
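The login command itself looks like the following, using the account identifier from the URL above with the `-a` flag and your username with `-u` (substitute your own values):

```shell
snowsql -a vl05740.ca-central-1.aws -u harpreetsahota
```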

You’ll be prompted to enter your password. Once you’re logged in you’ll see something that looks like:

harpreetsahota#COMPUTE_WH@(no database).(no schema)>

Congratulations, you’re in! You can create a database using the following command:

harpreetsahota#COMPUTE_WH@(no database).(no schema)> create database example_db;

You should see a status as below:

How to create a database using SnowSQL
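If the command succeeds, the status message looks something like this:

```
+-------------------------------------------+
| status                                    |
|-------------------------------------------|
| Database EXAMPLE_DB successfully created. |
+-------------------------------------------+
```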

You can also verify that the database was created by logging into Snowsight, which you can do by going to:

<your-account identifier>.snowflakecomputing.com

All that’s left now is to download a CSV.

You can use this one from the epic Machine Learning Mastery website by Jason Brownlee. Information about what the column values represent, as well as their names, can be found here.

Now let’s get on with the rest of the show!

Step 1: Create a Table using SnowSQL

Photo by Caspar Camille Rubin on Unsplash

The first thing you have to do is create a table, which you can do right from the command line using SnowSQL.

How to create a table using SnowSQL
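Here’s a sketch of that command for the wheat seeds data. The column names and precisions are my own choices based on the UCI seeds dataset description; adjust them to match your file:

```sql
create or replace table wheat_data (
    area                  decimal(5, 2),  -- kernel area
    perimeter             decimal(5, 2),  -- kernel perimeter
    compactness           decimal(6, 4),  -- 4 * pi * area / perimeter^2
    kernel_length         decimal(5, 3),
    kernel_width          decimal(5, 3),
    asymmetry_coefficient decimal(5, 3),
    kernel_groove_length  decimal(5, 3),
    class                 integer         -- wheat variety label (1, 2, or 3)
);
```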

Snowflake supports a variety of data types. You’ll notice that I used the decimal data type (which is synonymous with the number data type). There’s a blog post on the Snowflake community forum which discusses whether you should use float or not.

Here’s my takeaway from it:

  • Number is a fixed-point, exact numeric type. This includes data types like number, integer, and decimal. In the real world you’d come across this type of data in the form of natural numbers and exact decimal values, for example monetary figures, which need to be stored precisely.
  • Float is a floating-point type, which stores approximate representations of numeric values. You’ll usually see these used in mathematics and science to simplify calculations with scientific notation. So if you’ve got columns that need to store numbers with major differences in magnitude, then this is the right choice for you.

Taking a look at our data it’s clear that the decimal (which is equivalent to number) data type is a good choice for us.

Now that you’ve created the table, you can move your CSV to staging.

Step 2: Stage and Put Files Using SnowSQL

Before you can ingest your CSV into your table in Snowflake, you’ve got to load it into a stage.

A Snowflake stage is a location where data is stored so that it can be loaded into a table. There are two main types of stages in Snowflake: internal stages and external stages. An internal stage is used when data is stored inside Snowflake. An external stage is used when data lives in object storage outside of Snowflake, such as Amazon S3 or Google Cloud Storage. You’ll use an internal stage for this example.

Snowflake offers three types of internal stages: User, Table, and Named.

Files in a user stage can only be accessed by a single user, making this the appropriate choice when you need to load multiple tables from one specific user. Because each table in Snowflake has a stage automatically allocated to it, the table stage is an appropriate option when you have files that need to be accessed by multiple users but copied into a single table. If you need flexibility or plan on regular data loads that could potentially involve multiple users, then a named stage is the right option for you.

Check out the Snowflake documentation for more information on choosing an internal stage for local files.

Let’s go ahead and create the stage using the following syntax.

create or replace stage <stage-name>
file_format = (type = 'CSV' field_delimiter = ',');

If you have a CSV file that has headers (the wheat seeds CSV does not), you would edit the file_format argument as follows:

file_format = (type = 'CSV' field_delimiter = ',' skip_header = 1);

How to create a stage in Snowflake
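Applied to this example, and naming the stage wheat_data to match the COPY command in Step 3 (which reads from @wheat_data), the command is:

```sql
-- a named internal stage for CSV files with no header row
create or replace stage wheat_data
file_format = (type = 'CSV' field_delimiter = ',');
```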

Once the stage is created you can move files from your local machine into Snowflake staging. Local data files are uploaded into an internal stage by executing the PUT command. There are three ways to execute this command, depending on the type of stage you’re loading into.

For a User Stage use the following syntax:

put file:///<path-to-file>/<filename>.csv @~/<stage-name>;

For a Table Stage use the following syntax:

put file:///<path-to-file>/<filename>.csv @%<table-name>;

For a Named Stage use the following syntax:

put file:///<path-to-file>/<filename>.csv @<stage-name>;

You’ll use a Named Stage for this example.

Uploading local file into a Named Stage in Snowflake using SnowSQL
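For this example, the command looks like the following (the local path is a placeholder; use wherever you saved the CSV):

```sql
put file:///tmp/wheat-seeds.csv @wheat_data;
```

Note that PUT compresses files with gzip by default, which is why the next step references wheat-seeds.csv.gz rather than wheat-seeds.csv.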

And now you can FINALLY get your data into the table you created in Step 1.

Step 3: Copy into Table

With your table created and staging area prepped with uploaded data, you can finally populate your table.

Run the following command to copy your staged data into your table:

copy into wheat_data 
from @wheat_data files = ('wheat-seeds.csv.gz');

You should see the following output after executing the command:
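Trimmed to the most useful columns, the load result looks something like this (the standard seeds dataset has 210 rows; your counts will reflect your file):

```
+--------------------+--------+-------------+-------------+
| file               | status | rows_parsed | rows_loaded |
|--------------------+--------+-------------+-------------|
| wheat-seeds.csv.gz | LOADED |         210 |         210 |
+--------------------+--------+-------------+-------------+
```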

Congrats! You’ve officially taken a CSV from your local computer and gotten it into Snowflake. You can verify that the data landed by checking out the table in Snowsight or by running a quick select.
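For example, a quick sanity check against the table from Step 1:

```sql
select * from wheat_data limit 5;
```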

Be sure to give this post a few claps to help others find it and bookmark it in case you need to come back to it for reference at a later time!

By the way, be sure to check out this blog from Pachyderm to learn more about the integration, which allows you to build data-driven, language-agnostic pipelines with data and pipeline versioning.
