BigDataDave
Published in

BigDataDave

Snowflake: List files from Stage with Prefix or Starts With

Have you ever wondered how to get back a list of files with a prefix from a Snowflake Stage? I did, and then I had a customer ask me last week. I hacked and hacked for a few hours trying to get this answer, searched the web and came up Null.

Snowflake does have a LIST command as documented here: https://docs.snowflake.net/manuals/sql-reference/sql/list.html It works well, but sometimes you need better performance from large cloud buckets or to match a desired prefix on your files.

I tried 42 different Pattern = ‘*XXXX*’ parameters. I kept coming up with a pattern matching a contains function or nothing returned.

Then I had an idea, just put my desired prefix directly in the path…

-- SnowSQL for List Prefix
-- Create stage from public facing S3 bucket
CREATE OR REPLACE STAGE TABLEAU_JSON
URL = 's3://snowflake-workshop-lab/VHOL_Analytics_Tableau/';
-- List with Prefix
-- All Filez
LIST @TABLEAU_JSON;
-- I suspect this is a contains
LIST @TABLEAU_JSON pattern = '.*.nat.*';
-- It is a contains LIST
LIST @TABLEAU_JSON pattern = '.*.viz.*';
-- 100% contains function
LIST @TABLEAU_JSON pattern = '.*.a.*';
-- Fail
LIST @TABLEAU_JSON pattern = 'nat.*';
-- OMG Yes! prefix is path, path is prefix...-- Confirmed
LIST @TABLEAU_JSON/nat; LIST @TABLEAU_JSON/Tab;

In good news you should see an update to Snowflake’s amazing documentation reflecting this syntax example soon.

Note the path and prefix input string is case sensitive. @TABLEAU_JSON/tab will return no files, where @TABLEAU_JSON/Tab will return two files from the above sample code.

This prefix method also works with directory constructs in your cloud buckets. I advise most clients loading large amounts of data to snowflake to create cloud buckets with the following pattern.

-- Best Practice Staging Pattern @STAGE/[SchemaName]/[TableName]/[Year]/[Month]/[Day]/<<[Hour]>>

With this method you can COPY INTO or LIST by Table, Table+Year or target Months and specific Days for loading. The following command would address all Days and Hours of files under the Month specified.

-- List files for all Days and Hours under a Month LIST @STAGE/Schema1/table1/2020/02/

Happy SnowSQL coding out there!

Originally published at http://bigdatadave.com on February 3, 2020.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
David A Spezia

David A Spezia

90 Followers

David spends his time working with Web-Scale Data Applications built on Snowflake with the largest tenants on the Snowflake Data Cloud.