Snowflake: List files from Stage with Prefix or Starts With
Have you ever wondered how to get back a list of files with a prefix from a Snowflake Stage? I did, and then I had a customer ask me last week. I hacked and hacked for a few hours trying to get this answer, searched the web and came up Null.
Snowflake does have a LIST command as documented here: https://docs.snowflake.net/manuals/sql-reference/sql/list.html It works well, but sometimes you need better performance from large cloud buckets or to match a desired prefix on your files.
I tried 42 different Pattern = ‘*XXXX*’ parameters. I kept coming up with a pattern matching a contains function or nothing returned.
Then I had an idea, just put my desired prefix directly in the path…
-- SnowSQL for List Prefix
-- Create stage from public facing S3 bucket
CREATE OR REPLACE STAGE TABLEAU_JSON
URL = 's3://snowflake-workshop-lab/VHOL_Analytics_Tableau/';-- List with Prefix
-- All Filez
LIST @TABLEAU_JSON;-- I suspect this is a contains
LIST @TABLEAU_JSON pattern = '.*.nat.*';-- It is a contains LIST
LIST @TABLEAU_JSON pattern = '.*.viz.*';-- 100% contains function
LIST @TABLEAU_JSON pattern = '.*.a.*';-- Fail
LIST @TABLEAU_JSON pattern = 'nat.*';-- OMG Yes! prefix is path, path is prefix...-- Confirmed
LIST @TABLEAU_JSON/nat; LIST @TABLEAU_JSON/Tab;
In good news you should see an update to Snowflake’s amazing documentation reflecting this syntax example soon.
Note the path and prefix input string is case sensitive. @TABLEAU_JSON/tab will return no files, where @TABLEAU_JSON/Tab will return two files from the above sample code.
This prefix method also works with directory constructs in your cloud buckets. I advise most clients loading large amounts of data to snowflake to create cloud buckets with the following pattern.
-- Best Practice Staging Pattern @STAGE/[SchemaName]/[TableName]/[Year]/[Month]/[Day]/<<[Hour]>>
With this method you can COPY INTO or LIST by Table, Table+Year or target Months and specific Days for loading. The following command would address all Days and Hours of files under the Month specified.
-- List files for all Days and Hours under a Month LIST @STAGE/Schema1/table1/2020/02/
Happy SnowSQL coding out there!
Originally published at http://bigdatadave.com on February 3, 2020.