Snowflake: Dynamic Unload Path (Copy Into Location)

David A Spezia
BigDataDave
Published in
4 min readMar 21, 2020

Sometimes the need arises to unload data from Snowflake tables and objects into a stage (S3 Bucket, Azure Container or GCS) to support downstream processes. These processes are typically better served by using a SQL client or integration over Python, .Net, Java, etc to directly query Snowflake. However, sometimes we need to support file dependent processes, messaging services or just support legacy integrations requiring files to be unloaded. This is easily done in Snowflake with a statement like “Copy Into @BLOBLocation from Object”. Please see the Snowflake documentation for copy into location and documentation for data unloading for a grounding on the topic, and syntax we are investigating here. The @BLOBLocation variable is a string literal and would more fit for purpose if we could dynamically create the path or filenames.

I was working on a project for one of my large clients in the Bay, and this need for dynamic naming of unloaded files came up. I spent some time creating a custom demo for the client by dynamically naming the unloaded directory with a JavaScript Stored Procedure in Snowflake. I have seen this question asked a few times in forums and around the web. Time to document how to achieve this here and some other paths you might think work, but unfortunately do not.

Create Unload Objects
I am going to use a database most Snowflake accounts have as the account creation process from Snowflake adds these for posterity. I will use DEMO.PUBLIC for the schema, create a stage named UNLOAD and a JSON file format for use during unloading the data. We will be playing with the STORE table from SNOWFLAKE_SAMPLE_DATA as all Snowflake accounts have this data shared with them, but you can use any table just change the FROM clauses in the SQL.

--ContextUSE ROLE ACCOUNTADMIN;USE DATABASE DEMO;USE SCHEMA PUBLIC;USE WAREHOUSE LOAD_WH;--1,500 Record Sample Data TableSELECT COUNT(*) FROM "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."STORE";--Create StageCREATE OR REPLACE STAGE UNLOAD;LIST @UNLOAD;--Create File Format JSONCREATE or REPLACE FILE FORMAT JSONTYPE = 'JSON'COMPRESSION = 'AUTO'ENABLE_OCTAL = FALSEALLOW_DUPLICATE = TRUESTRIP_OUTER_ARRAY = FALSESTRIP_NULL_VALUES = FALSEIGNORE_UTF8_ERRORS = FALSE;

Testing the Unload Command
I like to code out and test my SQL before committing it to a JavaScript Stored Procedure. Here we are unloading to the @UNLOAD stage, with the format of JSON. The list command will show the files unload, and the select $1 from will query on the document directly on BLOB.

--Unload JSON to Stage (COPY INTO OBJECT_CONSTRUCT(*))COPY INTO @UNLOAD FROM (SELECT OBJECT_CONSTRUCT(*)
FROM "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."STORE")
FILE_FORMAT=(FORMAT_NAME='JSON') MAX_FILE_SIZE=1024 OVERWRITE=TRUE;LIST @UNLOAD;SELECT $1 FROM @UNLOAD (FILE_FORMAT => JSON) LIMIT 1;

Can’t You Just Use Identifier()?
Spoiler, no. I did try it out, and a few other dynamic string methods directly in the SQL. I kept getting reference errors even with UDFs and session variables. I got fancy with it, still nothing. We will come back to the cascading replace and UDF_NOWASSTRING() functions. The return Date.now().toString() (line 15) will prove most useful later.

--Session Variable Dynamic PathingSET SUFFIX = (SELECT LEFT(REPLACE(REPLACE(REPLACE(REPLACE(CURRENT_TIMESTAMP::STRING,' '),':'),'-'),'.'),17));SELECT $SUFFIX;SET PATH = '@UNLOAD/' || $SUFFIX || '/';SELECT $PATH;COPY INTO IDENTIFIER($PATH) FROM (SELECT OBJECT_CONSTRUCT(*) FROM "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."STORE")FILE_FORMAT=(FORMAT_NAME='JSON') MAX_FILE_SIZE=1024 OVERWRITE=TRUE;--JS UDF Now() as a StringCREATE OR REPLACE FUNCTION UDF_NOWASSTRING()RETURNS STRINGLANGUAGE JAVASCRIPTAS$$return Date.now().toString();$$;SELECT UDF_NOWASSTRING();SELECT UDF_NOWASSTRING()::TIMESTAMP_LTZ(3);SELECT TO_TIMESTAMP(SELECT UDF_NOWASSTRING());SELECT LEFT(REPLACE(REPLACE(REPLACE(REPLACE(TO_TIMESTAMP(UDF_NOWASSTRING())::STRING,' '),':'),'-'),'.'),14);

Using JavaScript Stored Procedures
With JSSPs completely dynamic SQL can be written and executed. Please see the Snowflake documentation for more information and a grounding in Snowflake JSSPs. We combine OBJECT_CONSTRUCT() with COPY INTO LOCATION to get a JSON payload. With using the code var dynamicpath = Date.now().toString(); var sql00 = `copy into @unload/` + dynamicpath + `\ (line 19) we can use a string representation of the Java epoch milliseconds in the path. We also leveraged a ternary operator `? :` (line 32) for IF ELSE in one line avoiding 0 row unloads breaking our code.

Now testing and calling the Stored Procedure is a breeze.

--Call SPCALL SP_UNLOAD_DYNAMIC();--Inspect ResultsLIST @UNLOAD;--Clean Up After YourselfREMOVE @UNLOAD;

JavaScript Stored Procedure Taking a Dynamic Input
You might not always want the Java epoch in the Path. I wanted to reuse my session variable and UDF from early experiments when calling the JSSP. Easily coded into the JSSP for making things fully dynamic and callable as you see fit with constants, session variables or any constructed string.

Conclusion
Now you can go out and create dynamically named subdirectories to your heart’s content when unloading data. This logic can also be used to create dynamic names for files other than data_0_0_0.ext.compression too. Remember to clean up your unloaded files after use with a remove command to avoid storage growth. Happy Querying!

Originally published at http://bigdatadave.com on March 21, 2020.

--

--

David A Spezia
BigDataDave

David spends his time working with Web-Scale Data Applications built on Snowflake with the largest tenants on the Snowflake Data Cloud.