Amulya Kumar Panda
3 min read · Nov 13, 2023

We insert and update data directly in the Snowflake table from the blob stage without using an intermediate stage.

In other words, we can load data from the Azure blob stage into the Snowflake table and merge it there (insert new rows, update existing ones) without any intermediate stage.

For this process, I followed the steps below.

Step 1:- Create a file format. For this demo, I am loading multiple JSON files.

CREATE OR REPLACE FILE FORMAT my_json_format
  TYPE = JSON
  STRIP_OUTER_ARRAY = TRUE    -- split a top-level JSON array into one row per element
  STRIP_NULL_VALUES = TRUE    -- drop object fields and array elements whose values are null
  ALLOW_DUPLICATE = FALSE     -- do not allow duplicate object field names
  IGNORE_UTF8_ERRORS = TRUE;  -- replace invalid UTF-8 sequences instead of failing the load
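
Optionally, you can confirm the format was created with the expected options before moving on (standard Snowflake commands, shown here only as a quick sanity check):

-- Optional sanity check on the file format definition.
SHOW FILE FORMATS LIKE 'my_json_format';
DESCRIBE FILE FORMAT my_json_format;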

Step 2:- Create an external stage so that we do not have to supply the full blob path every time. For this, we first create a storage integration and pass its name when creating the external stage.

externalStageParams (for Microsoft Azure) ::=
URL = 'azure://<account>.blob.core.windows.net/<container>[/<path>/]'
[ { STORAGE_INTEGRATION = <integration_name> } | { CREDENTIALS = ( [ AZURE_SAS_TOKEN = '<string>' ] ) } ]
[ ENCRYPTION = ( [ TYPE = 'AZURE_CSE' ] [ MASTER_KEY = '<string>' ] | [ TYPE = 'NONE' ] ) ]
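
As a concrete sketch (the integration name, tenant ID, account, and container below are placeholders I am assuming, not values from my actual setup), the storage integration and external stage could look like this:

-- Hypothetical storage integration; tenant, account, and container are placeholders.
CREATE OR REPLACE STORAGE INTEGRATION my_azure_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '<tenant_id>'
  STORAGE_ALLOWED_LOCATIONS = ('azure://<account>.blob.core.windows.net/<container>/');

-- External stage pointing at the container; the name matches the one used in Step 3.
CREATE OR REPLACE STAGE external_stage
  URL = 'azure://<account>.blob.core.windows.net/<container>/'
  STORAGE_INTEGRATION = my_azure_int
  FILE_FORMAT = (FORMAT_NAME = 'my_json_format');

-- Quick preview straight from the stage before wiring up the MERGE.
SELECT $1 FROM @external_stage/blob_folder/ (FILE_FORMAT => 'my_json_format') LIMIT 5;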

Step 3:- Create a stored procedure that contains the MERGE statement and reads directly from the blob stage path (folder). The goal is to load all data under that folder.

CREATE OR REPLACE PROCEDURE TESTING()
RETURNS VARCHAR(500)
LANGUAGE SQL
EXECUTE AS OWNER
AS
$$
BEGIN
  -- Merge directly from the external stage folder into the target table.
  MERGE INTO <target_table> target
  USING (
      SELECT
          $1:ID     AS ID,     -- cast to the target column types as needed, e.g. $1:ID::NUMBER
          $1:col1   AS col1,
          $1:col2   AS col2,
          -- ... remaining columns up to col100 ...
          $1:col100 AS col100
      FROM @external_stage/blob_folder/
           (FILE_FORMAT => 'my_json_format')
  ) source
  ON target.ID = source.ID
  WHEN MATCHED THEN UPDATE SET
      target.col1 = source.col1,
      target.col2 = source.col2,
      -- ... remaining columns up to col100 ...
      target.col100 = source.col100
  WHEN NOT MATCHED THEN INSERT (ID, col1, col2, /* ... */ col100)
      VALUES (source.ID, source.col1, source.col2, /* ... */ source.col100);

  RETURN 'Merge completed';
END;
$$;
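
To run the load, simply call the procedure from a session using the desired warehouse (an X-Small in my case):

CALL TESTING();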

I have used this approach in one of my projects, where I processed around 500 million records with 398 columns on an X-Small warehouse.

Advantage:- To optimize performance and storage space, we avoid landing the data in an intermediate stage. Copying 500 million records into an intermediate table takes around 2 hours on an X-Small warehouse and requires significant storage space. Instead, we read data directly from the blob stage and merge it into the target table: if a record is new we insert it, otherwise we update it. The entire process completes within about 2 hours, which means we save both compute cost and storage cost.
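
For comparison, the intermediate-stage pattern we avoided looks roughly like this (the staging and target table names, and the column list, are illustrative only):

-- Conventional two-step pattern: land the raw JSON in an intermediate table first...
CREATE OR REPLACE TABLE staging_raw (v VARIANT);

COPY INTO staging_raw
FROM @external_stage/blob_folder/
FILE_FORMAT = (FORMAT_NAME = 'my_json_format');

-- ...then merge from the intermediate table into the target.
MERGE INTO target_table target
USING (SELECT v:ID AS ID, v:col1 AS col1 FROM staging_raw) source
ON target.ID = source.ID
WHEN MATCHED THEN UPDATE SET target.col1 = source.col1
WHEN NOT MATCHED THEN INSERT (ID, col1) VALUES (source.ID, source.col1);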

Disadvantage:- The first run of the stored procedure saves compute and storage cost as described above. However, when you re-run the same stored procedure, it reads all the data again: it picks up every file under the folder or path and tries to process it. This does not insert duplicate records, because the MERGE is based on a key, but files that have already been loaded are still re-processed.

Thanks for taking the time to read this blog. Let me know if you have any questions or comments!

If you would like the complete code, please send me a message and I will be happy to help.