How to move data between your Snowflake environments in under 5 minutes

Coste Virgile
Published in The Qonto Way
6 min read · Jul 20, 2021

Production can be a rough place, especially when dealing with heavy migrations of data pipelines. Testing pipelines in different environments, with actual data, can save you lots of trouble down the road. In this article, we’ll show you a quick solution that we implemented at Qonto to copy data between Snowflake environments for testing purposes.

At Qonto, we are building our brand new data stack by migrating all existing data pipelines to Airflow. Our goal is to have a highly dynamic and easily configurable data platform, allowing other tech teams to add new data sources to our Snowflake warehouse and to get an always up-to-date data lineage, from ingestion to activation (more on that in an upcoming article; stay tuned).

To achieve this migration with confidence, we chose to create multiple environments with dedicated Snowflake accounts. So in addition to the existing production environment, we now have continuous integration (CI) environments for automated testing and staging environments for manual testing.

Working on data pipelines in separate, hermetic environments is safer: it allows experimentation and greatly reduces the risk of production problems down the road. However, because these environments are hermetic, data from one environment is not accessible from another. And while experimenting with your data pipelines, you might need to copy data from one test environment to another.

This situation is where Snowflake user stages come in handy. We’ll see in this article how we used them to drastically reduce the time needed to move data from one environment to another.

Did you know about internal stages?

Stages are Snowflake objects used to load data into or out of Snowflake. There are 2 types of stages, each serving a different purpose. We often hear of Snowflake stages as references to cloud storage buckets (Google Cloud Storage, Amazon S3, …). Those are external stages, and they work like a symlink between Snowflake and a bucket.

But Snowflake also offers ready-to-use stages known as internal stages. If you have a Snowflake account, you already have an internal stage. Let’s see the difference between those two objects and how you can use them for different purposes.

External stages: elegant but time-consuming

To work with those external stages, you’ll need to configure a cloud storage bucket and create a Snowflake external stage referencing this bucket. This step is straightforward but might take you a bit of time (you can find detailed examples here).

First, you’ll need to find the correct Snowflake configuration depending on your use case: like most Snowflake objects, a stage is associated with a ROLE and created in a SCHEMA inside a DATABASE. You’ll then probably need to liaise with your SRE team to create the appropriate cloud storage bucket (which might require a Terraform merge request).
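For reference, creating such a stage looks roughly like this (the bucket URL and the storage integration name below are placeholders to adapt):

    CREATE STAGE my_external_stage
      URL = 'gcs://my-bucket/my-folder/'
      STORAGE_INTEGRATION = my_gcs_integration;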

Those stages are ideal for automating a specific use case. But if you’re looking for a quick and dirty solution to copy data from one test environment to another, internal stages can be a better choice.

Internal stages: quick and dirty

Unlike external stages, internal stages are only available within Snowflake and do not reference any external cloud storage bucket.

And as Snowflake documentation states, if you have a Snowflake account, you already have a personal stage that you can use to quickly move data around. You can run the following command in Snowflake UI to list the files already present in your personal stage:
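    LIST @~;  -- @~ refers to your user (personal) stage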

Other stages are named objects, and you can list their content by running the following command:
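    LIST @my_stage;  -- my_stage being the name of the stage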

By using your already configured personal stage, you greatly reduce the overheads of creating and configuring cloud buckets and external stages.

The 5-step copy workflow

Here is the plan we’ll follow to copy a table from one environment to the other, using a personal stage:

As you can see, we have:

  • 1 step on your LOCAL computer
  • 2 steps on the SOURCE environment
  • 2 steps on the DESTINATION environment

Almost all of those steps can be done using the Snowflake UI. But to perform the GET and PUT file operations, you’ll need snowsql. Snowsql is a command-line client that lets us execute Snowflake commands from the terminal.

Let’s detail the 5 steps a bit more:

Locally

Create a local folder in which we’ll download the data files:
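For example (we’ll use /tmp/my_folder as the local path throughout):

    mkdir -p /tmp/my_folder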

On the source account

Connect with snowsql on the source account.
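For example (with placeholders to replace):

    snowsql -a <source_account_identifier> -u <username>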

Export the content of MY_DATABASE.MY_SCHEMA.MY_TABLE to our personal stage (@~), under a “my_folder” folder (created automatically). Depending on the size of the table, one or several files prefixed with “my_table” will be exported into this folder.
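For example, unloading as gzipped CSV (adapt the FILE_FORMAT to your data):

    COPY INTO @~/my_folder/my_table
      FROM MY_DATABASE.MY_SCHEMA.MY_TABLE
      FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' COMPRESSION = GZIP)
      OVERWRITE = TRUE;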

Get the exported files into our local folder.
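    GET @~/my_folder file:///tmp/my_folder/;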

Copy the result of the following query (GET_DDL) on the source table (we’ll paste it on the destination account in a later step):
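    SELECT GET_DDL('TABLE', 'MY_DATABASE.MY_SCHEMA.MY_TABLE');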

We’ve downloaded the files locally. We are now ready to upload those files to our destination.

On the destination account

Connect with snowsql on the destination account.
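Again with placeholders to replace:

    snowsql -a <destination_account_identifier> -u <username>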

Create the table on the destination account by pasting the DDL (copied in the last step).

Upload the local data files to the destination user stage.
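    PUT file:///tmp/my_folder/* @~/my_folder;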

Load the staged files into the destination table.
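Using the same file format as on the source (PURGE = TRUE removes the staged files once they are loaded):

    COPY INTO MY_DATABASE.MY_SCHEMA.MY_TABLE
      FROM @~/my_folder
      FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' COMPRESSION = GZIP)
      PURGE = TRUE;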

The destination table is now created and populated. You can start using it in your data pipeline.

Automating those 5 steps

We detailed the 5-step process to get a better understanding of what needs to be done in order to copy data from one Snowflake environment to another. Even though each step is fairly easy, it can be cumbersome to run them all every time you need to copy a table from a source to a destination. Below, we show a way to automate the process using a shell script and 2 SQL query files.

Prerequisite

First, you need to have snowsql installed. See here for more information. You’ll also need to configure named connections for snowsql. Let’s see how we can do this.

When you install snowsql, a config file is created at ~/.snowsql/config where you can define connections. Add your credentials by adapting the following lines:
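For example (the connection names and the values in angle brackets are placeholders to adapt; you can also add keys such as warehousename or rolename):

    [connections.source]
    accountname = <source_account_identifier>
    username = <username>
    password = <password>

    [connections.destination]
    accountname = <destination_account_identifier>
    username = <username>
    password = <password>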

Having done so, you can connect with snowsql using the following command:
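    snowsql -c source  # or: snowsql -c destination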

The code

The code is composed of 3 files:

  • an export_data_query.sql file to export the source table data and its DDL
  • an import_data_query.sql file to create the destination table and load the data into it
  • a copy_table.sh file for the shell code
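
As a rough sketch, here is one way these files can be organized. It assumes gzipped CSV as the intermediate format and relies on snowsql variable substitution (the -D flag combined with the variable_substitution option); the file layout, variable names, and paths below are just this example’s conventions, and in this version the shell script runs GET_DDL and creates the destination table itself rather than doing it through the SQL files.

    -- export_data_query.sql
    -- Unload the source table to the user stage, then download the files locally
    COPY INTO @~/&{table_name}/data
      FROM &{database}.&{schema}.&{table_name}
      FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' COMPRESSION = GZIP)
      OVERWRITE = TRUE;

    GET @~/&{table_name} file://&{local_folder}/;

    -- import_data_query.sql
    -- Upload the local files to the destination user stage and load them into the table
    PUT file://&{local_folder}/* @~/&{table_name};

    COPY INTO &{database}.&{schema}.&{table_name}
      FROM @~/&{table_name}
      FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' COMPRESSION = GZIP)
      PURGE = TRUE;

    #!/bin/bash
    # copy_table.sh -- copy a table from one Snowflake account to another through user stages
    # Usage: ./copy_table.sh <source_connection> <destination_connection> <database> <schema> <table>
    set -euo pipefail

    SOURCE_CONNECTION=$1
    DESTINATION_CONNECTION=$2
    DATABASE=$3
    SCHEMA=$4
    TABLE=$5

    LOCAL_FOLDER=$(mktemp -d)

    # 1. On the source account: unload the table to the user stage and download the files locally
    snowsql -c "$SOURCE_CONNECTION" \
      -o variable_substitution=true \
      -D database="$DATABASE" -D schema="$SCHEMA" -D table_name="$TABLE" -D local_folder="$LOCAL_FOLDER" \
      -f export_data_query.sql

    # 2. On the source account: retrieve the table DDL
    #    (depending on your snowsql version, you may need to clean this output further)
    snowsql -c "$SOURCE_CONNECTION" \
      -o friendly=false -o header=false -o timing=false -o output_format=plain \
      -q "SELECT GET_DDL('TABLE', '$DATABASE.$SCHEMA.$TABLE');" > "$LOCAL_FOLDER/ddl.sql"

    # 3. On the destination account: create the table from the DDL
    snowsql -c "$DESTINATION_CONNECTION" -d "$DATABASE" -s "$SCHEMA" -f "$LOCAL_FOLDER/ddl.sql"

    # 4. On the destination account: upload the files to the user stage and load them into the table
    snowsql -c "$DESTINATION_CONNECTION" \
      -o variable_substitution=true \
      -D database="$DATABASE" -D schema="$SCHEMA" -D table_name="$TABLE" -D local_folder="$LOCAL_FOLDER" \
      -f import_data_query.sql

    rm -rf "$LOCAL_FOLDER"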

Note that this script only shows one way of automating the process. You might need to adapt it to your needs (for instance, the FILE_FORMAT we chose in the example might not be the right one for you).

Before running the shell script, make the file executable by running the following command:
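    chmod +x copy_table.sh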

You can then run the script by launching one of the following commands:
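With the argument order from the sketch above, for example:

    ./copy_table.sh source destination MY_DATABASE MY_SCHEMA MY_TABLE
    ./copy_table.sh destination source MY_DATABASE MY_SCHEMA MY_OTHER_TABLE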

Conclusion

Testing data pipelines with actual data and having hermetic testing environments can make a huge difference when shipping code to production. When we started to work on our different testing environments, getting access to data seemed difficult and we were not sure how to achieve it quickly.

With internal Snowflake stages, we can now experiment very quickly with actual data by copying tables between testing environments, or even load hand-made CSV files into testing tables. The best part is that we didn’t need to configure anything: those stages are already there!

About Qonto

Qonto is a finance solution designed for SMEs and freelancers, founded in 2016 by Steve Anavi and Alexandre Prot. Since our launch in July 2017, Qonto has made business financing easy for more than 150,000 companies.

Business owners save time thanks to Qonto’s streamlined account set-up, an intuitive day-to-day user experience with unlimited transaction history, accounting exports, and a practical expense management feature.

They get more control, while being able to give their teams autonomy via real-time notifications and a user-rights management system.

They have improved visibility on cash-flows through tools such as smart dashboards, transaction auto-tagging, and cash-flow monitoring.

They also enjoy stellar customer support at a fair and transparent price.

Interested in joining a challenging and game-changing company? Consult our job offers!
