Snowflake Migration using StreamSets — dbaOnTap

Danny Bryant
Nov 1 · 4 min read

#OnTapToday, using @StreamSets to migrate data from your source database to Snowflake. In this post, I will walk through the process of configuring StreamSets to migrate data from my on-premises database to Snowflake.

WHY

You need to get some data from a source database to the Snowflake data warehouse. There were a few options at my disposal for doing this.

StreamSets operates on a hybrid open-source (HOSS) model: StreamSets Data Collector is open source, released under the Apache 2.0 license, while the remainder of the platform's source code is proprietary and not available to customers.

In this case, I chose StreamSets.

SETUP

The setup was actually quite complex on my MacBook, as some hacks were necessary on the OS to allow adjustments to the Java heap size. Rumor has it this is just an issue with macOS, and Linux distributions are fine. Fortunately, all of this can be found with some Google searches.

Here are the installation instructions:

  • Download, install, and run Data Collector. Remember, on a Mac there are some additional OS configurations necessary to get around the heap size issue. I’ll try to locate the link that helped me out and update the post.
  • Launch the Data Collector via the URL presented after the service starts, and click the Package Manager icon.
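As a rough sketch of the tarball-based setup (the file name and version are placeholders, and the heap values are examples, not the ones from my machine), the steps above look something like this:

```shell
# Extract the Data Collector tarball downloaded from streamsets.com
# (replace <version> with the release you downloaded).
tar -xzf streamsets-datacollector-core-<version>.tgz
cd streamsets-datacollector-<version>

# Adjust the Java heap size -- the workaround mentioned above.
# Data Collector reads JVM options from the SDC_JAVA_OPTS variable.
export SDC_JAVA_OPTS="-Xmx2048m -Xms2048m"

# Start Data Collector; by default the UI is served on port 18630.
bin/streamsets dc
```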

Since this is a simple multi-table migration, we can build the pipeline by connecting a single origin to a single destination. In this case, we will use the JDBC Multitable Consumer origin and the Snowflake Replicate destination.

As you can see, there are a number of tabs associated with the two phases of this migration pipeline. Complete these with the appropriate values.

JDBC Multitable Consumer:

Snowflake

Leave all other tabs at their defaults.
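The exact values depend on your environment; as a sketch, the key settings look something like the following (hostnames, credentials, and object names are placeholders, not the values from my setup):

```
JDBC Multitable Consumer:
  JDBC Connection String:      jdbc:oracle:thin:@//source-host:1521/ORCLPDB1
  Tables > Table Name Pattern: DB_%    (only migrate tables starting with DB_)

Snowflake:
  Account:             <your Snowflake account>
  User / Password:     <credentials created during the Snowflake setup>
  Warehouse:           <warehouse>
  Database / Schema:   <target database and schema>
  Table Auto Create:   enabled  (lets StreamSets create the target tables)
```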

Now let’s switch over and set up the Snowflake data warehouse. If you do not have an account, you can visit this link to start a free trial. Once you have successfully logged in, you can use the commands below to set up your Snowflake environment:
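A minimal version of that setup might look like the following; all object names and the password are placeholders, not the values from my environment:

```sql
-- Warehouse, database, and schema for the migrated data
CREATE WAREHOUSE IF NOT EXISTS MIGRATION_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;

CREATE DATABASE IF NOT EXISTS MIGRATION_DB;
CREATE SCHEMA IF NOT EXISTS MIGRATION_DB.PUBLIC;

-- Role and user for StreamSets to connect with
CREATE ROLE IF NOT EXISTS STREAMSETS_ROLE;
GRANT USAGE ON WAREHOUSE MIGRATION_WH TO ROLE STREAMSETS_ROLE;
GRANT ALL ON DATABASE MIGRATION_DB TO ROLE STREAMSETS_ROLE;
GRANT ALL ON SCHEMA MIGRATION_DB.PUBLIC TO ROLE STREAMSETS_ROLE;

CREATE USER IF NOT EXISTS STREAMSETS_USER
  PASSWORD = '<choose-a-password>'
  DEFAULT_ROLE = STREAMSETS_ROLE;
GRANT ROLE STREAMSETS_ROLE TO USER STREAMSETS_USER;
```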

We are now ready to run the StreamSets job. You will notice that we did not create any tables in the Snowflake data warehouse. One of the options we selected in the StreamSets configuration was to automatically create the tables we are migrating.

Migrate

Back on the StreamSets Data Collector interface, click Start in the upper right.

The process will begin migrating data from the source database (in this case, Oracle) to Snowflake.

I happen to know that there are 193 records across the tables that match the criteria (table names start with DB_). Can you find where we did that?

After a few seconds, the tables have been created and the data has been migrated. Let’s check it out.

A review of the History tab in Snowflake shows the process StreamSets executes to:

  1. Create the required file format
  2. Create the table
  3. Use the PUT command to load the data into the internal stage
  4. Execute the COPY command to populate the target table
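The steps above correspond roughly to SQL like the following; the object names, columns, and file path are illustrative, not copied from the History tab:

```sql
-- 1. File format for the staged data files
CREATE FILE FORMAT IF NOT EXISTS MIGRATION_DB.PUBLIC.SDC_CSV_FORMAT
  TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"';

-- 2. Target table, auto-created from the source metadata
CREATE TABLE IF NOT EXISTS MIGRATION_DB.PUBLIC.DB_EXAMPLE (
  ID NUMBER, NAME VARCHAR);

-- 3. PUT uploads the local data file into the table's internal stage
PUT file:///tmp/sdc/db_example.csv @MIGRATION_DB.PUBLIC.%DB_EXAMPLE;

-- 4. COPY loads the staged file into the target table
COPY INTO MIGRATION_DB.PUBLIC.DB_EXAMPLE
  FROM @MIGRATION_DB.PUBLIC.%DB_EXAMPLE
  FILE_FORMAT = (FORMAT_NAME = 'MIGRATION_DB.PUBLIC.SDC_CSV_FORMAT');
```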

In the Snowflake worksheet, execute the following:
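One way to get those counts (assuming the placeholder database name from the setup, and the DB_ table prefix) is to query the information schema:

```sql
-- Row counts for every migrated table whose name starts with DB_
SELECT table_name, row_count
FROM MIGRATION_DB.INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'PUBLIC'
  AND STARTSWITH(table_name, 'DB_');
```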

This should give you row counts for each table that was migrated.

Here is the same statement run from SQL Developer. All counts match, so the migration was successful.

This was a very simple example of using StreamSets to migrate multiple tables from a source database to Snowflake.


Originally published at https://dbaontap.com on November 1, 2019.
