KNIME ❤️ Snowflake

The competition in the cloud data warehouse market is certainly heating up! Amazon remains the dominant force with AWS Redshift, but Snowflake is hot on their heels.

With Snowflake’s innovative approach to scaling, performance and cost, we’ve started to move all of our KNIME cloud data warehouse workflows from Redshift to Snowflake. With KNIME’s database nodes, we can automatically create and launch Snowflake warehouses and databases, and configure data sources.

Luckily, just like the AWS Redshift drivers, the Snowflake drivers are easy to install and configure. Find out how below.


Configuring the KNIME JDBC Driver

1. Download the Snowflake JDBC driver via Maven (http://search.maven.org/) and save it in a permanent directory that you won’t delete later. You’ll want the jar file, not the javadoc.jar.

2. Open KNIME Analytics Platform and navigate to your preferences. I’m using an Apple Mac, so your screen may vary slightly.

3. Expand the KNIME section and select Databases. Then click Add file and choose the .jar file you downloaded in Step 1.

KNIME JDBC Driver Interface

4. Once you’ve hit Apply and Close, validate that the JDBC Driver is correctly installed by adding a KNIME Database Connector node to a KNIME workflow.

Database Connector Node

5. Configure the Database Connector node by selecting the newly installed com.snowflake.client.jdbc.SnowflakeDriver in the Database driver dropdown.

6. Replace the Database URL with your Snowflake instance details:

jdbc:snowflake://<account_name>.<region_id>.snowflakecomputing.com/?<connection_params>

You can add the name of the warehouse or database to this connection string, for example:

jdbc:snowflake://myaccount.eastregion.snowflakecomputing.com/?warehouse=EXAMPLEWAREHOUSE&database=EXAMPLEDB

One thing I recommend adding to your connection URL is:

CLIENT_SESSION_KEEP_ALIVE=true

jdbc:snowflake://myaccount.eastregion.snowflakecomputing.com/?warehouse=EXAMPLEWAREHOUSE&database=EXAMPLEDB&CLIENT_SESSION_KEEP_ALIVE=true

This prevents nodes further down the workflow from timing out and avoids token/authentication errors. (If you can’t edit the URL, there’s a session-level alternative sketched after these steps.)

7. Finally, enter your username and password, hit Apply, and run the node. If successful, you’ll get a green indicator light.
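
As promised above, if you can’t modify the connection URL, the same setting can be applied per session instead. This is a minimal sketch, assuming you run it through a Database SQL Executor node right after the connector; CLIENT_SESSION_KEEP_ALIVE is a standard Snowflake session parameter:

alter session set CLIENT_SESSION_KEEP_ALIVE = true;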


Dynamically Creating / Scaling Snowflake Instances

KNIME Workflow to create Snowflake objects

One of the great benefits of KNIME Analytics Platform is being able to create a workflow that dynamically adjusts your cloud data warehouse to optimise for ETL/ELT or Data Science processes. By leveraging the Database SQL Executor node, we can build a workflow that creates and launches a Snowflake warehouse and database before uploading and processing data.

Following Snowflake’s getting-started tutorial (https://docs.snowflake.net/manuals/user-guide/getting-started-tutorial.html):

Create database

create or replace database sf_tuts;

Create warehouse

create or replace warehouse sf_tuts_wh with
warehouse_size='xsmall'
auto_suspend = 180
auto_resume = true
initially_suspended=true;

Create CSV Format

create or replace file format my_csv_format
type = csv
field_delimiter = '|'
null_if = ('NULL', 'null')
empty_field_as_null = true
compression = gzip;
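
Create table

Snowflake’s tutorial also creates a staging table to load the sample data into. Here’s a sketch along the same lines (the emp_basic name and columns follow that tutorial, so swap in your own schema):

create or replace table emp_basic (
first_name string,
last_name string,
email string,
streetaddress string,
city string,
start_date date
);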

When you run the workflow, it’ll automagically create your warehouse, database and file format. You can chain together as many initiation SQL statements as required. To drop it all at the end, just add a node with:

drop database if exists sf_tuts;
drop warehouse if exists sf_tuts_wh;
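
Scaling works the same way: warehouse size is just another SQL statement, so you can add a Database SQL Executor node that resizes the warehouse before a heavy job and shrinks it again afterwards. A minimal sketch, using the sf_tuts_wh warehouse from above and standard Snowflake warehouse sizes:

alter warehouse sf_tuts_wh set warehouse_size = 'large';
-- run the heavy ETL/ELT steps, then scale back down
alter warehouse sf_tuts_wh set warehouse_size = 'xsmall';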

Automate your Data Warehouse

A complete KNIME workflow that takes Excel and JSON data, uploads it to S3 and automatically creates tables in Snowflake based on source data metadata.

Once you have everything in place, you can use the KNIME Database Table Creator node with Dynamic Settings to create tables based on the data supplied. This means that as new columns are added to your sources, your data warehouse tables can be rebuilt automatically with the correct column types and sizes.
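
To make that concrete, here’s a hypothetical example of the DDL such a run might generate; the sales_import table and its columns are invented for illustration, with types the node might infer from an Excel source:

create or replace table sales_import (
order_id integer,
customer_name varchar(255),
order_total double,
order_date date
);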


If you want to find out more about Forest Grove Technology and our BI as a Service offerings, click the link below.

https://www.forestgt.com.au/biservice/

You can find more information about the Snowflake JDBC driver here: https://docs.snowflake.net/manuals/user-guide/jdbc-configure.html

Snowflake has great SQL support for manipulating Snowflake objects: https://docs.snowflake.net/manuals/sql-reference/sql-all.html