Ingest External Data Into Snowflake with SnowPark and JDBC

Mauricio Rojas
5 min read · Oct 8, 2023


With the introduction of external network access capabilities, Snowflake has taken data ingestion to a new level, making the once-challenging task more accessible and efficient than ever before.

In this blog post we will explore how we can leverage this feature to easily ingest data from external databases into Snowflake.

JDBC, a widely embraced standard application programming interface (API), provides a means for software applications to connect to and interact with various database management systems.

With a multitude of JDBC drivers readily available, it’s only logical to harness their capabilities within the Snowflake ecosystem.

As you will see, not much is needed to enable this access.

Getting Started

First, we need to configure external network access.

In this blog post I will use an Azure SQL Server database, but this approach will work with other databases.

Let's assume that my database is reachable at a URL like sqlserver-tests.database.windows.net.

For SQL Server databases we also need port 1433.

With these settings we can create the network rule:

CREATE OR REPLACE NETWORK RULE external_database_network_rule
TYPE = HOST_PORT
VALUE_LIST = ('sqlserver-tests.database.windows.net:1433')
MODE = EGRESS;

For your database connection you will typically need a user/password. We can leverage Snowflake secrets to store this sensitive information, or pass the user/password as part of the options.

CREATE OR REPLACE SECRET external_database_cred
TYPE = password
USERNAME = 'serveradmin'
PASSWORD = 'xxxxxxxxx';

🥁 Almost ready.

Now we need to create an integration that we will associate with our UDFs/procedures to allow them to leverage the external access.

CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION external_database_network_rule_ext_int
ALLOWED_NETWORK_RULES = (external_database_network_rule)
ALLOWED_AUTHENTICATION_SECRETS = (external_database_cred)
ENABLED = true;

Excellent. All that is left now is to create a function. Note that the function imports the Microsoft SQL Server JDBC driver jar, so upload it to a stage first (here @mystage).

For this use case a Java tabular function (UDTF) will be very useful. This function will just leverage standard JDBC APIs; the steps are:

  • load the JDBC driver
  • create a JDBC connection
  • create a statement
  • execute the query
  • get the metadata
  • retrieve and return the results; Snowflake's semi-structured OBJECT type provides a flexible way to stream them back

CREATE OR REPLACE FUNCTION READ_JDBC(OPTION OBJECT, query STRING)
RETURNS TABLE(data OBJECT)
LANGUAGE JAVA
RUNTIME_VERSION = '11'
IMPORTS = ('@mystage/mssql-jdbc-12.4.1.jre11.jar')
EXTERNAL_ACCESS_INTEGRATIONS = (external_database_network_rule_ext_int)
SECRETS = ('cred' = external_database_cred )
HANDLER = 'JdbcDataReader'
AS $$
import java.sql.*;
import java.util.*;
import java.util.stream.Stream;
import com.snowflake.snowpark_java.types.SnowflakeSecrets;

public class JdbcDataReader {

    public static class OutputRow {
        public Map<String, String> data;

        public OutputRow(Map<String, String> data) {
            this.data = data;
        }
    }

    public static Class getOutputClass() {
        return OutputRow.class;
    }

    public Stream<OutputRow> process(Map<String, String> jdbcConfig, String query) {
        String jdbcUrl = jdbcConfig.get("url");
        String username;
        String password;

        if ("true".equals(jdbcConfig.get("use_secrets"))) {
            SnowflakeSecrets sfSecrets = SnowflakeSecrets.newInstance();
            var secret = sfSecrets.getUsernamePassword("cred");
            username = secret.getUsername();
            password = secret.getPassword();
        } else {
            username = jdbcConfig.get("username");
            password = jdbcConfig.get("password");
        }
        try {
            // Load the JDBC driver
            Class.forName(jdbcConfig.get("driver"));
            // Create a connection to the database
            Connection connection = DriverManager.getConnection(jdbcUrl, username, password);
            // Create a statement for executing SQL queries
            Statement statement = connection.createStatement();
            // Execute the query
            ResultSet resultSet = statement.executeQuery(query);
            // Get metadata about the result set
            ResultSetMetaData metaData = resultSet.getMetaData();
            // Create a list of column names
            List<String> columnNames = new ArrayList<>();
            int columnCount = metaData.getColumnCount();
            for (int i = 1; i <= columnCount; i++) {
                columnNames.add(metaData.getColumnName(i));
            }
            // Convert the ResultSet to a Stream of OutputRow objects
            Stream<OutputRow> resultStream = Stream.generate(() -> {
                try {
                    if (resultSet.next()) {
                        Map<String, String> rowMap = new HashMap<>();
                        for (String columnName : columnNames) {
                            String columnValue = resultSet.getString(columnName);
                            rowMap.put(columnName, columnValue);
                        }
                        return new OutputRow(rowMap);
                    } else {
                        // Close resources
                        resultSet.close();
                        statement.close();
                        connection.close();
                        return null;
                    }
                } catch (SQLException e) {
                    e.printStackTrace();
                    return null;
                }
            }).takeWhile(Objects::nonNull);
            return resultStream;
        } catch (Exception e) {
            e.printStackTrace();
            Map<String, String> rowMap = new HashMap<>();
            rowMap.put("ERROR", e.toString());
            return Stream.of(new OutputRow(rowMap));
        }
    }
}
$$;

Reading JDBC Data with SQL

Now with just SQL we can connect to an external database and read the results.
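
For example, the following call reads the SalesLT.Address sample table (the same one used in the queries further down) and returns each source row as an OBJECT in the data column:

SELECT data FROM TABLE(READ_JDBC(
OBJECT_CONSTRUCT(
'driver','com.microsoft.sqlserver.jdbc.SQLServerDriver',
'url','jdbc:sqlserver://sqlserver-tests.database.windows.net:1433;database=my-database',
'use_secrets','true'),
'SELECT AddressID, AddressLine1, City FROM [my-database].SalesLT.Address'));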

Another thing that I love is that this approach can also be used from other technology stacks.

We just saw SQL, but we can use it from Scala, Java, and Python as well.

Reading JDBC data with Scala

For example, the following Scala code can be used in a Snowflake Scala stored procedure.

Reading JDBC data with Java

The following Java code can be used in a Snowflake Java stored procedure:

Java Worksheet using our function

Reading JDBC data with Python

The following code can be used in a Snowpark Python worksheet:

Python Worksheet using our function

And what about Metadata? 🏷️

After some tests, I also found it useful to develop another function to get the metadata.

Getting metadata can be useful, for example, to build views or dynamic tables.

I won't explain all the details and will just paste the code.
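
The sketch below shows what READ_JDBC_METADATA can look like. It follows the same structure as READ_JDBC, but it only inspects the ResultSetMetaData and emits one row per column, including a ready-to-use cast expression. The handler name, the TYPE_NAME column, and the JDBC-to-Snowflake type mapping here are simplifications; what matters for the queries further down is that it returns COLUMN_NAME and EXPRESSION.

CREATE OR REPLACE FUNCTION READ_JDBC_METADATA(OPTION OBJECT, query STRING)
RETURNS TABLE(COLUMN_NAME STRING, TYPE_NAME STRING, EXPRESSION STRING)
LANGUAGE JAVA
RUNTIME_VERSION = '11'
IMPORTS = ('@mystage/mssql-jdbc-12.4.1.jre11.jar')
EXTERNAL_ACCESS_INTEGRATIONS = (external_database_network_rule_ext_int)
SECRETS = ('cred' = external_database_cred)
HANDLER = 'JdbcMetadataReader'
AS $$
import java.sql.*;
import java.util.*;
import java.util.stream.Stream;
import com.snowflake.snowpark_java.types.SnowflakeSecrets;

public class JdbcMetadataReader {

    // Public fields map (case-insensitively) to the columns in RETURNS TABLE
    public static class OutputRow {
        public String COLUMN_NAME;
        public String TYPE_NAME;
        public String EXPRESSION;

        public OutputRow(String columnName, String typeName, String expression) {
            this.COLUMN_NAME = columnName;
            this.TYPE_NAME = typeName;
            this.EXPRESSION = expression;
        }
    }

    public static Class getOutputClass() {
        return OutputRow.class;
    }

    // Simplified mapping from JDBC types to Snowflake cast targets
    private static String toSnowflakeType(int sqlType) {
        switch (sqlType) {
            case Types.BIGINT: case Types.INTEGER: case Types.SMALLINT:
            case Types.TINYINT: case Types.NUMERIC: case Types.DECIMAL:
                return "NUMBER";
            case Types.DOUBLE: case Types.FLOAT: case Types.REAL:
                return "FLOAT";
            case Types.DATE:
                return "DATE";
            case Types.TIMESTAMP:
                return "TIMESTAMP";
            case Types.BIT: case Types.BOOLEAN:
                return "BOOLEAN";
            default:
                return "STRING";
        }
    }

    public Stream<OutputRow> process(Map<String, String> jdbcConfig, String query) {
        String username;
        String password;
        if ("true".equals(jdbcConfig.get("use_secrets"))) {
            SnowflakeSecrets sfSecrets = SnowflakeSecrets.newInstance();
            var secret = sfSecrets.getUsernamePassword("cred");
            username = secret.getUsername();
            password = secret.getPassword();
        } else {
            username = jdbcConfig.get("username");
            password = jdbcConfig.get("password");
        }
        try {
            // Connect exactly as READ_JDBC does, but only look at the metadata
            Class.forName(jdbcConfig.get("driver"));
            Connection connection = DriverManager.getConnection(jdbcConfig.get("url"), username, password);
            Statement statement = connection.createStatement();
            ResultSet resultSet = statement.executeQuery(query);
            ResultSetMetaData metaData = resultSet.getMetaData();
            List<OutputRow> rows = new ArrayList<>();
            for (int i = 1; i <= metaData.getColumnCount(); i++) {
                String columnName = metaData.getColumnName(i);
                String typeName = toSnowflakeType(metaData.getColumnType(i));
                // EXPRESSION is a projection over READ_JDBC's data column, e.g. $1:AddressID::NUMBER
                String expression = "$1:" + columnName + "::" + typeName;
                rows.add(new OutputRow(columnName, typeName, expression));
            }
            resultSet.close();
            statement.close();
            connection.close();
            return rows.stream();
        } catch (Exception e) {
            // Surface errors as a single row so they are visible from SQL
            return Stream.of(new OutputRow("ERROR", "STRING", e.toString()));
        }
    }
}
$$;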

When we run this function with a query like the following, we can see the description of the columns that will be returned by the query:

Using a function to get metadata from a query
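
For example, a call like this returns one row per source column, with its name and a cast expression:

SELECT * FROM TABLE(READ_JDBC_METADATA(
OBJECT_CONSTRUCT(
'driver','com.microsoft.sqlserver.jdbc.SQLServerDriver',
'url','jdbc:sqlserver://sqlserver-tests.database.windows.net:1433;database=my-database',
'use_secrets','true'),
'SELECT AddressID, AddressLine1, City, StateProvince ModifiedDate FROM [my-database].SalesLT.Address'));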

And this metadata can be used for many more things. The following queries show how we can use it to get results in a more standard tabular form.

-- query to build a fragment of a query you can use as shown below
SELECT
'SELECT ' ||
ARRAY_TO_STRING(array_agg(EXPRESSION || ' ' || COLUMN_NAME) ,',') ||
' FROM ' QUERY_HEADER
FROM TABLE(READ_JDBC_METADATA(
OBJECT_CONSTRUCT(
'driver','com.microsoft.sqlserver.jdbc.SQLServerDriver',
'url','jdbc:sqlserver://sqlserver-tests.database.windows.net:1433;database=my-database',
'use_secrets','true'),
'SELECT AddressID, AddressLine1, City, StateProvince ModifiedDate FROM [my-database].SalesLT.Address'));

-- copy the result from the previous query

SELECT $1:AddressID::NUMBER AddressID,$1:AddressLine1::STRING AddressLine1,$1:City::STRING City,$1:ModifiedDate::STRING ModifiedDate FROM
(SELECT data "$1" FROM TABLE(READ_JDBC(
OBJECT_CONSTRUCT(
'driver','com.microsoft.sqlserver.jdbc.SQLServerDriver',
'url','jdbc:sqlserver://sqlserver-tests.database.windows.net:1433;database=my-database',
'use_secrets','true'),
'SELECT AddressID, AddressLine1, City, StateProvince ModifiedDate FROM [my-database].SalesLT.Address')));
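
And, as mentioned earlier, the generated projection can be wrapped in a view so the external table can be queried like any regular Snowflake object. A minimal sketch (the view name here is just an example):

CREATE OR REPLACE VIEW SALESLT_ADDRESS_VW AS
SELECT $1:AddressID::NUMBER AddressID,$1:AddressLine1::STRING AddressLine1,$1:City::STRING City,$1:ModifiedDate::STRING ModifiedDate FROM
(SELECT data "$1" FROM TABLE(READ_JDBC(
OBJECT_CONSTRUCT(
'driver','com.microsoft.sqlserver.jdbc.SQLServerDriver',
'url','jdbc:sqlserver://sqlserver-tests.database.windows.net:1433;database=my-database',
'use_secrets','true'),
'SELECT AddressID, AddressLine1, City, StateProvince ModifiedDate FROM [my-database].SalesLT.Address')));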

Final Thoughts

Well, I got a little crazy with this, but I am really enjoying the fact that I can do this so easily now.

Remember that for now this approach only works with public IPs/URLs. I have not yet tested it with proxies like ngrok (which might be good for tests/development).

I have shared almost all my code samples and I hope they are as useful for you as they have been for me.
