Developing Snowflake UDFs with JVM Languages using Gradle

Use Scala, Kotlin, or Groovy as if they were Java.

From coffee to JVM languages, the world is full of choices.

If you are reading this, then you probably read my previous articles Automate the Deployment of Snowflake UDFs with Gradle and Developing Snowflake UDFs with IntelliJ IDEA or VS Code.

In the first article above, I mention that I would have preferred to author my UDFs using Scala instead of Java, but that currently isn’t supported in Snowflake. I need to elaborate on this point. We can create a UDF with Java with the below snippet:

CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language JAVA
handler = 'Sample.addNum'
as
$$
public class Sample {
public String addNum(int num1, int num2) {
try {
int sum = num1 + num2;
return ("Sum is: " + sum);
} catch (Exception e) {
return null;
}
}
}
$$;
SELECT add_numbers(1,2);

+------------------+
| ADD_NUMBERS(1,2) |
+------------------+
| Sum is: 3 |
+------------------+

But a comparable approach with Scala doesn’t work:

CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language SCALA
RUNTIME_VERSION = '2.12'
handler = 'Sample.addNum'
as
$$
class Sample {
def addNum(num1: Integer, num2: Integer): String = {
try {
"The sum is: " + (num1 + num2).toString()
} catch {
case e: Exception => return null
}
}
}
$$;
Unknown function language: SCALA.

We can write stored procedures using Scala in this way, but not functions. Also discussed in the previous article, we can write inline Scala UDFs using a Snowpark session, but we can’t author those UDFs in a project module in our favorite IDE paired with unit tests. Unless the function is trivial in nature, we must ship it with unit tests, and those tests need to be easily accessible in an IDE or CLI and run with each and every CI/CD build.

What if I told you…

Gradle can easily build Java-compliant JAR files from Scala, Kotlin, or Groovy. These libraries are usable by any Java consumer as if written in Java: a JVM on your local machine, a Java application server, a cloud function, and yes… a Snowflake UDF.

What if I told you this might be the most misquoted line in movie history?

I’ve included Scala, Kotlin, and Groovy examples in the repository for the Gradle plugin for Snowflake which I discuss at length in the first article above, but we’ll focus on the Scala example for this article. We can clone the repo and access the Scala example:

gh repo clone stewartbryson/gradle-snowflake
cd gradle-snowflake/examples/scala-jar

We can view the Scala class file in src/main/scala, but it’s shown here as well:

class Sample {
def addNum(num1: Integer, num2: Integer): String = {
try {
"The sum is: " + (num1 + num2).toString()
} catch {
case e: Exception => return null
}
}
}

I’m using the Gradle CLI in this example for clarity, but we could easily use either IntelliJ or VS Code, as explained in the second article referenced above. With our Scala project configured, all we have to do is execute snowflakeJvm and the plugin will build a JAR library, upload it to Snowflake, and create the function:

❯ ./gradlew snowflakeJvm

> Task :compileScala

> Task :snowflakeJvm
File scala-jar-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language JAVA
handler = 'Sample.addNum'
imports = ('@upload/libs/scala-jar-0.1.0-all.jar')


BUILD SUCCESSFUL in 15s
3 actionable tasks: 3 executed

Notice that we aren’t trying to use Snowflake’s support for Scala, which would have required LANGUAGE SCALA and isn’t supported for functions. Instead, we use Gradle’s ability to compile Scala into a Java-compatible library. Now let’s test to see that it works:

SELECT add_numbers(1,2);

+------------------+
| ADD_NUMBERS(1,2) |
+------------------+
| The sum is: 3 |
+------------------+

Post-Credits Scene: Ephemeral Testing

I considered writing a separate article to discuss the new ephemeral testing functionality in the plugin but decided to add it as a stinger instead, MCU-style.

We waited through 10 minutes of credits to see… shawarma.

Running unit tests using static Snowflake databases is boring, especially considering the zero-copy cloning functionality available. The snowflakeJvm task supports cloning an ephemeral database from the database we connect to and publishing to the clone instead. This workflow is useful for CI/CD processes testing pull requests and is accessible either through the snowflake configuration closure in the build.gradle file, or as an option passed directly to the Gradle task:

snowflake {
// additional configurations
useEphemeral = true
}

or…

❯ ./gradlew snowflakeJvm --use-ephemeral

> Task :snowflakeJvm
Ephemeral clone ephemeral_IE6F8nZpu created.
File scala-jar-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language JAVA
handler = 'Sample.addNum'
imports = ('@upload/libs/scala-jar-0.1.0-all.jar')

Ephemeral clone ephemeral_IE6F8nZpu dropped.

BUILD SUCCESSFUL in 49s
3 actionable tasks: 1 executed, 2 up-to-date

The plugin is aware when it is running in CI/CD environments via the CI Detect Gradle plugin, and currently supports the following:

When the CI/CD environment is detected, the plugin will name the ephemeral database clone based on the pull request number, the branch name, or the tag name instead of the auto-generated name shown above:

❯ ./gradlew snowflakeJvm --use-ephemeral

> Task :snowflakeJvm
Ephemeral clone ephemeral_pr_46 created.
File scala-jar-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language JAVA
handler = 'Sample.addNum'
imports = ('@upload/libs/scala-jar-0.1.0-all.jar')

Ephemeral clone ephemeral_pr_46 dropped.

BUILD SUCCESSFUL in 47s
3 actionable tasks: 1 executed, 2 up-to-date

If we prefer to specify a clone name instead of relying on the plugin to generate one, that is supported as well:

snowflake {
// additional configurations
useEphemeral = true
ephemeralName = 'testing_db'
}

or…

❯ ./gradlew snowflakeJvm --use-ephemeral --ephemeral-name testing_db

> Task :snowflakeJvm
Ephemeral clone testing_db created.
File scala-jar-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language JAVA
handler = 'Sample.addNum'
imports = ('@upload/libs/scala-jar-0.1.0-all.jar')

Ephemeral clone testing_db dropped.

BUILD SUCCESSFUL in 48s
3 actionable tasks: 1 executed, 2 up-to-date

Finally, if we want to keep the ephemeral database after the build is complete, we simply pass the --keep-ephemeral option and it won't be dropped. This is useful for manual prototyping to ensure our applications are being deployed successfully, but shouldn't be used for automated CI/CD workflows unless you want the clone to be created when the pull request is opened and dropped when it is closed:

❯ ./gradlew snowflakeJvm --use-ephemeral --keep-ephemeral

> Task :snowflakeJvm
Ephemeral clone ephemeral_IppUJjVQI created.
File scala-jar-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language JAVA
handler = 'Sample.addNum'
imports = ('@upload/libs/scala-jar-0.1.0-all.jar')


BUILD SUCCESSFUL in 45s
3 actionable tasks: 1 executed, 2 up-to-date
SELECT ephemeral_IppUJjVQI.developer.add_numbers(1,2);

+------------------------------------------------+
| EPHEMERAL_IPPUJJVQI.DEVELOPER.ADD_NUMBERS(1,2) |
+------------------------------------------------+
| The sum is: 3 |
+------------------------------------------------+

--

--

Stewart Bryson
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Snowflake Data Superhero | Oracle ACE Alum | Writer, speaker, podcast guest | Amateur cyclist | Professional philosopher