New in Snowflake: Java UDFs (with a Kotlin NLP example)

Snowflake now lets you easily create Java UDFs, an incredibly powerful and versatile feature. Let’s check it out by running a library written in Kotlin to detect written languages. Out of GitHub and into your SQL code, in 3 easy steps.

Detect written languages with a Java UDF

Quick example: Detect written languages

Snowflake has made it really easy to create Java UDFs. You just need to do something like this:
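A Java UDF is a `CREATE FUNCTION ... LANGUAGE JAVA` statement whose `HANDLER` clause names a Java method. The class source goes inline in the `AS` clause, and Snowflake compiles it. The class below is a minimal sketch of such a handler, written as plain Java so you can compile and try it locally first. The names `AddFunc` and `add` are hypothetical, the SQL `INTEGER` to Java `int` mapping is an assumption to check against Snowflake's type-mapping docs, and `main` exists only for local testing.

```java
/**
 * Minimal sketch of a Java UDF handler class (hypothetical names).
 * A CREATE FUNCTION ... LANGUAGE JAVA statement would point at this
 * method with something like HANDLER = 'AddFunc.add'.
 */
class AddFunc {

    // The handler: SQL integer arguments are assumed to map to Java ints.
    public static int add(int x, int y) {
        return x + y;
    }

    // Local check only; Snowflake calls the handler, never main().
    public static void main(String[] args) {
        System.out.println(add(1, 3)); // prints 4
    }
}
```

Once Snowflake has compiled this body, the resulting function (say, a hypothetical `my_add(x, y)`) can be called from SQL like any built-in.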

Then you can use that function in your SQL statements:

That’s easy. The real power comes from the ability to load and use jar packages. For example, to detect written languages using Lingua:
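A handler that uses a jar has the same shape as any other Java UDF, with one design choice worth making explicit: building a language detector means loading Lingua's language models, which can be expensive. You want to create the detector once and reuse it for every row instead of rebuilding it per call. The sketch below shows that pattern with a hypothetical toy stand-in, so it compiles without the Lingua jar. In a real UDF, the stand-in would be replaced by Lingua's `LanguageDetectorBuilder.fromAllLanguages().build()` and `detectLanguageOf(x)` (assumed from Lingua's Java-facing API), and the staged jar would be listed in the function's `IMPORTS` clause. The names `DetectLanguage`, `Detector`, and `buildCount` are illustrative only.

```java
import java.util.Locale;

/**
 * Sketch of a language-detection UDF handler (hypothetical names).
 * A toy stand-in replaces Lingua so the class compiles without the jar.
 */
class DetectLanguage {

    // Hypothetical stand-in for Lingua's LanguageDetector type.
    interface Detector {
        String detect(String text);
    }

    // Counts detector constructions, only to show it happens once.
    static int buildCount = 0;

    // Built once, during class initialization, and reused for every row.
    // Real UDF (assumption): LanguageDetectorBuilder.fromAllLanguages().build()
    private static final Detector DETECTOR = buildDetector();

    private static Detector buildDetector() {
        buildCount++;
        return DetectLanguage::toyDetect;
    }

    // Toy heuristic, NOT real language detection: checks a few common words.
    private static String toyDetect(String text) {
        String t = " " + text.toLowerCase(Locale.ROOT) + " ";
        if (t.contains(" el ") || t.contains(" los ") || t.contains(" que ")) {
            return "SPANISH";
        }
        if (t.contains(" the ") || t.contains(" and ")) {
            return "ENGLISH";
        }
        return "UNKNOWN";
    }

    // Method a HANDLER clause would name; a SQL NULL is assumed to arrive as null.
    public static String detect(String x) {
        if (x == null) {
            return null;
        }
        return DETECTOR.detect(x);
    }

    public static void main(String[] args) {
        System.out.println(detect("el perro que ladra"));  // SPANISH
        System.out.println(detect("the cat and the dog")); // ENGLISH
    }
}
```

Because the detector lives in a `static final` field, the JVM builds it once when the class is initialized. Assuming Snowflake reuses the loaded class across rows, as a JVM normally would, each row pays only for the `detect` call. The `buildCount` field exists only to make that visible.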

If you have the right jar staged in your Snowflake account, that’s all it takes:

Notes

  • I love the ability to write custom Java code while defining UDFs within SQL (Snowflake takes care of compiling it). This allows you to handle all your glue code in one place, within your SQL scripts and dbt projects.

Setup

Lingua is “The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike” (according to them). It’s written in Kotlin, a language that runs on the JVM, so Java code (and therefore a Java UDF) can call it directly.

To package this library into a single fat jar for Snowflake, I followed these steps:

  1. Clone the Lingua git repository.
  2. Build a jar with all its dependencies using Gradle.
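Before uploading, you may want to confirm that the fat jar really bundles everything the UDF needs. Apart from the JDK, Snowflake can only load classes from the jars the function imports. That means the jar has to contain Lingua's own classes and, because Lingua is written in Kotlin, the Kotlin standard library. The sketch below scans a jar's entries with `java.util.jar.JarFile` and checks for package prefixes. The class name `JarCheck` is hypothetical. The prefixes `com/github/pemistahl/lingua/` and `kotlin/` are assumptions based on Lingua's package name and the Kotlin standard library's layout.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Hypothetical helper: checks that a jar has entries under a package path. */
class JarCheck {

    // True if any entry name in the jar starts with the given prefix,
    // e.g. "kotlin/" for classes from the Kotlin standard library.
    static boolean containsPackage(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream().anyMatch(e -> e.getName().startsWith(prefix));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarCheck path/to/fat.jar");
            return;
        }
        // Assumed prefixes: Lingua's package and the Kotlin standard library.
        String[] prefixes = {"com/github/pemistahl/lingua/", "kotlin/"};
        for (String prefix : prefixes) {
            System.out.println(prefix + " -> " + containsPackage(args[0], prefix));
        }
    }
}
```

If either prefix reports `false`, the Gradle build most likely produced a plain jar instead of one with dependencies.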

Then you can PUT this jar into your Snowflake account with SnowSQL:


Acknowledgements

Java UDF support is in active development by a great team at Snowflake, including Elliott Brossard and Isaac Kunen. Stay tuned for more!

Thanks to Peter M. Stahl for the awesome Lingua library.

Want more?

I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can follow me on Twitter and LinkedIn, and check reddit.com/r/snowflake for the most interesting Snowflake news.


Felipe Hoffa
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.