Data Cloud Summit Takeaways

Matt Weingarten
3 min read · Jun 7, 2024


Snow (Hey Oh)

Introduction

Conference season has officially kicked off with the Data Cloud Summit, hosted by Snowflake. While I was hoping to catch most of the breakout sessions online, it turns out only the keynotes were streamed (I guess the in-person conference ways of the past are back again). Either way, here are my top takeaways from the week's announcements.

Polaris Catalog

Polaris is Snowflake’s open-source catalog for all things Apache Iceberg. It allows for cross-engine read/write interoperability, something data teams always appreciate in a changing landscape of tools. It also carries proper governance through Snowflake Horizon, ensuring controls such as column masking are in place.
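To make the cross-engine angle concrete, here’s a minimal sketch of pointing Spark at an Iceberg REST catalog, the protocol Polaris exposes. The catalog name, endpoint, warehouse, and credentials are all placeholders, and the Iceberg runtime version is an assumption.

```python
# Minimal sketch: registering an Iceberg REST catalog (what Polaris speaks) in Spark.
# Endpoint, warehouse, and credentials are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("polaris-sketch")
    # Iceberg Spark runtime; the version/Scala build here is an assumption
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # A catalog named "polaris" that talks the Iceberg REST protocol
    .config("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.polaris.type", "rest")
    .config("spark.sql.catalog.polaris.uri", "https://<host>/api/catalog")           # placeholder
    .config("spark.sql.catalog.polaris.warehouse", "analytics")                      # placeholder
    .config("spark.sql.catalog.polaris.credential", "<client_id>:<client_secret>")   # placeholder
    .getOrCreate()
)

# Any engine speaking the same REST protocol sees the same tables (and the same rules)
spark.sql("SELECT * FROM polaris.sales.orders LIMIT 10").show()
```

The point is that the same catalog, and the same governance, is visible to Spark, Trino, Flink, or Snowflake itself, rather than each engine keeping its own copy of the metadata.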

How different is Polaris from Unity Catalog? The big difference is that Polaris is open-source, while Unity Catalog is still a black box to anyone not working on it at Databricks. The closest rivals to Polaris would be other open-source options like Amundsen or DataHub, which don’t serve exactly the same purpose but are similar in nature. Being built on Iceberg is also a huge plus.

Snowflake Pandas

I’ve been a big fan of Snowpark, Snowflake’s set of libraries with a Spark-like API (that isn’t actually Spark under the hood), ever since I started playing around with it a few months ago. You often end up jumping back and forth between Snowpark and pandas for the additional operations you want to run on top of DataFrames, but that no longer needs to be the case now that Snowpark is adding a pandas-style API.

Now, none of your processing actually needs to leave Snowflake to perform those operations. It also means the Databricks notebooks we set up to combine pandas and Snowpark can move straight into Snowflake (for the most part), further reducing the duplicated compute we have in places.
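As a rough sketch of what that looks like in practice (this assumes the Modin-based Snowpark pandas plugin; the connection parameters and the ORDERS table below are placeholders):

```python
# Rough sketch of the pandas-style Snowpark API; connection details and
# the ORDERS table are placeholders.
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # routes Modin's pandas API through Snowpark
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# Familiar pandas operations, but the work stays in Snowflake's engine
df = pd.read_snowflake("ORDERS")
by_region = df.groupby("REGION")["AMOUNT"].sum()
print(by_region.head())
```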

Snowflake Notebooks

Perhaps the biggest missing piece in Snowflake has been a notebook environment. It seems Snowflake has finally recognized that, with the announcement that notebooks are going live. Similar in nature to what we’ve come to love in Databricks, Snowflake notebooks can be integrated with Git and scheduled to run as jobs, which will make Snowflake a much friendlier platform for data teams.
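As a sketch of the scheduling piece: the notebook, task, and warehouse names below are placeholders, and wiring a scheduled run up as a task with EXECUTE NOTEBOOK is my assumption of how this would look rather than a confirmed recipe.

```python
# Hedged sketch: scheduling a Snowflake notebook to run as a job via a task.
# Names are placeholders; the EXECUTE NOTEBOOK wiring is an assumption.
from snowflake.snowpark.context import get_active_session

session = get_active_session()  # available inside a Snowflake notebook/worksheet

session.sql("""
    CREATE OR REPLACE TASK nightly_notebook_run
      WAREHOUSE = transform_wh
      SCHEDULE = 'USING CRON 0 6 * * * UTC'
    AS
      EXECUTE NOTEBOOK analytics.reporting.daily_metrics()
""").collect()

session.sql("ALTER TASK nightly_notebook_run RESUME").collect()
```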

I’ll definitely be interested in trying out notebooks when they come our way. It might take some catch-up time to match the standard we’ve come to expect from Databricks, but this is a huge step in the right direction of letting us process data where it belongs.

Conclusion

Alright, I’m not going to gloss over what may have been the biggest announcement of the week: Databricks acquiring Tabular, the company created by the founders of Iceberg (announcing this during Data Cloud Summit is diabolical/brilliant, by the way). Even though Delta Lake is the standard in Databricks, they were smart enough to see the need to make Iceberg a first-class citizen within the platform, alongside their existing UniForm offering. This will be an interesting development to watch over the next few months.

Sure enough, conference season continues with the Data & AI Summit next week. We’ll see if Snowflake has any bombs left over to rain on the Databricks parade. Until next time.

