Snowpark — The Databricks Killer?

Elastic Python-compute for Machine Learning is coming to Snowflake — it may be a deadly snowstorm for Spark…

Doug Foo
CodeX

--

An Avalanche brewing in Data Platforms (photo by Greg Rosenke on Unsplash)

I often help design Data Platforms for large IT shops (my other secret job that pays me better). What I encounter a lot is Redshift and Snowflake in the warehouse (more of the latter these days) supplemented by Databricks for Machine Learning (ML) and Spark support.

Databricks has done two seemingly simple things well:

  1. Managing Spark clusters perfectly. It is surprising how limited AWS EMR is and how difficult it is to maintain a Kubernetes, Mesos, or YARN cluster on your own (you don’t want to do it, especially not in production).
  2. Making it seamless to submit jobs to the cluster from Python, without a whole bunch of configuration, issues, and setup headaches (see point 1).

To be fair, they (credit is shared between Databricks and the Apache Software Foundation) have also done a lot of other great things:

  • Spark SQL to make life easier for SQL-oriented folks
  • PySpark to integrate better with Data Science workflows (and remove the need to learn Scala — fun but tricky)
  • Delta Lake to enable ACID transactions and data versioning
