Snowpark — The Databricks Killer?
Elastic Python compute for Machine Learning is coming to Snowflake, and it may be a deadly snowstorm for Spark…
I often help design Data Platforms for large IT shops (my other secret job, the one that pays better). What I encounter most often is Redshift or Snowflake as the warehouse (more often Snowflake these days), supplemented by Databricks for Machine Learning (ML) and Spark support.
Databricks has done two seemingly simple things well:
- Managing Spark clusters perfectly. It is surprising how limited AWS EMR is and how difficult it is to maintain a Kubernetes, Mesos, or YARN cluster on your own (you don’t want to do it, especially not in production).
- Making it seamless to submit Python jobs to the cluster without a whole bunch of configuration issues and setup headaches (see point #1).
To be fair, they (credit is shared between Databricks and the Apache Software Foundation) have also done a lot of other great things:
- Spark SQL to make life easier for SQL-oriented folks
- PySpark to integrate better with Data Science workflows (and remove the need to learn Scala, which is fun but tricky)
- Delta Lake to enable ACID transactions and data versioning