Last week, Databricks announced a $400M round at a $6.2B valuation as its analytics platform continues to grow.
“Free” does not compete with $6.2B. However, Netflix, the Blockbuster killer, is a $100B company (give or take $25B or so in any given forecast) that has been challenging data science ever since it created the first legitimate challenge: the $1M “Netflix Prize”, awarded in 2009.
Also last week, on October 23, 2019, Polynote was announced as free and open source under the Apache 2.0 license.
Now that I have your attention, allow me to digress. What I really want to share with you is Polynote.
Polynote is another Jupyter-like notebook interface, and it promises a language-agnostic machine learning interface.
It does seem that Netflix uses Jupyter, Databricks, and virtually every other tool out there. This brings up some questions:
- Is Jupyter to Python what Polynote is to Scala?
- Wasn’t the whole point of Jupyter to be for Polyglots?
- Do we really need another Notebook type?
- Are there some cool ideas in Polynote?
- What confuses me about Polynote?
- Does Polynote really threaten Databricks or was I just joking to get your attention?
As I attempt to answer these questions, I do want to say hats off to what seems to be a very small team that made Polynote happen.
Is Jupyter to Python what Polynote is to Scala?
The first big difference is that the JVM is the backend for Polynote. That is Java, for those who aren’t so technical. Jupyter, whose origins are in IPython, evolved from the Python world to also support other languages, hence the name change from IPython notebooks to Jupyter Notebooks. Jupyter stands for Julia, Python, and R.
Wasn’t the whole point of Jupyter to be for Polyglots?
Jupyter tends to operate with a “kernel” for each language, and you can make your own. Yes, someone made one for Scala: Almond. The whole idea of a notebook in Jupyter is that it has one kernel, which means one primary language per notebook.
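To make the "one kernel per notebook" point concrete, here is a minimal sketch that reads the kernel declaration out of a notebook file. It assumes the nbformat v4 layout, where the kernel is named once in the notebook-level `metadata.kernelspec` rather than per cell. The notebook contents and the helper name `notebook_language` are made up for illustration.

```python
import json

# A tiny, made-up notebook in the nbformat v4 layout. Note that the kernel
# is declared once for the whole notebook, not once per cell.
notebook_json = """
{
  "nbformat": 4,
  "nbformat_minor": 2,
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3",
      "language": "python"
    }
  },
  "cells": [
    {"cell_type": "code", "metadata": {}, "source": ["x = 1"],
     "outputs": [], "execution_count": null},
    {"cell_type": "code", "metadata": {}, "source": ["x + 1"],
     "outputs": [], "execution_count": null}
  ]
}
"""


def notebook_language(raw: str) -> str:
    """Return the language declared by the notebook's single kernelspec."""
    notebook = json.loads(raw)
    return notebook["metadata"]["kernelspec"]["language"]


language = notebook_language(notebook_json)
print(language)
```

Both code cells inherit that one declaration, which is why a Jupyter notebook ends up with one primary language.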
Do we really need another Notebook type?
The difference with Polynote is that it does not have a primary language per notebook. There is one kernel, and it runs on the JVM. Each cell has its own language, and in-memory variables (if simple) can be shared across cells.
Just for completeness, you can break through to other kernels by using magic functions in Jupyter. I illustrated this in “Python vs (and) R for Data Science”.
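The idea of per-cell languages over one shared set of variables can be sketched as a toy model. This is not how Polynote is implemented. The `Cell` class, the `run_notebook` function, and the rule for what counts as "simple" are all assumptions invented here to illustrate the concept in plain Python.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Assumption for this toy model: only these value types are "simple" enough
# to be handed from one cell's language to another.
SIMPLE_TYPES = (int, float, str, bool)


@dataclass
class Cell:
    language: str  # just a label here, e.g. "scala" or "python"
    run: Callable[[Dict[str, Any]], Dict[str, Any]]  # visible vars -> new vars


def run_notebook(cells: List[Cell]) -> Dict[str, Any]:
    """Run cells in order, sharing only simple values between them."""
    shared: Dict[str, Any] = {}
    for cell in cells:
        produced = cell.run(dict(shared))  # each cell sees a copy of shared state
        for name, value in produced.items():
            if isinstance(value, SIMPLE_TYPES):
                shared[name] = value  # simple values cross the boundary
            # anything else stays private to the cell that made it
    return shared


cells = [
    Cell("scala", lambda env: {"rows": 3, "handle": object()}),
    Cell("python", lambda env: {"doubled": env["rows"] * 2}),
]
result = run_notebook(cells)
print(result)
```

The second cell can use `rows` even though a cell labelled with another language defined it, while the opaque `handle` object never leaves its cell.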
Are there some cool ideas in Polynote?
I do believe so:
- The ability to simply switch back and forth, cell to cell, is actually quite useful.
- The ability to see the values and types of what is currently defined is also very useful.
- The ability to connect easily to a Spark cluster to distribute computing, one of the big values of Databricks, is also very useful.
- Some pretty cool code completion.
What confuses me about Polynote?
No real interoperability with Jupyter
Bad stack traces
Since the stack trace from Python is fed through the JVM via Jep/JNI, it can be hard to get the real stack trace.
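One workaround worth trying, when the Python-side trace gets buried, is to catch the exception inside the cell and format the traceback in Python yourself. This is a minimal sketch using only the standard library. `risky_step` and `run_with_python_trace` are hypothetical names, and I have not verified how the resulting output is displayed in Polynote.

```python
import traceback


def risky_step(values):
    # Hypothetical stand-in for whatever the cell really does.
    return sum(values) / len(values)


def run_with_python_trace(func, *args):
    """Run func; on failure, return the traceback text as Python formats it."""
    try:
        return func(*args), None
    except Exception:
        # format_exc() renders the traceback of the exception being handled.
        return None, traceback.format_exc()


result, trace = run_with_python_trace(risky_step, [])
if trace is not None:
    print(trace)
```

Because the traceback is turned into a plain string on the Python side, it no longer depends on how the exception is translated on its way through the JVM.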
Don’t do this:
while True: pass
Sometimes cells forget themselves:
Other times the in and out get *Confused*
Installation wasn’t exactly hell, but Jupyter has put a lot of thought into the management of environments and kernels. Certainly they do not have it all perfect all the time. Similarly, Databricks has allowed shipping eggs off to Spark workers for some time now, and it comes with a price! There is no silver bullet when it comes to managing installs. I do wish I could conda install Polynote.
One note on the Jep requirements: I had to install it in /jeb. While some of the configuration options are in a very obvious place, Polynote largely needs better documentation and some tutorials. I know, I know, I can write them myself! Oh, the glory of open source~!
Getting a Spark cluster up is always fun! [sic]
Does Polynote really threaten Databricks or was I just joking to get your attention?
Not yet. But as with all open source, it becomes what you want it to be. They need community, contributions, and independence. Recall what Sun did to Java. There is always this question in open source of how things should be run. Some arrangements, like Python’s PSF, a non-profit that controls the language, clearly work. Jupyter is a non-profit too. Is the Netflix team cool enough to keep the intentions of open source? On the other hand, are Databricks and others cool enough to add that much value in the long run?
It’s not really fair to compare a just-released open source notebook platform to Databricks and Jupyter, which have literally changed the way we do data science in recent years. Again, hats off to those who made Polynote. Clearly a lot of work is needed to make this production scale. There are some good ideas here. I hope the Jupyter team hears this too so they can steal some of them, minus the bugs, of course.
For what it is worth, I am sticking with Jupyter at my day job for now.
Almond: Scala for Jupyter. https://index.scala-lang.org/jupyter-scala/jupyter-scala/channels/0.1.0?target=_2.12