Review: Polynote, an IDE for Netflix Data Scientists and Machine Learning Researchers

Netflix open-sources Polynote: a new polyglot notebook with first-class Scala support, Apache Spark integration, and multi-language interoperability

Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of any current or previous employer, organization, committee, other group, or individual. The analysis in this article is based on limited, dated open-source information. Assumptions made within the analysis do not reflect the position of any previous or current employer.

Requirements

  • Works with Linux and macOS only. (Say goodbye to Windows.)
  • Supports Spark and Python.

Table of Contents

  • Run Polynote via Azure VM Linux.
  • Run Polynote via Docker Image.

Overview

In the past several days, the most shared topic in the data science and machine learning community has been Netflix's announcement that it is open-sourcing Polynote, an internal IDE with first-class Scala support, Apache Spark integration, multi-language interoperability (Scala, Python, and SQL), as-you-type autocomplete, and more. According to Netflix, Polynote has “a great potential to address similar needs outside of Netflix” (Open-sourcing Polynote: an IDE-inspired polyglot notebook, Netflix Technology Blog, 2019). It is freely available as of today from Polynote.org and from GitHub (Netflix open-sources Polynote to simplify data science and machine learning workflows, Kyle Wiggers, VentureBeat, 2019).

The polyglot notebook with first-class Scala support, https://polynote.org/

Let us follow the installation guide on the Polynote website:

Polynote consists of a JVM-based server application, which serves a web-based client. To try it locally, find the latest release on the releases page.
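Because the server is JVM-based, it is worth confirming that a Java runtime is on the PATH before downloading anything. Below is a minimal sketch of such a check in Python. The helper name missing_tools is my own, and the inclusion of python3 is an assumption based on Polynote's Python support rather than a requirement quoted from the installation guide.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # "java" follows from the JVM-based server described above;
    # "python3" is an assumption (needed only if you want Python cells).
    required = ["java", "python3"]
    missing = missing_tools(required)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

An empty result only means the executables exist; it does not verify their versions.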


Run Polynote via Azure VM Linux

Since I am a Windows user, I created a Linux VM in the Azure Portal to try Polynote there. You can follow the Microsoft tutorial or my Medium tutorial below:

I access my VM via PuTTY.


Run Polynote via Docker Image

Credit goes to greglinscheid, who published the Unofficial Docker Image for Polynote, which provides a repository with everything already set up. I will not go into the details of installing Docker on Windows/Mac/Linux, but you can refer to the installation guides at the following link:

If you already have Docker installed on your local machine, you are ready. Let’s run the Docker image that Greg Linscheid packaged.

As with the method above, my server is running at http://localhost:8192.
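If the page does not load in the browser, a quick way to tell whether the server is actually listening is to probe the address from Python. This is only a sketch using the standard library; the function name server_is_up and the 3-second timeout are my own choices, and the URL is the one quoted above.

```python
import http.client
import urllib.error
import urllib.request


def server_is_up(url, timeout=3.0):
    """Return True if an HTTP server answers at `url` within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (4xx/5xx).
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return False


if __name__ == "__main__":
    url = "http://localhost:8192"  # address quoted in the article
    print(f"{url} reachable: {server_is_up(url)}")
```

A False result usually means the container or server process is not running, or that the port was not published to the host.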

Then I played around a bit with simple Python and Scala syntax:

Polynote also supports Vega charts. Here’s an example of a line chart:

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 500,
  "height": 200,
  "padding": 5,

  "signals": [
    {
      "name": "interpolate",
      "value": "linear",
      "bind": {
        "input": "select",
        "options": [
          "basis",
          "cardinal",
          "catmull-rom",
          "linear",
          "monotone",
          "natural",
          "step",
          "step-after",
          "step-before"
        ]
      }
    }
  ],

  "data": [
    {
      "name": "table",
      "values": [
        {"x": 0, "y": 28, "c":0}, {"x": 0, "y": 20, "c":1},
        {"x": 1, "y": 43, "c":0}, {"x": 1, "y": 35, "c":1},
        {"x": 2, "y": 81, "c":0}, {"x": 2, "y": 10, "c":1},
        {"x": 3, "y": 19, "c":0}, {"x": 3, "y": 15, "c":1},
        {"x": 4, "y": 52, "c":0}, {"x": 4, "y": 48, "c":1},
        {"x": 5, "y": 24, "c":0}, {"x": 5, "y": 28, "c":1},
        {"x": 6, "y": 87, "c":0}, {"x": 6, "y": 66, "c":1},
        {"x": 7, "y": 17, "c":0}, {"x": 7, "y": 27, "c":1},
        {"x": 8, "y": 68, "c":0}, {"x": 8, "y": 16, "c":1},
        {"x": 9, "y": 49, "c":0}, {"x": 9, "y": 25, "c":1}
      ]
    }
  ],

  "scales": [
    {
      "name": "x",
      "type": "point",
      "range": "width",
      "domain": {"data": "table", "field": "x"}
    },
    {
      "name": "y",
      "type": "linear",
      "range": "height",
      "nice": true,
      "zero": true,
      "domain": {"data": "table", "field": "y"}
    },
    {
      "name": "color",
      "type": "ordinal",
      "range": "category",
      "domain": {"data": "table", "field": "c"}
    }
  ],

  "axes": [
    {"orient": "bottom", "scale": "x"},
    {"orient": "left", "scale": "y"}
  ],

  "marks": [
    {
      "type": "group",
      "from": {
        "facet": {
          "name": "series",
          "data": "table",
          "groupby": "c"
        }
      },
      "marks": [
        {
          "type": "line",
          "from": {"data": "series"},
          "encode": {
            "enter": {
              "x": {"scale": "x", "field": "x"},
              "y": {"scale": "y", "field": "y"},
              "stroke": {"scale": "color", "field": "c"},
              "strokeWidth": {"value": 2}
            },
            "update": {
              "interpolate": {"signal": "interpolate"},
              "fillOpacity": {"value": 1}
            },
            "hover": {
              "fillOpacity": {"value": 0.5}
            }
          }
        }
      ]
    }
  ]
}
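The "values" array in the spec is in long format: one row per point, where x is the position, y is the value, and c identifies the series that the facet groups by. If your data starts out as one list per series, a small Python helper can produce those rows. This is just a sketch; the name to_vega_rows is hypothetical, and the numbers are copied from the spec above.

```python
import json

# y-values copied from the "table" data set in the spec, one list per series id.
SERIES = {
    0: [28, 43, 81, 19, 52, 24, 87, 17, 68, 49],
    1: [20, 35, 10, 15, 48, 28, 66, 27, 16, 25],
}


def to_vega_rows(series):
    """Flatten {series_id: [y0, y1, ...]} into long-format rows with x, y, c."""
    if not series:
        return []
    length = len(next(iter(series.values())))
    return [
        {"x": x, "y": ys[x], "c": c}
        for x in range(length)
        for c, ys in sorted(series.items())
    ]


rows = to_vega_rows(SERIES)
print(json.dumps(rows[:2]))
# prints [{"x": 0, "y": 28, "c": 0}, {"x": 0, "y": 20, "c": 1}]
```

The full output of json.dumps(rows) can be pasted in place of the "values" array. The helper assumes every series has the same number of points.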

Since the IDE supports only Linux and macOS, the setup will feel unfamiliar to Windows users. A problem with SQL still persists and requires further configuration. I love that it supports both Scala and Python, along with a powerful visualization package like Vega. This is ideal for those who want to get their hands dirty with Scala at no additional cost.

Kyle’s Rating: ✩✩✩✩ (4/5)


Written by Korkrid Kyle Akepanidtaworn

Cloud Solution Architect (Data & AI) at Microsoft, Former Data Scientist at Accenture Applied Intelligence