Get started with Jupyter Notebook and Watson Studio

The most innovative ideas are often so simple that only a few stubborn visionaries can conceive of them. Ward Cunningham and his fantastic wiki concept, which later made Wikipedia possible, come to mind when one first comes in contact with the Jupyter Notebook. Notebook, yes, we get that, but what exactly is a Jupyter Notebook and what is it that makes it so innovative?

To quote: “The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.”
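To make "live code" a little more concrete, here is a minimal sketch of the kind of thing a single notebook cell might hold: a bit of data cleaning followed by a simple statistic. The readings and the `clean` helper are made up for illustration; they do not come from any particular notebook.

```python
import statistics

# Hypothetical raw readings; None and "n/a" stand in for missing values.
raw_readings = ["21.5", "22.1", None, "n/a", "23.4", "22.0"]


def clean(values):
    """Keep only the entries that can be parsed as floats."""
    cleaned = []
    for value in values:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            # float(None) raises TypeError, float("n/a") raises ValueError.
            continue
    return cleaned


readings = clean(raw_readings)
mean_reading = statistics.mean(readings)
print(f"{len(readings)} valid readings, mean = {mean_reading:.2f}")
```

In a notebook, the printed line would show up directly under the cell, next to whatever narrative text you write around it.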

Notebooks can be easily shared with others using email, Dropbox, GitHub and other sharing products. You can even share them via Twitter!

And if that is not enough, one can connect a notebook to Big Data and machine learning tools like Apache Spark, scikit-learn, ggplot2, TensorFlow and Caffe!

There is a certain resemblance to Node-RED in functionality, at least to my mind. And don’t forget, you can even install the Jupyter Notebook on the Raspberry Pi!

You can run Jupyter Notebooks on localhost, but for collaboration you want to run them in the cloud. And this is where the IBM Cloud comes into the picture.

The vehicle for running Jupyter Notebook in the IBM Cloud is Watson Studio, an all-purpose development tool for all your Data Science, Machine Learning and Deep Learning needs. Whatever data science or AI project you want to work on in the IBM Cloud, the starting point is always Watson Studio.

In Watson Studio you select what area you are interested in. In our case we want to create a new Jupyter Notebook, so we click on New notebook at the far left.

We then get a number of options. We can create a blank notebook, import a notebook from a file or, and this is cool, import one from a URL.

So let’s do that with a Hello World notebook, and we notice the file type: ipynb.
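The ipynb file type is worth a quick look, because under the hood a notebook is just a JSON document. The sketch below builds a minimal notebook by hand in the version 4 layout and reads it back. The cell contents and the `code_cells` helper are hypothetical, and a real notebook usually carries more metadata (kernel, language info) than shown here.

```python
import json

# A minimal notebook in the nbformat 4 layout: version fields,
# notebook-level metadata, and a list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Hello notebook"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # not run yet
            "outputs": [],
            "source": ["print('Hello, World!')"],
        },
    ],
}


def code_cells(nb):
    """Return the source text of every code cell in a notebook dict."""
    return [
        "".join(cell["source"])
        for cell in nb["cells"]
        if cell["cell_type"] == "code"
    ]


# Serialize to the text an .ipynb file would contain, then parse it back.
ipynb_text = json.dumps(notebook, indent=1)
roundtrip = json.loads(ipynb_text)
print(code_cells(roundtrip))
```

This is also why notebooks travel so well through email, GitHub or a plain URL: it is all one text file.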

We click on Create Notebook at the bottom right of the page, which gives us either our own copy of the Hello World notebook or, if we chose to start blank, a blank notebook.

And if we copy the Hello World notebook, we can start to change it immediately in the Watson Studio environment, as we have done above, and then save it to our own GitHub repository.

It is also important to note that the IBM Cloud executes the Jupyter Notebook environment on Apache Spark, the famous open source cluster computing framework that originated at UC Berkeley and is optimized for fast, large-scale data processing. And talking of the Jupyter Notebook architecture in the IBM Cloud, you can connect Object Storage to Apache Spark. So we can run our Jupyter Notebook like a bat out of hell, as the saying goes.
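To give a feel for why a cluster framework is fast, here is a rough plain-Python sketch of the basic idea: split the data into partitions, process each partition on its own, then combine the partial results. To be clear, this is not the Spark API and nothing here runs in parallel; the function names and sample lines are invented for illustration, and Spark does this kind of work across many machines.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(items, num_partitions):
    """Deal the items out into num_partitions roughly equal lists."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def count_words(lines):
    """The per-partition step: count the words in a list of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


# Hypothetical input; on a cluster this might be a large file in Object Storage.
lines = [
    "Jupyter runs on Spark",
    "Spark runs in the cloud",
    "the cloud runs Jupyter",
]

partitions = split_into_partitions(lines, 2)
partial_counts = [count_words(part) for part in partitions]  # the "map" step
total = reduce(lambda a, b: a + b, partial_counts, Counter())  # the "reduce" step
print(total.most_common(3))
```

Because each partition is counted independently, the work in the "map" step could be handed to different machines, which is where the speed comes from.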

This is a high-performance architecture at its very best. And thanks to the integration with GitHub, collaboration in developing notebooks is easy.

Below is a good introduction to creating a project for Jupyter Notebooks and running Spark jobs, all through Watson Studio.

But this is just the beginning. Watson Studio is the entry point not just to Jupyter Notebooks but also to Machine Learning and Deep Learning, either through Jupyter Notebooks or directly through the ML and DL tools.

A very cool and important environment that I hope to spend considerable time exploring in the next few weeks.