Deploy Your Analytics as Web Apps Using PixieDust’s 1.1 Release

From data science, directly to the web


Question: What? I can deploy the analytics I built in my Jupyter Notebook as a web application?

Answer: Yes, this is not a typo! This latest innovation from the PixieDust team lets you deploy a PixieApp as a web application with a single click (no additional code).

Note: This feature is still in alpha and therefore not ready for production use. For more on the PixieGateway alpha release, see the release notes.

Note: Because of an error in the build files, we had to issue an emergency patch. The correct version for this release is now 1.1.0.1.

End-to-end analytics pipeline

Before publishing your first PixieApp to the web, let’s quickly recap the evolution of the PixieDust helper library for Jupyter Notebooks.

  • It started by providing two simple APIs: one to load data so you don’t have to learn the intricacies of Spark APIs, and one for auto-visualizing a data set, whether it’s a Spark/Pandas DataFrame or a GraphFrame.
  • Then came PixieApps: they let you build embedded application UIs within notebooks that leverage your analytics and charts with minimal coding. (See this nice example by Jordan George.)

Naturally, the next step was to free PixieApps from running only inside notebooks and let them run as regular web apps. The application lifecycle now looks like this:

With the ability to publish PixieApps as web apps, the analytics and charts built in data science notebooks now reach developers and line-of-business users.

Here’s the collaboration workflow:

  1. Data scientists use a Python notebook to load, enrich, and analyze data, and to create analytics (dashboards, reports, machine learning models, etc.).
  2. From the same notebook, developers create a PixieApp to operationalize these analytics. Developers can also collaborate on analytics with JavaScript using PixieDust Nodebooks.
  3. Once ready, developers publish the PixieApp as a web application, which line-of-business users can then easily use interactively.

How does it work?

The PixieGateway web application server is responsible for loading and running PixieApps. It is built on top of the Jupyter Kernel Gateway and therefore follows a similar architecture:

PixieGateway architecture diagram.

Here’s what you’ll see when you go to publish:

  1. From a Jupyter Notebook, the data scientist or developer runs the PixieApp. When ready, they can click on a new publish button (the electric plug icon 🔌) to deploy the PixieApp into the PixieGateway.
The publish button in the PixieApp UI within a notebook.

2. The Publish Configuration dialog contains the following panels:

a. Basic configuration info, such as server address, page title, and page icon.

The Basic Configuration options for publishing a PixieApp to the web.

b. List of package dependencies (automatically detected by static code analysis). Users will eventually be able to adjust this list manually.

List of imported packages within the Basic Configuration dialog.

c. Kernel Spec information:

Kernel Spec info found within the Basic Configuration dialog.
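How PixieGateway's static analysis detects the dependencies in panel b isn't described here. As a rough illustration only, the following sketch assumes the notebook's code cells are available as a list of Python source strings and collects top-level package names from their import statements with the standard ast module. The function detect_imported_packages is hypothetical, not part of the PixieDust API.

```python
import ast


def detect_imported_packages(cells):
    """Return the sorted top-level package names imported by the given code cells.

    Hypothetical helper: each element of ``cells`` is the source of one code cell.
    """
    packages = set()
    for source in cells:
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # "import numpy.linalg" -> "numpy"
                for alias in node.names:
                    packages.add(alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom):
                # Skip relative imports such as "from . import x" (level > 0).
                if node.module and node.level == 0:
                    packages.add(node.module.split(".")[0])
    return sorted(packages)


cells = [
    "import pandas as pd\nimport numpy.linalg",
    "from pixiedust.display.app import *\nfrom . import local_helper",
]
print(detect_imported_packages(cells))
```

A fuller version would also have to strip IPython magics (such as %matplotlib inline) before parsing, since ast.parse rejects them, and filter out standard-library modules.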

Installing the PixieGateway

If you want to demo PixieApp publishing locally, follow these instructions. If you’d rather deploy your application to the cloud, see the next section on using Kubernetes on Bluemix.

To start, install the pixiegateway package from PyPI by running pip install pixiegateway on the command line. (Note: PixieGateway supports both Python 2.7 and 3.x.)

Then you can start the PixieGateway with a simple command:

Example output:

Now, go to http://localhost:<portnumber>/pixieapps to review and use your apps.
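Once the gateway is running, you can also check it from a script instead of the browser. This minimal sketch (Python 3, standard library only) assumes the gateway listens on port 8080; substitute the port you actually used. The function gateway_is_up is hypothetical.

```python
import http.client
import urllib.request


def gateway_is_up(url="http://localhost:8080/pixieapps", timeout=5.0):
    """Return True if the PixieGateway apps page answers with HTTP 200.

    The default URL assumes the gateway was started on port 8080.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all subclasses of OSError,
        # so refused connections, DNS failures, timeouts and 4xx/5xx land here;
        # HTTPException covers servers that don't speak valid HTTP.
        return False
```

The same check works for a cloud deployment: pass the public URL of your gateway instead of the localhost default.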

[optional] Running PixieGateway in Kubernetes on IBM Bluemix

If you’re new to Kubernetes on the IBM Bluemix Container Service, read this intro article by Mark Watson, which explains the basics of using the service with the bx and kubectl command-line tools.

Here are the steps to install PixieGateway using Kubernetes on Bluemix:

  1. Download the Kubernetes CLI and the Bluemix CLI.
  2. bx login [-sso] -a https://api.ng.bluemix.net
  3. bx target -o <YOUR_ORG> -s <YOUR_SPACE>
  4. bx plugin install container-service -r Bluemix
  5. bx cs init
  6. If you haven’t already, create a cluster: bx cs cluster-create --name my-cluster
  7. Verify that the cluster is correctly created (this may take a few minutes): bx cs clusters
  8. Download the cluster config: bx cs cluster-config my-cluster
  9. Run the export command returned by the command above, e.g., export KUBECONFIG=/Users/dtaieb/.bluemix/plugins/container-service/clusters/davidcluster/kube-config-hou02-davidcluster.yml
  10. Create the deployment: kubectl create -f https://github.com/ibm-watson-data-lab/pixiegateway/raw/master/etc/deployment.yml
  11. Create the service: kubectl create -f https://github.com/ibm-watson-data-lab/pixiegateway/raw/master/etc/service.yml
  12. Verify the pods: kubectl get pods
  13. Verify the nodes: kubectl get nodes
  14. Verify the services: kubectl get services
  15. Finally, get the public IP address of the server: bx cs workers my-cluster
Workers information

16. To check that the install worked, open http://<publicIP>:32222/pixieapps in your browser. From there, you can interact with your published PixieApp web apps.

List of published PixieApps. Use them like regular web applications!

Note: The screenshot above shows a list of PixieApps that have already been published. Do not be alarmed if your list is empty. We’ll get to publishing apps from notebooks in the next section.

17. Optional: In the future, if you need to update the PixieGateway version, you do not have to retrace the previous steps. Instead, simply delete the Kubernetes pod; Kubernetes recreates it from the deployment and automatically pulls down a new version of PixieDust. Run kubectl delete pod <name>, where <name> is the pod’s name as listed by kubectl get pods.

Note: At the time of writing, deployed PixieApps are not stored in a persistent volume, so deleting the pod also deletes them and you’ll have to re-publish.
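If you update the gateway often, the lookup-and-delete in step 17 can be scripted. This sketch shells out to kubectl (which must be on your PATH and pointed at your cluster through KUBECONFIG) and assumes the gateway's pod names start with pixiegateway; check the actual prefix with kubectl get pods, since it depends on deployment.yml. Both helper names are hypothetical. It uses kubectl get pods -o name, which prints one pod/<name> line per pod, so no table parsing is needed.

```python
import subprocess


def find_pods(kubectl_output, prefix):
    """Return pod names from `kubectl get pods -o name` output that start with prefix."""
    names = []
    for line in kubectl_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # `-o name` prints one resource per line, e.g. "pod/<name>".
        name = line.split("/", 1)[-1]
        if name.startswith(prefix):
            names.append(name)
    return names


def restart_gateway_pods(prefix="pixiegateway"):
    """Delete matching pods so the deployment recreates them (assumed prefix)."""
    output = subprocess.run(
        ["kubectl", "get", "pods", "-o", "name"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name in find_pods(output, prefix):
        subprocess.run(["kubectl", "delete", "pod", name], check=True)
```

As the note above warns, deleting the pod also removes the published PixieApps, so plan to re-publish them afterward.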

Deploying your first PixieApp

In this section, we’ll use the sample StockViewer PixieApp.

  1. Download the StockViewer Notebook (view the raw version and make sure the saved file has the .ipynb extension). Then run it in your favorite Jupyter Notebook server, e.g., a local one or IBM’s DSX.
  2. Run the cell that contains the PixieApp. Notice a new publish icon on the top right of the cell output. Click on it.
Our new friend, the PixieApp Publish icon 🔌

3. In the Options pane, enter the PixieGateway server location. (This is the only required option at this time.) Click the Publish button.

Our other new friend, the PixieApp Publish Configuration dialog!

4. If all goes well, you should see the following result:

A successful PixieApp publish.

5. Click on the provided link and start using the PixieApp as a regular web application:

The StockViewer PixieApp running as a web application. Your work just went from raw data in a Jupyter Notebook to your boss’s web browser. Congrats!

What happens under the hood when I publish?

During publication, PixieGateway performs static analysis on all the cells that contain code (only Python is supported at this time). It then looks for the first cell that contains a PixieApp, which it identifies as a Python class definition decorated with @PixieApp.

Note: If your notebook doesn’t contain a class decorated with @PixieApp, publishing fails.
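PixieGateway's own implementation isn't shown in this post; the following is a minimal sketch using Python's standard ast module, assuming the notebook's code cells are given as a list of source strings. The names split_notebook and is_pixieapp_class are hypothetical. It finds the first cell whose top-level statements include a class decorated with @PixieApp, then splits the notebook into the warmup, run, and ignored code described in the rules below. The sample PixieApp cell follows the usual pattern of importing from pixiedust.display.app and decorating a route with @route(); it is only parsed, never executed.

```python
import ast


def is_pixieapp_class(node):
    """True if node is a class definition decorated with @PixieApp."""
    if not isinstance(node, ast.ClassDef):
        return False
    for decorator in node.decorator_list:
        # Accept both "@PixieApp" and a dotted form such as "@app.PixieApp".
        if isinstance(decorator, ast.Name) and decorator.id == "PixieApp":
            return True
        if isinstance(decorator, ast.Attribute) and decorator.attr == "PixieApp":
            return True
    return False


def split_notebook(cells):
    """Split code cells into (warmup, run, ignored) around the first PixieApp cell."""
    for index, source in enumerate(cells):
        tree = ast.parse(source)
        if any(is_pixieapp_class(node) for node in tree.body):
            return cells[:index], cells[index], cells[index + 1:]
    raise ValueError("No class decorated with @PixieApp found; cannot publish")


# Hypothetical three-cell notebook: warmup, the PixieApp cell, and a debug cell.
cells = [
    "import pandas as pd\ndf = pd.DataFrame({'price': [1.0, 2.5]})",
    '''from pixiedust.display.app import *

@PixieApp
class MyApp():
    @route()
    def main(self):
        return "<div>Hello World</div>"

my_app = MyApp()
my_app.run()''',
    "print(my_app)",
]

warmup, run, ignored = split_notebook(cells)
print(len(warmup), len(ignored))
```

A notebook without such a class raises ValueError here, mirroring the note above that publishing fails without the @PixieApp decorator.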

A few important rules:

  • All the cells above the PixieApp cell are considered “warmup code.” This code runs only once in the host kernel, and the PixieApp can refer to the variables declared there. It is good practice to put large data sources, and anything else that takes a long time to load, in the warmup code.
  • Since multiple PixieApps can run on the same IPython kernel, variables are automatically renamed to avoid accidental name collisions with other PixieApps.
  • All the cells below the PixieApp cell are ignored, but you can still use them for debugging, for example.
  • The PixieApp cell code is considered the “run code.” When the user runs a PixieApp, the run code executes. It is important that the run code contains the PixieApp instantiation and the call to the run method (e.g., my_app = MyApp() and then my_app.run()).
  • It is good practice to store the PixieApp instance in its own variable (my_app = MyApp()). In this case, PixieGateway renames the variable and assigns it to the user session. Since a PixieApp can hold state, this avoids name collisions when multiple users access the same PixieApp. If you choose not to store the PixieApp in its own variable, make sure it doesn’t hold state (for example, in class variables).
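The renaming in the last two rules can be pictured with another ast-based sketch. It is not PixieGateway's actual algorithm: it assumes a simple naming scheme in which a per-app or per-session prefix is prepended to every variable bound by a module-level assignment, and it requires Python 3.9+ for ast.unparse. rename_variables and its helpers are hypothetical.

```python
import ast


class _Renamer(ast.NodeTransformer):
    """Rewrite every reference to the selected names with a unique prefix."""

    def __init__(self, names, prefix):
        self.mapping = {name: prefix + name for name in names}

    def visit_Name(self, node):
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node


def top_level_names(tree):
    """Collect names bound by simple module-level assignments (e.g. my_app = MyApp())."""
    names = set()
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names


def rename_variables(source, prefix):
    """Return source with its module-level variables prefixed, e.g. by a session id."""
    tree = ast.parse(source)
    tree = _Renamer(top_level_names(tree), prefix).visit(tree)
    return ast.unparse(tree)


run_code = "my_app = MyApp()\nmy_app.run()"
print(rename_variables(run_code, "s42_"))
```

Because this sketch ignores scoping, a function parameter that shares a name with a module-level variable would break: its uses inside the function are renamed, but the parameter itself is not. A real implementation needs proper scope analysis.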

What’s next for PixieGateway

There is much more work left to make PixieGateway feature-complete. Broadly, here’s what’s in the works:

  1. Authentication/Authorization: We’re planning to support an extensible mechanism to secure the deployed PixieApps, with a default built-in implementation.
  2. Scalability: PixieGateway needs to scale horizontally based on user load.
  3. Kernel runtime: Smart configuration of the kernel based on the specs provided at publish time.
  4. Spark support: Right now, PixieGateway doesn’t support kernels connected to Spark, but it will in future releases.
  5. Package dependencies: Improve automatic detection and provisioning of package dependencies, including support for versioning and editable installs (dev mode).

As always, we look forward to your feedback. Continue the conversation (and the pull requests) on GitHub at https://github.com/ibm-watson-data-lab/pixiedust. Thanks!
