Deploy Your Analytics as Web Apps Using PixieDust’s 1.1 Release
From data science, directly to the web
Question: What? I can deploy the analytics I built in my Jupyter Notebook as a web application?
Answer: Yes, this is not a typo! This latest innovation from the PixieDust team lets you deploy a PixieApp as a web application with a single click (no added code).
Note: This feature is still in alpha and therefore not ready to be used in production. For more on the PixieGateway release alpha, see the release notes.
Note: Because of an error in the build files, we had to issue an emergency patch. The correct version for this release is now 1.1.0.1.
End-to-End analytics pipeline
Before publishing your first PixieApp to the web, let’s quickly recap the evolution of the PixieDust helper library for Jupyter Notebooks.
- It started by providing two simple APIs: one to load data so you don’t have to learn the intricacies of Spark APIs, and one for auto-visualizing a data set, whether it’s a Spark/Pandas DataFrame or a GraphFrame.
- Then came PixieApps: they let you build embedded application UIs within notebooks that leverage your analytics and charts with minimal coding. (See this nice example by Jordan George.)
Naturally, the next step was to free PixieApps from the limitation of running only within notebooks and to run them as regular web apps. The application lifecycle now looks like this:
Here’s the collaboration workflow:
- Data scientists use a Python notebook to load, enrich, and analyze data, and to create analytics (dashboards, reports, machine learning models, etc.).
- From the same notebook, developers create a PixieApp to operationalize these analytics. Developers can also collaborate on analytics with JavaScript using PixieDust Nodebooks.
- Once ready, developers publish the PixieApp as a web application, where it can be easily consumed interactively by line-of-business users.
How does it work?
The PixieGateway web application server is responsible for loading and running PixieApps. It is built on top of the Jupyter Kernel Gateway and therefore follows a similar architecture:
Here’s what you’ll see when you go to publish:
- From a Jupyter Notebook, the data scientist or developer runs the PixieApp. When ready, they can click on a new publish button (the electric plug icon 🔌) to deploy the PixieApp into the PixieGateway.
- The Publish Configuration dialog contains the following panels:
a. Basic configuration info, such as server address, page title, and page icon.
b. List of package dependencies (automatically detected by static code analysis). Users will eventually be able to adjust this list manually.
c. Kernel Spec information:
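To make panel (b) more concrete, here is a minimal sketch of how package dependencies could be detected by static analysis. It is not PixieGateway's actual implementation: the function name detect_top_level_imports and the sample cell are hypothetical, and the sketch only collects the top-level module names that a piece of code imports, using Python's standard ast module.

```python
import ast


def detect_top_level_imports(source):
    """Return the sorted top-level module names imported by `source`.

    Hypothetical helper for illustration only; the real detection logic
    in PixieGateway may work differently.
    """
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" depends on the top-level package "os".
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0): they refer to local code.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return sorted(modules)


# Sample notebook cell content (made up for this example).
cell_source = """
import pandas as pd
from pixiedust.display.app import *
import os.path
"""

print(detect_top_level_imports(cell_source))  # prints ['os', 'pandas', 'pixiedust']
```

Because the code is only parsed and never executed, the analysis works even when the imported packages are not installed on the machine doing the analysis.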
Installing the PixieGateway
If you want to demo PixieApp publishing locally, follow these instructions. If you’d rather deploy your application to the cloud, see the next section on using Kubernetes on Bluemix.
To start, install the pixiegateway package from PyPI. (Note: PixieGateway supports both Python 2.7 and 3.x.) On the command line, run the following:
pip install pixiegateway
Then you can start the PixieGateway with a simple command:
jupyter pixiegateway --port <portnumber>
Example output:
dtaieb$ jupyter pixiegateway --port 8899
[PixieGatewayApp] Kernel started: b5be0b3b-a018-4ace-95d1-d94b556a0bfe
kernel client initialized
[PixieGatewayApp] Jupyter Kernel Gateway at http://127.0.0.1:8899
Now, go to http://localhost:<portnumber>/pixieapps to review and use your apps.
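If you would rather check from a script that the gateway is reachable, something like the following sketch may help. The helper names pixieapps_url and gateway_is_up are hypothetical, and the default port 8899 is simply the one used in the example output above. The sketch assumes Python 3 and uses only the standard library.

```python
from urllib.error import URLError
from urllib.request import urlopen


def pixieapps_url(host="localhost", port=8899):
    """Build the URL of the PixieApps listing page for a gateway."""
    return "http://{}:{}/pixieapps".format(host, port)


def gateway_is_up(url, timeout=3):
    """Return True if the page answers with HTTP 200, False otherwise.

    Hypothetical helper: it only checks reachability and says nothing
    about which PixieApps are deployed.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error, etc.
        return False


url = pixieapps_url(port=8899)
print(url)  # prints http://localhost:8899/pixieapps
```

Call gateway_is_up(url) once the gateway has finished starting; it returns False rather than raising when nothing is listening on that port.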
[optional] Running PixieGateway in Kubernetes on IBM Bluemix
If you’re starting with Kubernetes on the IBM Bluemix container service, then read this intro article from Mark Watson that explains the basics of using the service with the bx and kubectl command-line tools.
Here are the steps to install PixieGateway using Kubernetes on Bluemix:
- Download the Kubernetes CLI and the Bluemix CLI.
bx login [-sso] -a https://api.ng.bluemix.net
bx target -o <YOUR_ORG> -s <YOUR_SPACE>
bx plugin install container-service -r Bluemix
bx cs init
- If not already done, create a cluster:
bx cs cluster-create --name my-cluster
- Verify that the cluster is correctly created (this may take a few minutes):
bx cs clusters
- Download the cluster config:
bx cs cluster-config my-cluster
- Run the export command returned by the command above, e.g.,
export KUBECONFIG=/Users/dtaieb/.bluemix/plugins/container-service/clusters/davidcluster/kube-config-hou02-davidcluster.yml
- Create the deployment:
kubectl create -f https://github.com/ibm-watson-data-lab/pixiegateway/raw/master/etc/deployment.yml
- Create the service:
kubectl create -f https://github.com/ibm-watson-data-lab/pixiegateway/raw/master/etc/service.yml
- Verify the pods:
kubectl get pods
- Verify the nodes:
kubectl get nodes
- Verify the services:
kubectl get services
- Finally, you can get the public ip address of the server:
bx cs workers my-cluster
- To check that the install worked, enter the following URL in your browser: http://<publicIP>:32222/pixieapps. You’ll be able to interact with your PixieApp-published web apps from there.
Note: The screenshot above shows a list of PixieApps that have already been published. Do not be alarmed if your list is empty. We’ll get to publishing apps from notebooks in the next section.
- Optional: In the future, if you need to update the PixieGateway version, you do not have to retrace the previous steps. Instead, simply delete the Kubernetes pod, which will cause Docker to restart and automatically pull down a new version of PixieDust, like so: kubectl delete pod <name>. Here, <name> is the pod’s name obtained with the command: kubectl get pods
Note: At the time of writing, the deployed PixieApps are not stored in a persisted volume, so deleting the pod will also delete them, and you’ll have to re-publish.
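If you script these steps, the one that is easy to get wrong is picking up the export command printed by bx cs cluster-config. The sketch below shows one way to extract the KUBECONFIG path from captured text and set it for the current Python process. The function name parse_kubeconfig_export is hypothetical, and the sample text is a stand-in: only the export line comes from this article, and the real CLI output will contain other lines.

```python
import os
import re


def parse_kubeconfig_export(cli_output):
    """Return the path from the first `export KUBECONFIG=...` line, or None.

    Hypothetical helper; it assumes the path contains no spaces.
    """
    match = re.search(
        r"^\s*export\s+KUBECONFIG=(\S+)\s*$", cli_output, re.MULTILINE
    )
    return match.group(1) if match else None


# Stand-in for captured CLI output (not real output, apart from the
# export line quoted earlier in this article).
sample_output = (
    "(other CLI output omitted)\n"
    "export KUBECONFIG=/Users/dtaieb/.bluemix/plugins/container-service/"
    "clusters/davidcluster/kube-config-hou02-davidcluster.yml\n"
)

kubeconfig_path = parse_kubeconfig_export(sample_output)
if kubeconfig_path is not None:
    # Child processes started from this script (e.g. kubectl) inherit it.
    os.environ["KUBECONFIG"] = kubeconfig_path
print(kubeconfig_path)
```

Setting os.environ only affects the running script and the processes it launches; your interactive shell still needs the export command itself.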
Deploying your first PixieApp
In this section, we’ll use the sample StockViewer PixieApp.
- Download the StockViewer Notebook (view the raw version and ensure it takes the .ipynb file extension when you save). Then, run it in your favorite Jupyter Notebook server, e.g., running locally or on IBM’s DSX.
- Run the cell that contains the PixieApp. Notice a new publish icon on the top right of the cell output. Click on it.
- In the Options pane, enter the PixieGateway server location. (This is the only required option at this time.) Click the Publish button.
- If all goes well, you should see the following results:
- Click on the provided link and start using the PixieApp as a regular web application:
What happens under the hood when I publish?
During the publication process, PixieGateway will perform static analysis on all the cells that contain code. (Only Python is supported at this time.) It will then try to find the first cell that contains a PixieApp. It does so by looking for a Python class definition annotated with the @PixieApp annotation, for example:
from pixiedust.display.app import *

@PixieApp
class MyApp():
    @route()
    def main_screen(self):
        return "<div>Hello World</div>"

my_app = MyApp()
my_app.run()
Note: If your notebook doesn’t contain the @PixieApp annotation, then the publishing fails.
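The search for the PixieApp cell can be pictured with a short sketch. This is an illustration under assumptions, not PixieGateway's actual code: the helper names has_pixieapp_class and find_pixieapp_cell are hypothetical, cells are represented as plain source strings, and only the bare @PixieApp form of the annotation is recognized.

```python
import ast


def has_pixieapp_class(source):
    """Return True if `source` defines a class decorated with @PixieApp."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Cells with IPython magics (e.g. "%matplotlib inline") are not
        # plain Python, so this sketch simply skips them.
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            for decorator in node.decorator_list:
                if isinstance(decorator, ast.Name) and decorator.id == "PixieApp":
                    return True
    return False


def find_pixieapp_cell(cells):
    """Return the index of the first cell containing a PixieApp, or None."""
    for index, source in enumerate(cells):
        if has_pixieapp_class(source):
            return index
    return None


# Three made-up notebook cells: data loading, the PixieApp, and debugging.
cells = [
    "import pandas as pd\ndata = pd.DataFrame()",
    "from pixiedust.display.app import *\n"
    "@PixieApp\n"
    "class MyApp():\n"
    "    @route()\n"
    "    def main_screen(self):\n"
    "        return '<div>Hello World</div>'\n"
    "my_app = MyApp()\n"
    "my_app.run()",
    "print('debugging')",
]

print(find_pixieapp_cell(cells))  # prints 1
```

A return value of None corresponds to the failure case in the note above: no cell contains a class with the @PixieApp annotation.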
A few important rules:
- All the cells above the PixieApp cell are considered “warmup code.” This code runs only once in the host kernel, and the PixieApp can refer to the variables declared there. It is good practice to use the warmup code to load large data sources that would otherwise take a long time to load.
- Since multiple PixieApps can run on the same IPython kernel, the variable names are automatically renamed to avoid accidental name collision with other PixieApps.
- All the cells below the PixieApp cell are ignored, but you can still use them for debugging, for example.
- The PixieApp cell code is considered the “run code.” When the user runs a PixieApp, the run code executes. It is important that the run code contains the PixieApp instantiation and the call to the run method (e.g., my_app = MyApp() and then my_app.run()).
- It is good practice to store the PixieApp instance in its own variable (my_app = MyApp()). In this case, PixieGateway will rename the variable and assign it to the user session. Since a PixieApp can contain state, this will avoid name collisions when multiple users access the same PixieApp. If you choose not to store the PixieApp in its own variable, ensure that it doesn’t contain state (class variables).
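The rules above can be summarized in a small sketch. It is only a mental model, not how PixieGateway is implemented: the helper names split_notebook_cells and rename_for_session, the session id "a1b2", and the naming scheme are all hypothetical, and the regex-based renaming is deliberately naive (it would also rewrite the name inside string literals).

```python
import re


def split_notebook_cells(cells, pixieapp_index):
    """Split cell sources into (warmup code, run code, ignored cells)."""
    warmup_code = cells[:pixieapp_index]    # runs once in the host kernel
    run_code = cells[pixieapp_index]        # runs each time a user opens the app
    ignored = cells[pixieapp_index + 1:]    # never published
    return warmup_code, run_code, ignored


def rename_for_session(run_code, variable, session_id):
    """Give the PixieApp instance variable a session-specific name."""
    scoped_name = "{}_{}".format(variable, session_id)
    pattern = r"\b{}\b".format(re.escape(variable))
    return re.sub(pattern, scoped_name, run_code)


# Made-up cells; load_stock_data and MyApp are placeholders.
cells = [
    "stock_data = load_stock_data()",
    "my_app = MyApp()\nmy_app.run()",
    "print(stock_data.head())",
]

warmup_code, run_code, ignored = split_notebook_cells(cells, 1)
print(rename_for_session(run_code, "my_app", "a1b2"))
# prints:
# my_app_a1b2 = MyApp()
# my_app_a1b2.run()
```

With one variable per session, two users opening the same PixieApp each get their own instance, so state held by one instance does not leak into the other.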
What’s next for PixieGateway
There is much more work left to make PixieGateway feature-complete. Broadly, here’s what’s in the works:
- Authentication/Authorization: We’re planning to support an extensible mechanism to secure the deployed PixieApps, with a default built-in implementation.
- Scalability: PixieGateway needs to scale horizontally based on user load.
- Kernel runtime: Smart configuration of kernel based on specs provided at publish time.
- Spark Support: Right now, PixieGateway doesn’t support kernels connected to Spark, but it will in future releases.
- Package dependencies: Improve automatic detection and provisioning of package dependencies, including support for versioning and editable installs (dev mode).
As always, we look forward to your feedback. Continue the conversation, and send pull requests, on GitHub at https://github.com/ibm-watson-data-lab/pixiedust . Thanks!