You Too Can Make Magic (in Jupyter Notebooks with PixieDust)

Getting started with custom visualizations, simple tables & word clouds

//va
Center for Open Source Data and AI Technologies
7 min read · Apr 26, 2017

PixieDust, the open source Python helper library that extends the usability of notebooks, is quite adept at creating magic in notebooks. And with multiple visualizations available, there is magic to be had for just about any situation.

However, for the times when the default visualizations are not quite what you are looking for, there are options. Sure, you can submit an enhancement request to get new visualizations into PixieDust, but why not get ahead of the game and create one yourself?

Most magic tends to be secretive and not readily shared, but PixieDust is open to all. With the PixieDust Extensibility APIs, you can create and deliver your own brand of visualization magic to notebook users without forcing them to type many, if any, lines of code.

The preparation

Like all great magic, a little prep work is required. You can follow the steps outlined here in any Jupyter Notebook environment. However, the instructions and screenshots walk through the notebook in IBM’s Data Science Experience (DSX). The first step is to sign into DSX and create a Notebook.

For best results, use the latest version of either Mozilla Firefox or Google Chrome.

Create a new notebook

After signing into DSX:

  1. On the upper right of the DSX site, click the + and choose Create project.
  2. Enter a Name for your project
  3. Select a Spark Service
  4. Click Create

From within the new project, you will create your notebook:

  1. Click add notebooks
  2. Click the Blank tab in the Create Notebook form
  3. Enter a Name for the notebook
  4. Select Python 2 for the Language
  5. Select 1.6 for the Spark version
  6. Select the Spark Service
  7. Click Create Notebook

When you use a notebook in DSX, you can run a cell only by selecting it, then going to the toolbar and clicking on the Run Cell (▸) button. When a cell is running, an [*] is shown beside the cell. Once the cell has finished, the asterisk is replaced by a number.

If you don’t see the notebook toolbar showing the Run Cell (▸) button and other notebook controls, you are not in edit mode. Go to the dark blue toolbar above the notebook and click the edit (pencil) icon.

https://cdn-images-1.medium.com/max/1600/1*_TdX11w44zy5_PbMqpPREg.png
Data Science Experience toolbar

The pledge

Using PixieDust in a notebook is straightforward. No misdirection. No sleight of hand.

Install PixieDust

DSX already comes with the PixieDust library installed, but it is always a good idea to make sure you have the latest version. In the first cell of the notebook, enter a pip upgrade command such as !pip install --user --upgrade pixiedust.

Click on the Run Cell (▸) button. After the cell finishes running, if you are instructed to restart the kernel, from the notebook toolbar menu:

  1. Go to Kernel > Restart
  2. Click Restart in the confirmation dialog

The status of the kernel briefly flashes near the upper right corner, alerting when it is Not Connected, Restarting, Ready, etc.

Import PixieDust

At this point, you can introduce your lovely assistant: data! In the next cell, enter and run import pixiedust.

Whenever the kernel is restarted, import pixiedust must be run before continuing. Any previously loaded data will also need to be reloaded.

In a new cell enter and run:
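
The cell in question loads a sample data set and displays it. A minimal sketch follows. It assumes that sample data set 7 is the Boston crime data (running pixiedust.sampleData() with no arguments lists the available sets, so the number can be checked). The guarded import is also an assumption of this sketch: it lets the cell do nothing, rather than fail, where PixieDust is absent.

```python
# Assumption: 7 is the ID of the Boston crime sample.
# Verify by running pixiedust.sampleData() with no arguments.
BOSTON_CRIME_SAMPLE_ID = 7

try:
    import pixiedust
except ImportError:
    pixiedust = None  # PixieDust is not installed; skip the load below

if pixiedust is not None:
    # sampleData fetches the chosen data set and returns it as a DataFrame
    df = pixiedust.sampleData(BOSTON_CRIME_SAMPLE_ID)
    # display is made available in the notebook by `import pixiedust`
    display(df)  # noqa: F821
```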

Using PixieDust’s sampleData API, you have loaded some sample data: specifically, crime data from the city of Boston over a two-week span. You are viewing it using PixieDust's display API.

You can try different visualizations by selecting different chart types and renderers provided by PixieDust.

The turn

One of PixieDust’s default visualizations is a nice table view of your data. To start, you’ll customize a version of this table. It’s not quite jaw-dropping graphics, but it will cover the basics (i.e., template, metadata) of creating your own visualization.

The template

Your first step will be to create the HTML fragment for the template of your visualization. PixieDust supports Jinja2, the popular Python templating engine. This allows for adding some logic and conditional statements to simplify your template. In addition, you can also make use of Bootstrap CSS classes and Font Awesome icons.

To define your template:

  1. Import the display classes from pixiedust.display.display
  2. Create a class that extends Display
  3. Implement def doRender(self, handlerId) in your class
  4. In doRender, call self._addHTMLTemplateString, passing your template

You can access your DataFrame via the entity variable.

Python is indentation sensitive. Do not mix spaces and tabs. Use either spaces or tabs exclusively for all indentation.

In a new cell enter and run:
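
A sketch of such a template cell follows the four steps listed above. The class name SimpleDisplay, the Bootstrap table classes, and the 100-row cap are assumptions of this sketch rather than details taken from the original cell. The fallback to object exists only so the cell still defines the class where PixieDust is not installed.

```python
try:
    # Star import, as step 1 describes
    from pixiedust.display.display import *
except ImportError:
    Display = object  # stand-in so the class still defines without PixieDust

# Jinja2 template: one header cell per column, one table row per DataFrame row.
# `entity` is the DataFrame being displayed; take(100) caps the rows rendered.
SIMPLE_TABLE_TEMPLATE = """
<table class="table table-striped">
  <thead>
    <tr>
    {% for field in entity.schema.fields %}
      <th>{{ field.name }}</th>
    {% endfor %}
    </tr>
  </thead>
  <tbody>
    {% for row in entity.take(100) %}
    <tr>
      {% for field in entity.schema.fields %}
      <td>{{ row[field.name] }}</td>
      {% endfor %}
    </tr>
    {% endfor %}
  </tbody>
</table>
"""


class SimpleDisplay(Display):
    def doRender(self, handlerId):
        # Hand the HTML fragment to PixieDust for rendering
        self._addHTMLTemplateString(SIMPLE_TABLE_TEMPLATE)
```

Keeping the template in a module-level constant is a design choice of the sketch: it makes the HTML easy to inspect on its own before wiring it into the class.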

PixieDust Extensibility API — Simple Table Template

The metadata

To be able to invoke your visualization, it must be added to PixieDust’s display output toolbar. Menu options can be added to the toolbar area by including some specific metadata.

To specify the metadata:

  1. Create a class that extends DisplayHandlerMeta
  2. Annotate the class with @PixiedustDisplay()
  3. Implement def getMenuInfo(self,entity,dataHandler) in your class
  4. Annotate getMenuInfo with @addId
  5. In getMenuInfo, return a list of dictionaries (a JSON-style array) defining attributes for your menu option
  6. Implement def newDisplayHandler(self,options,entity) in your class
  7. In newDisplayHandler, return the response from the call to your HTML fragment class

Metadata attributes include:

  • categoryId - used to group menu options
  • title - title or label for the menu option
  • icon - icon class for the menu option (accepts Font Awesome icon css classes)
  • id - unique identifier for the menu option

In a new cell enter and run:
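
The seven steps above could be sketched as follows. The names SimpleDisplayMeta and SimpleDisplay, the id value, the fa-table icon, and the "Table" category ID are assumptions of this sketch. The stand-ins in the except branch register nothing; they only let the cell run where PixieDust is not installed.

```python
try:
    from pixiedust.display.display import *
except ImportError:
    # Stand-ins so the cell still runs without PixieDust
    DisplayHandlerMeta = object

    def PixiedustDisplay():
        return lambda cls: cls

    def addId(func):
        return func


@PixiedustDisplay()
class SimpleDisplayMeta(DisplayHandlerMeta):
    @addId
    def getMenuInfo(self, entity, dataHandler):
        # Offer the menu option only when the entity is a DataFrame
        if entity.__class__.__name__ == "DataFrame":
            return [{
                "categoryId": "Table",   # assumed ID of the table menu group
                "title": "My Simple Table",
                "icon": "fa-table",      # Font Awesome icon class
                "id": "simpleTable",     # hypothetical unique identifier
            }]
        return []

    def newDisplayHandler(self, options, entity):
        # SimpleDisplay is the Display subclass holding the HTML template
        # (assumed name; it must already be defined in the notebook)
        return SimpleDisplay(options, entity)
```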

PixieDust Extensibility API — Simple Table Metadata

The output

You are ready to try out your new visualization. If all goes well, the table menu option should now be a dropdown that includes your new menu option. In a new cell, call display on your DataFrame again, for example display(df).

Click the table dropdown menu and choose My Simple Table:

My Simple Table menu option

Congrats! You have created your first PixieDust visualization.

My Simple Table

You’ve seen how easily a custom visualization can be made. Now you can try a more interesting one.

The prestige

Word (or Tag) clouds are a common way to visualize text data. So, why not try to create a PixieDust word cloud visualization? Rather than trying to write the logic for generating the word cloud, you can rely on a little word cloud generator that already exists and is easy to use.

In a new cell, enter and run a pip install command for the wordcloud package, such as !pip install --user wordcloud.

This will install the word cloud generator library. After the install completes, restart the kernel, re-import PixieDust, and re-download the sample data. You can scroll up and rerun the import and download cells, or, after restarting the kernel, insert a new cell that repeats both steps.

The look of a cloud

In a new cell enter and run:
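
A sketch of a word cloud template follows, built around the variable names this section uses (dfdict, wc, img_str). The class name SimpleWordCloudDisplay, the two helper functions, and the white background are assumptions of this sketch. It also assumes the DataFrame has exactly two columns, word then count. The wordcloud import is placed inside doRender so the cell still defines the class where that library is missing.

```python
import base64
from io import BytesIO

try:
    from pixiedust.display.display import *
except ImportError:
    Display = object  # stand-in so the class still defines without PixieDust


def rows_to_frequencies(rows):
    """Turn (word, count) rows into a {word: count} dictionary."""
    return {word: count for word, count in rows}


def png_bytes_to_img_tag(png_bytes):
    """Base64-encode PNG bytes and embed them in an HTML img tag."""
    img_str = base64.b64encode(png_bytes).decode("ascii")
    return '<img src="data:image/png;base64,{}">'.format(img_str)


class SimpleWordCloudDisplay(Display):
    def doRender(self, handlerId):
        from wordcloud import WordCloud  # imported here, at render time

        # 1. DataFrame -> dictionary
        dfdict = rows_to_frequencies(self.entity.collect())
        # 2. dictionary -> WordCloud object
        wc = WordCloud(background_color="white").generate_from_frequencies(dfdict)
        # 3. WordCloud -> PNG bytes, then 4. base64 string -> img tag
        buffer = BytesIO()
        wc.to_image().save(buffer, format="PNG")
        self._addHTMLTemplateString(png_bytes_to_img_tag(buffer.getvalue()))
```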

PixieDust Extensibility API — Word Cloud Template

The template has been defined. If you look closely, you can see that:

  1. entity is converted to a dictionary (dfdict)
  2. dfdict is turned into a WordCloud object (wc)
  3. wc is encoded into base64 (img_str)
  4. img_str is passed to an HTML img tag

Cloud awareness

In a new cell enter and run:
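
The metadata cell for the word cloud could look like the sketch below. As in the simple table, the class names, the id value, and the fa-cloud icon are assumptions, and "Chart" is assumed to be the category ID of the chart dropdown. The stand-ins in the except branch only let the cell run where PixieDust is not installed.

```python
try:
    from pixiedust.display.display import *
except ImportError:
    # Stand-ins so the cell still runs without PixieDust
    DisplayHandlerMeta = object

    def PixiedustDisplay():
        return lambda cls: cls

    def addId(func):
        return func


@PixiedustDisplay()
class SimpleWordCloudDisplayMeta(DisplayHandlerMeta):
    @addId
    def getMenuInfo(self, entity, dataHandler):
        # Only DataFrame entities get the word cloud option
        if entity.__class__.__name__ == "DataFrame":
            return [{
                "categoryId": "Chart",       # assumed ID of the chart menu group
                "title": "Simple Word Cloud",
                "icon": "fa-cloud",          # Font Awesome icon class
                "id": "simpleWordCloud",     # hypothetical unique identifier
            }]
        return []

    def newDisplayHandler(self, options, entity):
        # SimpleWordCloudDisplay is the Display subclass that renders the cloud
        # (assumed name; it must already be defined in the notebook)
        return SimpleWordCloudDisplay(options, entity)
```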

PixieDust Extensibility API — Word Cloud Metadata

With this metadata code, you state that you are only interested in DataFrame classes. You also specify that the Simple Word Cloud menu option gets added to the chart dropdown.

Show me the cloud

The word cloud generator needs a list of words and their frequencies. You can take the sample data and create a new DataFrame, which includes the necessary data. In a new cell enter and run:
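
One way to build that DataFrame is sketched below. It assumes the sample data is a Spark DataFrame whose street column is named STREET. That column name is a guess, so check it against the table shown by display. The Counter helper is not part of the notebook flow; it is only a plain-Python picture of the word-to-frequency shape the aggregation produces.

```python
from collections import Counter


def street_frequencies(streets):
    """Plain-Python picture of the result: street name -> incident count."""
    return Counter(street for street in streets if street)


def incidents_by_street(df, column="STREET"):
    """Spark version: drop blank street names, then count rows per street.

    Assumption: `column` names the street column of the sample data.
    """
    return df.filter(df[column] != "").groupBy(column).count()


# In the notebook, with df holding the sample data:
#   df2 = incidents_by_street(df)
```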

This will create a new DataFrame showing the number of incidents reported by street. In the next cell, pass the new DataFrame to display.

Click the chart dropdown menu and choose Simple Word Cloud:

Simple Word Cloud menu option

Congrats! You have created your second PixieDust visualization. You have yourself a word cloud. Based on the size of the text you can quickly see which streets had a higher number of crimes reported.

Boston Crime Data Streets Word Cloud

Curtain Call

While the visualizations here were created in the notebook, they can easily be packaged and distributed as a Python module for easier sharing and integration into PixieDust. In fact, my next article covers just that.

Neither the table nor the word cloud was an intricate visualization. However, they did provide you with the building blocks needed to tackle more advanced visualizations. You will find guidance and additional information in the GitHub repo wiki.

All are invited to contribute and pull requests are welcome. Who knows — maybe you could even contribute your visualization back to the PixieDust community? Magic!

If you enjoyed this article, please ♡ it to recommend it to other Medium readers.
