PixieDust gets its first community-driven feature in 1.0.4

Now with time series data, the fairy tale continues

Published in Center for Open Source Data and AI Technologies · 3 min read · Apr 13, 2017

Last month I announced the availability of PixieDust 1.0. Since then, community adoption has been fantastic. Based on repo stars and on feedback at conferences and events, more developers and data scientists are using PixieDust as part of their work in Jupyter Notebooks.

Today, we’re releasing version 1.0.4 on PyPI. What’s noteworthy is that this version adds a new feature prioritized by the community: time series data. (Personally, there’s no better feeling than working on a feature that users are clamoring for.)

The PixieDust fairy tale continues with time series data.

Once upon a time [series]

PixieDust now supports display of time series data for bar and line charts. Previously, when loading data into PySpark DataFrames from data sources that required schema discovery (CSVs, JSON, etc.), datetime values were often converted into strings. This caused problems when visualizing the data (sorting, formatting, etc.). Fixing it required complicated massaging of the data.

In the example below, I want to display stock values over time. Unfortunately, Spark converts the date values to Unix timestamps, and the results are not visualized correctly:

PixieDust: before using the new time series option in version 1.0.4. Unix timestamps muddy the sort order.

Users can now click the “Time Series” checkbox to have PixieDust automatically convert this data into a correctly formatted date.

PixieDust: after applying the new time series option. Unix timestamps are now converted into datetime64 values and sorted properly.
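To make the sorting problem concrete, here is a minimal sketch in plain Python. It is not PixieDust's implementation: it assumes the timestamps arrived as strings holding Unix seconds, the prices are made up, and the names `raw_rows` and `to_datetime` are hypothetical. It only illustrates why converting to real date values fixes the order.

```python
from datetime import datetime, timezone

# Hypothetical sample data: Unix timestamps (in seconds) that were loaded as
# strings, paired with made-up stock prices.
raw_rows = [
    ("1491955200", 141.8),
    ("999993600", 17.3),
    ("1491868800", 141.6),
]


def to_datetime(unix_seconds: str) -> datetime:
    """Convert a Unix timestamp string into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(unix_seconds), tz=timezone.utc)


# Sorting the raw strings is lexicographic, so the oldest row ("999993600",
# which is in 2001) ends up last because "9" sorts after "1".
naive_order = sorted(raw_rows)

# Converting first gives true chronological order.
time_series = sorted((to_datetime(ts), price) for ts, price in raw_rows)

for moment, price in time_series:
    print(moment.date().isoformat(), price)
```

In a notebook, the same idea applies to whole columns: once the values are real dates rather than strings, a chart can sort and format the axis correctly.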

Gaze upon the crystal ball of PixieDust: Pixie Apps

I’ll let you in on a secret: there is a bigger feature that has been dark-launched with PixieDust 1.0.4. Even though it’s not ready yet, I thought I would float the idea here for feedback as we refine the design.

This new feature is called “Pixie App.” Some of its features, like routes, are inspired by the popular AngularJS framework for web apps, but applied to the context of data science notebooks. The idea is to let developers easily create bigger building blocks that encapsulate their data (Model), UI (View) and logic (Controller). MVC, anyone?

Pixie App lets you refactor your projects for speed and repeatability. For example, you could use it to build an interactive dashboard with widgets communicating via events, or automate part of a machine learning pipeline that requires multiple manual steps and replace it with a nice UI.

From a developer’s perspective, Pixie Apps have been designed to minimize boilerplate code. All you need to get started is to create a Python class and provide HTML fragments for each widget. The logic, workflow, entity, and event handling are expressed via HTML, microformats, and embedded Python.

Here’s some sample code to give you an idea:

A sample Pixie App that creates a toolbar to control a widget.

To run the sample app above, you’ll need some data. (Note: Pixie Apps don’t always need data.) The code below creates a simple PySpark DataFrame, which is passed to the run function. It also uses the runInDialog='true' option to display the app in a dialog automatically, instead of in the cell’s output:

Running the sample Pixie App with some data.

The results are as follows: