Not the Chart You’re Looking For? Filter and Try Again.

Filtering data with PixieDust

//va
Center for Open Source Data and AI Technologies
Apr 17, 2018

Photo by Denisse Leon on Unsplash.

The New Kid on the Block

Previously, when using PixieDust to visualize your data, you first had to shape the data by manipulating the DataFrame and only then launch the visualization. While exploring data, you often need to remove some data points and visualize only a subset. To do that, you had to go back to your data, reshape it, relaunch the PixieDust display with the updated data, and repeat the process until you got the visualization you wanted.

With the introduction of Filter in PixieDust, this old workflow is made a little easier! In PixieDust 1.1.7 and later, you can define a filter while exploring the visualization generated by PixieDust. The filter options apply to the entire DataFrame and allow you to zoom in on your data to visualize a finer set of data points.

Just One Direction

The filter options can be accessed from the Filter button in the PixieDust display output toolbar.

Filter button in PixieDust display toolbar.

Clicking this button opens up the Filter panel where you can configure the filter options for the current DataFrame. Here’s the workflow:

  • Select a column
  • Enter a constraint
  • Click Apply

Note: Currently, filtering is allowed only on a single column at a time.

If the selected column is numeric, you can filter the data with a basic mathematical comparison (less than, greater than, or equal to) against a numeric value.

Only show rows where population is greater than 12000.
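To make the numeric case concrete, here is a minimal plain-Python sketch of what such a comparison filter does conceptually. It is not PixieDust's implementation; the sample rows, the COMPARISONS table, and the filter_numeric helper are hypothetical names made up for illustration.

```python
import operator

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"city": "Alpha", "population": 9000},
    {"city": "Beta", "population": 12000},
    {"city": "Gamma", "population": 42000},
]

# The three basic comparisons, mapped to Python's operator functions.
COMPARISONS = {
    "less than": operator.lt,
    "greater than": operator.gt,
    "equal to": operator.eq,
}


def filter_numeric(rows, column, comparison, value):
    """Keep only the rows whose `column` satisfies the comparison against `value`."""
    compare = COMPARISONS[comparison]
    return [row for row in rows if compare(row[column], value)]


# Only keep rows where population is greater than 12000.
large = filter_numeric(rows, "population", "greater than", 12000)
print([row["city"] for row in large])  # ['Gamma']
```

Note that "greater than" is strict, so the row with a population of exactly 12000 is left out.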

In addition, the Filter panel includes a Statistics section that updates with the selected column’s statistical info (e.g., count, mean, max).
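For a rough idea of what those numbers represent, the same kind of summary can be computed by hand. This is only a sketch using Python's standard library with made-up values; how PixieDust computes its statistics internally is not covered here.

```python
import statistics

# Hypothetical values for a single numeric column.
populations = [9000, 12000, 42000]

# Count, mean, and max for the column.
summary = {
    "count": len(populations),
    "mean": statistics.mean(populations),
    "max": max(populations),
}
print(summary)
```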

Alternatively, if you select a non-numeric column, you can perform a string match or a regular expression match against the column values.

Only show rows where timezone contains the string “Africa”.

In this case, the Filter panel includes a Regex help section that provides some basic regular expression syntax.
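The difference between the two kinds of match can be sketched in plain Python with the re module. This is an illustration under assumptions, not PixieDust's own code: the sample rows and the filter_text helper are hypothetical, and the sketch assumes a case-sensitive "contains" match.

```python
import re

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"city": "Nairobi", "timezone": "Africa/Nairobi"},
    {"city": "Lagos", "timezone": "Africa/Lagos"},
    {"city": "Lima", "timezone": "America/Lima"},
]


def filter_text(rows, column, pattern, use_regex=False):
    """Keep rows whose `column` contains `pattern`.

    With use_regex=False the pattern is treated as a literal substring;
    with use_regex=True it is interpreted as a regular expression.
    """
    if not use_regex:
        pattern = re.escape(pattern)  # neutralize regex metacharacters
    matcher = re.compile(pattern)
    return [row for row in rows if matcher.search(row[column])]


# String match: timezone contains "Africa".
african = filter_text(rows, "timezone", "Africa")
print([row["city"] for row in african])  # ['Nairobi', 'Lagos']

# Regex match: the part after the slash starts with "L".
starts_with_l = filter_text(rows, "timezone", r"/L\w+$", use_regex=True)
print([row["city"] for row in starts_with_l])  # ['Lagos', 'Lima']
```

Escaping the pattern in the plain string case means a character such as "." only matches a literal dot instead of any character.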

Once set, the filter applies even if the chart type is changed. There’s no need to keep reapplying the filter as you change and explore different visualizations. And the filter is easily removed by clicking on the Clear button in the Filter panel.

Get In Sync

If you have not already done so, consider updating to the latest PixieDust version to take advantage of the Filter and simplify your workflow. In a Jupyter Notebook cell, simply run:

!pip install --upgrade pixiedust

Feedback on how to improve the Filter is welcome, whether as a pull request or as an issue in the PixieDust GitHub repo.
