Move over, Matplotlib

Visualization made easy with faith, trust, and PixieDust

Christian Johnson
Aug 23, 2017 · 4 min read

Editor’s note: This article is part of an occasional series by the 2017 summer interns on the Watson Data Platform developer advocacy team, depicting projects they developed using Bluemix data services, Watson APIs, the IBM Data Science Experience, and more.

As a data science intern at IBM, I had the opportunity to work with some pretty huge data sets. Any data scientist will tell you that being able to visualize your data is extremely helpful. Visualization tools are used in multiple phases of a data science project.

Before my internship, my weapon of choice was Matplotlib, a very powerful 2D plotting library for Python. Although it was my preferred plotting library, I certainly wouldn't consider myself a master. (╥_╥) Like most other plotting libraries for Python, Matplotlib has a pretty steep learning curve due to its many intricacies. With a little bit of time and effort, though, I was always able to get it to show what I needed.

Early in my internship I met David Taieb, who introduced me to a tool he created called PixieDust, which he assured me would make my life as a data scientist easier. I admit that I was a bit skeptical at first, but the more I played around with it, the more I began to appreciate it. Now I’m sure a few of you are probably wondering, “Well, what the heck is PixieDust?” ¯\_(ツ)_/¯ To put it simply, PixieDust is one of the simplest, if not the simplest, visualization tools I've ever used for Python in notebooks.

Visualization is easily my most used feature in PixieDust, largely because of how easy it is to use. Simply pass a Spark DataFrame or pandas DataFrame to the display function and *BAM*, \(^‿^)/ PixieDust brings up a UI inside my notebook that lets me choose exactly how I want to visualize the data and makes switching between different types of visualizations (e.g. bar chart, scatter plot, map) a breeze.

Plotting a line chart in a notebook without writing any visualization code
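
For readers who want to try it, here is a minimal sketch of that display call. The sample data and the `show` helper are made up for illustration, and the explicit import path `pixiedust.display` is my assumption; in a notebook, the usual route is simply `import pixiedust` followed by `display(df)`. The fallback branch only exists so the snippet also runs where PixieDust is not installed.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch (not from the article).
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Chicago", "Denver"],
    "temperature_c": [31.0, 24.5, 26.0, 28.5],
})


def show(frame):
    """Hand the DataFrame to PixieDust if it is available, else print a preview."""
    try:
        # Assumed import path; inside a notebook `import pixiedust` is the usual entry point.
        from pixiedust.display import display
    except ImportError:
        # PixieDust is not installed here, so fall back to a plain text preview.
        print(frame.head())
        return "text"
    # In a notebook this renders the interactive table/chart UI in the cell output.
    display(frame)
    return "pixiedust"


show(df)
```

Once the UI appears, the chart type and the columns to plot are picked from menus rather than written in code.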

PixieDust is open source, so you can extend existing visualizations or contribute new ones. Contributions are welcome!

Now let’s talk about PixieApps. PixieApps are an extremely powerful component of PixieDust that take visualization to a whole new level. You can use PixieApps to create highly customized and personalized dashboards with everything you need to see in one place. You can also use PixieApps to create dynamic and interactive applications capable of performing a multitude of transformations on your data.

One limitation of PixieDust is that you can only see one visualization at a time in a single notebook cell, which makes quickly comparing visualizations a bit difficult when you have a lot of things you want to compare. To help solve this problem, over the summer I created a PixieApp to display different visualizations side by side. o(^-^)o

Arranging multiple scatter plots side-by-side using a PixieApp

The main data science project I worked on over the summer was a study of how the weather affects traffic collisions in the greater metropolitan area of New York. Given that The Weather Company is part of IBM, we had tons of historical weather data at our disposal. The other data set was provided by the New York Police Department (NYPD), and contained various details of motor vehicle collisions in New York City. After cleaning up the two data sets, joining them together, and removing unnecessary features, we were left with an enormous DataFrame.

To glean insights from the data, I built a PixieApp that, given any weather condition contained in the weather data (e.g. thunderstorms), shows a map of “hotspots” where traffic collisions occurred under that condition. The PixieApp first subsets our enormous DataFrame by creating a temporary DataFrame containing only the rows matching the selected weather condition. It then uses Mapbox to create a heat map based on the temporary DataFrame. Check out how cool it is below. ٩(^ᴗ^)۶

Correlating weather conditions with traffic accident statistics
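
The subsetting step can be sketched in pandas roughly as follows. This is not the PixieApp's actual code: the column names, the sample rows, and the `hotspots` helper are all hypothetical, and the grid-cell count is only a stand-in for the Mapbox heat-map layer, which does the real rendering.

```python
import pandas as pd

# Hypothetical rows standing in for the joined weather/collision DataFrame.
collisions = pd.DataFrame({
    "condition": ["thunderstorms", "clear", "thunderstorms", "rain", "thunderstorms"],
    "latitude": [40.7128, 40.7306, 40.7130, 40.6782, 40.7831],
    "longitude": [-74.0060, -73.9352, -74.0058, -73.9442, -73.9712],
})


def hotspots(frame, condition, precision=3):
    """Count collisions per rounded lat/lon cell for one weather condition."""
    # Temporary DataFrame holding only the rows that match the selected condition.
    subset = frame[frame["condition"] == condition]
    # Snap coordinates to a coarse grid so that nearby collisions share a cell.
    cells = subset.assign(
        lat_cell=subset["latitude"].round(precision),
        lon_cell=subset["longitude"].round(precision),
    )
    # Count rows per cell and list the busiest cells first.
    counts = cells.groupby(["lat_cell", "lon_cell"]).size().reset_index(name="collisions")
    return counts.sort_values("collisions", ascending=False).reset_index(drop=True)


print(hotspots(collisions, "thunderstorms"))
```

Keeping the filter in its own small function mirrors how the app works: pick a condition, build the temporary subset, and hand only that subset to the map.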

Although it’s the end of my internship, I can definitely see myself incorporating PixieDust into many of my future data science projects because of how easy it is to use and how quickly it lets me whip up great-looking visualizations.

IBM CODAIT

Things we made with data at IBM’s Center for Open Source Data and AI Technologies.


Thanks to Patrick Titzler and Teri Chadbourne

