Mapping for Data Science with PixieDust and Mapbox

Add another layer to your Jupyter notebooks with built-in map rendering


You’re doing your data a disservice if you don’t use maps. Nearly all data has a spatial component (customer locations, crime, election results, traffic incidents, points of purchase, infrastructure locations), and if you’re not familiar with some basic mapping tools, you’re not doing good data science.

Seeing your data on a map has many powerful benefits. At the exploratory stage of data science, it’s a great way to get a feel for the geographic distribution of your data.

Home sales over a million dollars, winter 2016. Darker means higher price. Data courtesy of Redfin.com.

Making scatterplots less scattershot

A standard scatterplot is a good starting point. It can work OK for spatial data — just use longitude and latitude as your X and Y values — but plotting those points on a real map shows your data’s relationship to real-world features. For example, a scatterplot might reveal that the data is clustered into four major groupings, but a map could show not only those four groupings, but also that they are all near subway stations. Overlaying your data on a map can surface unseen patterns.

A plain scatterplot on the home sales data set lacks context without a map.
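As a rough sketch of the plain scatterplot described above, the snippet below puts longitude on the X axis and latitude on the Y axis. The records are invented for illustration (they are not the Redfin data), the helper names are hypothetical, and matplotlib is only one of several libraries that could draw the plot.

```python
# Hypothetical home-sale records, invented for illustration only.
SALES = [
    {"longitude": -71.06, "latitude": 42.36, "price": 1250000},
    {"longitude": -71.11, "latitude": 42.37, "price": 1800000},
    {"longitude": -71.21, "latitude": 42.33, "price": 2400000},
]


def to_xy(records):
    """Split records into X (longitude) and Y (latitude) lists."""
    xs = [r["longitude"] for r in records]
    ys = [r["latitude"] for r in records]
    return xs, ys


def plot_scatter(records):
    """Draw a plain scatterplot with no base map (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so to_xy works without it

    xs, ys = to_xy(records)
    prices = [r["price"] for r in records]
    fig, ax = plt.subplots()
    # With the "Blues" colormap, darker points mean higher prices.
    points = ax.scatter(xs, ys, c=prices, cmap="Blues")
    ax.set_xlabel("longitude")
    ax.set_ylabel("latitude")
    fig.colorbar(points, ax=ax, label="price")
    return ax
```

Calling plot_scatter(SALES) in a notebook shows the points relative to each other, but with no streets, coastlines, or place names to anchor them.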

At the presentation stage of data science, using maps is a no-brainer. Combining data with maps is a natural storytelling device (when the story you’re telling has a geographic aspect, of course).

So you’ve never done any mapping before, and it seems hard? Not to worry. PixieDust makes it easy. With a little help from the Mapbox APIs (and, to a lesser extent, the Google Maps API), you can get up and running with some beautiful map-based visualizations in no time!

Getting Started

If you’re unfamiliar with PixieDust, check out this introductory article and get your Jupyter Notebook-based data science environment up and running. PixieDust comes with mapping goodness baked in.

As you read through the next section, you can follow along in the Jupyter notebook called pixiedust_mapbox_geocharts hosted on the IBM Data Science Experience.

GeoCharts

A Google GeoChart, which you can easily render in PixieDust given the correct field names.

If your data contains a place name column such as country, province or state names, you can make what Google calls a GeoChart — a map that shows those regions color-coded based on the value of a numeric column in your data.

To create a GeoChart in PixieDust, you must first have a Spark or Pandas DataFrame with a place name column. Invoke PixieDust on that DataFrame with the display() command: display(mydataframe).
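Here is a minimal sketch of that step, assuming pandas and PixieDust are installed and the code runs inside a Jupyter notebook. The column names and numbers are made up for illustration, and show_in_pixiedust is a hypothetical helper, not part of PixieDust.

```python
# Made-up values for illustration only; swap in your own data.
RECORDS = [
    {"country": "Canada", "units_sold": 120},
    {"country": "Brazil", "units_sold": 340},
    {"country": "Japan", "units_sold": 210},
]


def show_in_pixiedust(records):
    """Build a pandas DataFrame and hand it to PixieDust's display().

    Meant to be called from a Jupyter notebook cell. The imports live
    inside the function so the sample data can be inspected elsewhere.
    """
    import pandas as pd
    from pixiedust.display import display

    df = pd.DataFrame(records)  # "country" is the place name column
    display(df)                 # opens the PixieDust table and chart menu
    return df
```

In a notebook, running import pixiedust in an earlier cell also makes display() available directly, so display(mydataframe) works as written above.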

Click on the chart menu (to the right of the table button) and select the Map item (it’s the one with the globe icon). The Options dialog should pop up; if it doesn’t, click the Options button. Drag the field that has place names into Keys. Then, for Values, choose any numeric field you want to visualize.

Within the Display Mode menu, choose:

  • Region to color the entire area of your named places, e.g., countries, provinces, or states.
  • Markers to place a circle in the center of each region, scaled according to the field selected for Values.
  • Text to label regions with labels like Russia or Asia.

This is good stuff, but if you’re doing heavyweight data science, in most cases your data will be disaggregated down to the point (latitude/longitude) level. For point data, we chose the API of our mapping partner, Mapbox, instead of one of Google’s mapping APIs.

Selecting the Map option from PixieDust’s chart menu and using the Mapbox renderer.

Point mapping with Mapbox

The Mapbox option lets you create a map of geographic point data. Your DataFrame needs at least the following three fields in order to work with this renderer:

  • a latitude field named latitude, lat, or y
  • a longitude field named longitude, lon, long, or x
  • a numeric field for visualization
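Before opening the chart menu, it can help to check that your column names match the list above. The helper below is a hypothetical sketch, not part of PixieDust, and it assumes an exact, case-sensitive match, which the renderer itself may not require.

```python
# Accepted field names, taken from the list above.
LATITUDE_NAMES = {"latitude", "lat", "y"}
LONGITUDE_NAMES = {"longitude", "lon", "long", "x"}


def find_coordinate_columns(column_names):
    """Return (latitude_column, longitude_column).

    Either entry is None when no column carries an accepted name.
    Matching is exact and case-sensitive (an assumption of this sketch).
    """
    lat = next((c for c in column_names if c in LATITUDE_NAMES), None)
    lon = next((c for c in column_names if c in LONGITUDE_NAMES), None)
    return lat, lon


# With a pandas DataFrame you could pass list(df.columns).
print(find_coordinate_columns(["price", "lat", "lon"]))
```

If either value comes back as None, rename the relevant column before invoking display().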

To use the Mapbox renderer, you need a free API key from Mapbox. You can get one on their website at https://www.mapbox.com/signup/. When you get your key, enter it in the Options dialog box.

In the Options dialog, drag both your latitude and longitude fields into Keys. Then choose any numeric fields for Values. Only the first one you choose is used to color the map thematically, but any other fields specified in Values appear in a tooltip when you hover over a data point on the map.

You can also choose the style of the underlying base map, which gives context to your data. The image below uses Mapbox’s “light” style, which works great with a data overlay, as the lightly colored streets and place names don’t fight for attention with your data.

Following the instructions above using PixieDust’s test dataset #6, you can reproduce this tasteful map.
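To load that test dataset from a notebook, something like the sketch below should work. It assumes that "test dataset #6" corresponds to ID 6 in the list printed by pixiedust.sampleData(), and show_sample_dataset is a hypothetical wrapper added here for illustration.

```python
# Assumed to match "test dataset #6"; run pixiedust.sampleData() with no
# arguments in a notebook to see the list of available sample datasets.
SAMPLE_DATASET_ID = 6


def show_sample_dataset(dataset_id=SAMPLE_DATASET_ID):
    """Load a PixieDust sample dataset and display it (Jupyter only)."""
    import pixiedust
    from pixiedust.display import display

    homes = pixiedust.sampleData(dataset_id)  # returns a DataFrame
    display(homes)  # then pick Map from the chart menu and the Mapbox renderer
    return homes
```

From there, the Keys, Values, and base map choices are made in the Options dialog as described above.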

If you want to see what a place really looks like, zoom in and switch to satellite view:

I can’t wait to see what cool things you do with the mapping features in PixieDust. Let me know how you use them in the comments below. And please ♡ this article to recommend it to other Medium readers.
