Published in Analytics Vidhya

A Gentle Introduction to Interactive Geoplots With Plotly And MapBox

Anyone who owns a smartphone these days is well aware of location tracking. Almost any app you use wants location data to understand the demographics of its customer base. Ride-hailing services like Uber and Ola offer rides based on location, time and traffic. Thanks to geoplots, you can now visualize this kind of location data!

Data visualization tools are getting swankier and more effective at showing patterns and insights through plots. You can even build interactive 3D plots on your machine thanks to advancements in these tools.

In this article, we will be exploring the concept and use of geoplots. To do this, we will leverage the popular plotly library in Python, with an integration of Mapbox plots (more on this later). We are going to use the New York City Taxi Fare Prediction dataset throughout this article so go ahead and download it from this link. Let’s dig in!

New York City Taxi Fare Prediction

Prerequisites

  1. Pandas
  2. Matplotlib

Let’s import the basic libraries first. You will need to install plotly on your machine before executing the following code:

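# import necessary libraries
import numpy as np
import pandas as pd
import plotly
# Online mode only: in plotly 4 and later this module lives in the separate
# chart_studio package (import chart_studio.plotly as py)
# import plotly.plotly as py
import plotly.offline as offline
import plotly.graph_objs as go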

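In plotly releases before version 4, the library works in online mode by default, which requires you to generate a personal API token after you reach the public API limit. If you want to share your visualization with others and modify the data points dynamically to see updated visualizations, then the online mode will do that for you. From version 4 onward, these online features live in the separate chart_studio package and plotting is offline by default.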

However, if you wish to work offline inside your Jupyter Notebook, then you can do so by adding the following lines:

from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

Coming to our dataset, it has more than a million rows! A traditional computer might have a hard time processing all those data points without heating up (or even crashing). With pandas, we can read just the first n rows of the file by passing the nrows argument to read_csv.
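The loading step might look like the sketch below. The helper name load_sample and the file name train.csv are assumptions for illustration, not taken from the original notebook; nrows is a standard pandas.read_csv parameter. Converting pickup_datetime to a real timestamp at this stage matters because the date features extracted later rely on the .dt accessor.

```python
import io
import pandas as pd

def load_sample(source, n_rows=30_000):
    """Read only the first n_rows rows and convert the pickup timestamp."""
    df = pd.read_csv(source, nrows=n_rows)
    # utc=True keeps the "UTC" suffix found in the raw timestamps consistent
    df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"], utc=True)
    return df

# Tiny in-memory stand-in for the real file so the sketch runs anywhere.
# Column names follow the ones used in this article; the rows are made up.
demo_csv = io.StringIO(
    "key,fare_amount,pickup_datetime,pickup_longitude,pickup_latitude\n"
    "a,4.5,2009-06-15 17:26:21 UTC,-73.844311,40.721319\n"
    "b,16.9,2010-01-05 16:52:16 UTC,-74.016048,40.711303\n"
    "c,5.7,2011-08-18 00:35:00 UTC,-73.982738,40.761270\n"
)
train = load_sample(demo_csv, n_rows=2)
print(train.shape)  # only two of the three rows are read

# With the real data (the path is an assumption):
# train = load_sample("train.csv", n_rows=30_000)
```

Keeping the row limit as a function argument makes it easy to start small and raise the sample size once the plots render comfortably.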

You will need a personal access token from Mapbox to plot custom maps. The plots are drawn from two objects:

  1. The first one is data
  2. The other is the layout of the plot

The data object is a Python list of traces, each created here with go.Scattermapbox from plotly. Its parameters are passed as keyword arguments, and nested settings such as marker are Python dictionaries of key-value pairs. For more details on parameters and implementations, refer to the plotly documentation page.

MapBox themes
shaz13_custom_style = "mapbox://styles/shaz13/cjiog1iqa1vkd2soeu5eocy4i"

# set the geospatial data
data = [go.Scattermapbox(
    lat=train['pickup_latitude'],
    lon=train['pickup_longitude'],
    customdata=train['key'],
    mode='markers',
    marker=dict(
        size=4,
        color='gold',
        opacity=.8,
    ),
)]

# set the layout to plot
layout = go.Layout(
    autosize=False,
    mapbox=dict(
        accesstoken="YOUR_ACCESS_TOKEN",
        bearing=10,
        pitch=60,
        zoom=13,
        center=dict(lat=40.721319, lon=-73.987130),
        style=shaz13_custom_style,
    ),
    width=900,
    height=600,
    title="Pickup Locations in New York",
)

Almost done! Now, all you need to do is wrap this into a dict object referred to as a figure. That should initialize the data points and map into our fig object. You can plot this by simply using the iplot function:

fig = dict(data=data, layout=layout)
iplot(fig)
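Since the figure is just a dict, the trace and the layout can be plain dicts too: plotly accepts a trace written as a dict with a type key. The sketch below uses that to wrap the settings from this article into two small helpers, which saves retyping them for every map. The names scatter_trace and mapbox_layout are hypothetical, and the two coordinates are made up so the snippet runs without the dataset.

```python
def scatter_trace(lat, lon, color, name, size=5, opacity=0.8):
    """Return one scattermapbox trace as a plain dict."""
    return dict(
        type="scattermapbox",
        lat=list(lat),
        lon=list(lon),
        mode="markers",
        marker=dict(size=size, color=color, opacity=opacity),
        name=name,
    )

def mapbox_layout(title, access_token, style, center=(40.721319, -73.987130)):
    """Return a layout dict with the camera settings used in this article."""
    return dict(
        autosize=False,
        width=900,
        height=600,
        title=title,
        mapbox=dict(
            accesstoken=access_token,
            bearing=10,
            pitch=60,
            zoom=13,
            center=dict(lat=center[0], lon=center[1]),
            style=style,
        ),
    )

fig = dict(
    data=[scatter_trace([40.72, 40.76], [-73.98, -73.97], "gold", "pickups", size=4)],
    layout=mapbox_layout(
        "Pickup Locations in New York",
        "YOUR_ACCESS_TOKEN",
        "mapbox://styles/shaz13/cjiog1iqa1vkd2soeu5eocy4i",
    ),
)
# iplot(fig)  # render in a notebook once a real access token is filled in
```

With pandas columns you would pass something like train['pickup_latitude'] and train['pickup_longitude'] as lat and lon; the helper only needs an iterable of numbers.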

Another awesome thing you can do is zoom into the plots and check all the miniature points, as shown below:

Zoomed image with different theme

There is a whole gallery of different themes available at Mapbox that you can try out. You can also design your own theme in Mapbox Studio.

Snapshot of Mapbox Studio

You can make your custom theme accessible to others by making it into a public theme and copying the map style link from the dashboard:

Exploring plots

There is much to explore with these map plots. With plotly’s built-in functionality, we can visualize two sets of conditions in the same plot! A good example of this is plotting early-in-the-day vs. late-in-the-day pickup locations at airports in New York.

First, let’s extract the date-time features from the timestamp:

# extract date-time features from the pickup timestamp
train['pickup_datetime_month'] = train['pickup_datetime'].dt.month
train['pickup_datetime_year'] = train['pickup_datetime'].dt.year
# .dt.weekday_name was removed in pandas 1.0; .dt.day_name() returns the same names
train['pickup_datetime_day_of_week_name'] = train['pickup_datetime'].dt.day_name()
train['pickup_datetime_day_of_week'] = train['pickup_datetime'].dt.weekday
train['pickup_datetime_day_of_hour'] = train['pickup_datetime'].dt.hour

Great! Now we have the year, hour, day, month and weekday name information with us. Let’s check out some patterns of New Yorkers!

The business week starts on Monday, so we will segment our data by weekday. The pickup_datetime_day_of_week column is a numerical representation of pickup_datetime_day_of_week_name, starting from Monday as 0 and ending with Sunday as 6, which means values below 5 are business days.
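To see the numbering in action, here is a small sketch on three hand-picked timestamps (a Monday, a Friday and a Sunday). The demo frame and its short column names are made up for illustration; .dt.weekday, .dt.hour and .dt.day_name() are regular pandas accessors, the last one available in pandas 1.0 and later.

```python
import pandas as pd

# Three hand-picked timestamps: a Monday, a Friday and a Sunday.
demo = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(
        ["2015-01-05 08:30:00", "2015-01-09 19:15:00", "2015-01-11 23:45:00"]
    )
})

demo["day_of_week"] = demo["pickup_datetime"].dt.weekday   # Monday=0 ... Sunday=6
demo["day_name"] = demo["pickup_datetime"].dt.day_name()
demo["hour"] = demo["pickup_datetime"].dt.hour
demo["is_business_day"] = demo["day_of_week"] < 5          # Saturday=5, Sunday=6

print(demo[["day_name", "day_of_week", "hour", "is_business_day"]])
```

The Sunday row is the only one flagged as a non-business day, which is the same test the weekday filter relies on.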

# Weekday
business_train = train[train['pickup_datetime_day_of_week'] < 5]

# Binning time of the day
early_business_hours = business_train[business_train['pickup_datetime_day_of_hour'] < 10]
late_business_hours = business_train[business_train['pickup_datetime_day_of_hour'] > 18]

data = [
    go.Scattermapbox(
        lat=early_business_hours['dropoff_latitude'],
        lon=early_business_hours['dropoff_longitude'],
        customdata=early_business_hours['key'],
        mode='markers',
        marker=dict(size=5, color='gold', opacity=.8),
        name='early_business_hours',
    ),
    go.Scattermapbox(
        lat=late_business_hours['dropoff_latitude'],
        lon=late_business_hours['dropoff_longitude'],
        customdata=late_business_hours['key'],
        mode='markers',
        marker=dict(size=5, color='cyan', opacity=.8),
        name='late_business_hours',
    ),
]

layout = go.Layout(
    autosize=False,
    mapbox=dict(
        accesstoken="YOUR_ACCESS_TOKEN",
        bearing=10,
        pitch=60,
        zoom=13,
        center=dict(lat=40.721319, lon=-73.987130),
        style="mapbox://styles/shaz13/cjiog1iqa1vkd2soeu5eocy4i",
    ),
    width=900,
    height=600,
    title="Early vs. Late Business Days Pickup Locations",
)

fig = dict(data=data, layout=layout)
iplot(fig)

Early vs. Late Business Days Pickup locations

Looking good. Many of these locations might be offices or work places. It will be interesting to compare this with weekends.
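The same split works for business days and weekends, so it can be wrapped in one helper that also lets us count the trips in each bin. This is a sketch: the name split_early_late and its parameters are hypothetical, the cut-offs (before 10 and after 18) are the ones used for the business-day bins, and the toy rows are made up.

```python
import pandas as pd

def split_early_late(df, weekend=False, early_before=10, late_after=18):
    """Split trips into early and late bins for business days or weekends."""
    is_weekend = df["pickup_datetime_day_of_week"] >= 5
    days = df[is_weekend] if weekend else df[~is_weekend]
    hour = days["pickup_datetime_day_of_hour"]
    early = days[hour < early_before]
    late = days[hour > late_after]
    return early, late

# Toy rows standing in for `train`; the values are made up for illustration.
toy = pd.DataFrame({
    "pickup_datetime_day_of_week": [0, 1, 2, 4, 5, 6, 6],
    "pickup_datetime_day_of_hour": [8, 19, 22, 12, 9, 20, 23],
})

for label, flag in [("business days", False), ("weekends", True)]:
    early, late = split_early_late(toy, weekend=flag)
    print(f"{label}: early={len(early)}, late={len(late)}")
```

Comparing len(early) with len(late) on the real sample is a quick numeric check on what the map suggests visually.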

# Weekend
weekend_train = train[train['pickup_datetime_day_of_week'] >= 5]

early_weekend_hours = weekend_train[weekend_train['pickup_datetime_day_of_hour'] < 10]
# late: after 6 PM, same cut-off as the business-day bins
late_weekend_hours = weekend_train[weekend_train['pickup_datetime_day_of_hour'] > 18]

data = [
    go.Scattermapbox(
        lat=early_weekend_hours['dropoff_latitude'],
        lon=early_weekend_hours['dropoff_longitude'],
        customdata=early_weekend_hours['key'],
        mode='markers',
        marker=dict(size=5, color='violet', opacity=.8),
        name='early_weekend_hours',
    ),
    go.Scattermapbox(
        lat=late_weekend_hours['dropoff_latitude'],
        lon=late_weekend_hours['dropoff_longitude'],
        customdata=late_weekend_hours['key'],
        mode='markers',
        marker=dict(size=5, color='orange', opacity=.8),
        name='late_weekend_hours',
    ),
]

layout = go.Layout(
    autosize=False,
    mapbox=dict(
        accesstoken="YOUR_ACCESS_TOKEN",
        bearing=10,
        pitch=60,
        zoom=13,
        center=dict(lat=40.721319, lon=-73.987130),
        style="mapbox://styles/shaz13/cjiog1iqa1vkd2soeu5eocy4i",
    ),
    width=900,
    height=600,
    title="Early vs. Late Weekend Days Pickup Locations",
)

fig = dict(data=data, layout=layout)
iplot(fig)

Early vs. Late Weekend Days pickup locations in New York

Even such minute information can bring out hidden patterns in the data, such as the behavior of passengers, timings of the flights versus the cab bookings, etc. Below are a couple of interesting patterns I observed on a random sample of 30,000 rows:

  1. New Yorkers tend to take more cabs in the late business hours than earlier in the day
  2. We understand that fare price depends on the distance and time traveled. But, how often does a certain location attract a higher fare? And, why?

# fares more than three standard deviations above the mean
high_fares = train[train['fare_amount'] > train.fare_amount.mean() + 3 * train.fare_amount.std()]

data = [
    go.Scattermapbox(
        lat=high_fares['pickup_latitude'],
        lon=high_fares['pickup_longitude'],
        customdata=high_fares['key'],
        mode='markers',
        marker=dict(size=8, color='violet', opacity=.8),
        name='high_fares_pick_up',
    ),
    go.Scattermapbox(
        lat=high_fares['dropoff_latitude'],
        lon=high_fares['dropoff_longitude'],
        customdata=high_fares['key'],
        mode='markers',
        marker=dict(size=8, color='gold', opacity=.8),
        name='high_fares_drop_off',
    ),
]

layout = go.Layout(
    autosize=False,
    mapbox=dict(
        accesstoken="YOUR_ACCESS_TOKEN",
        bearing=10,
        pitch=60,
        zoom=13,
        center=dict(lat=40.721319, lon=-73.987130),
        style="mapbox://styles/shaz13/cjk4wlc1s02bm2smsqd7qtjhs",
    ),
    width=900,
    height=600,
    title="High Fare Locations",
)

fig = dict(data=data, layout=layout)
iplot(fig)

Places with High Fare Billed Trips
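The map shows where high fares occur; to get at how often a location attracts one, a possible next step is to round the pickup coordinates into coarse cells and count the high-fare trips per cell. This is only a sketch: the helper name high_fare_hotspots, the two-decimal rounding and the toy data are assumptions for illustration, while the threshold (mean plus three standard deviations) is the one used for the map.

```python
import pandas as pd

def high_fare_hotspots(df, decimals=2, n_std=3):
    """Count high-fare pickups per rounded (lat, lon) cell, busiest first."""
    threshold = df["fare_amount"].mean() + n_std * df["fare_amount"].std()
    high = df[df["fare_amount"] > threshold]
    cells = high.assign(
        lat_cell=high["pickup_latitude"].round(decimals),
        lon_cell=high["pickup_longitude"].round(decimals),
    )
    return (
        cells.groupby(["lat_cell", "lon_cell"])
        .size()
        .sort_values(ascending=False)
        .rename("high_fare_trips")
        .reset_index()
    )

# Made-up data: 57 ordinary trips and 3 expensive ones, two of which
# start in the same rounded cell.
toy = pd.DataFrame({
    "fare_amount": [10.0] * 57 + [100.0] * 3,
    "pickup_latitude": [40.75] * 57 + [40.6413, 40.6441, 40.7769],
    "pickup_longitude": [-73.99] * 57 + [-73.7781, -73.7822, -73.8740],
})

hotspots = high_fare_hotspots(toy)
print(hotspots)
```

Two decimals of latitude or longitude is roughly a one-kilometre cell, which is coarse enough to group nearby pickups; a different rounding may suit the real data better.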

End Notes

The data we are given usually has a lot of hidden insights and patterns we need to extract by playing around with it. Creativity, curiosity and imagination are the key skills (along with data science of course!) that you need in order to perform this kind of analysis. The complete tutorial and the implementation of code is available on my Kaggle kernel.

Do share your thoughts, ideas, and feedback in the comments below.

Mohammad Shahebaz

Kaggle Grandmaster 🏅| 👨🏻‍💻 Data Scientist | TensorFlow Dev 🔥