Plotting a Geo-scatter Plot Using Plotly and Dash

Arun Rajan
Published in Analytics Vidhya, Dec 16, 2019

Data visualisation has become an effective tool for mining patterns and insights from data. When coupled with geographical coordinates, visualisation helps explore geospatial data. A geo scatter plot is a kind of scatter plot in which data points are plotted on geographical coordinates instead of Cartesian coordinates. We will use Dash components and the Mapbox maps API to create a geo scatter plot on Mapbox maps. Dash is a productive framework for building web applications, written on top of Flask, React.js and Plotly.js. It is well suited to building data visualisation apps with highly custom interfaces in Python. Here we use two major components of Dash: the Dash HTML components library (dash_html_components) provides Python abstractions around HTML, CSS and JavaScript, whereas the Dash core components library (dash_core_components) provides components for interactive user interfaces.

Data Description

Rapid growth in smart phone users has paved the way for new smart phone entrants across the globe. Here I have created a data set of smart phone users in Kochi (Kerala, India) for the years 2015 to 2019. This data was created for the sole purpose of learning Dash and Plotly, and hence nothing should be inferred from the visualisation. The data was generated using the Python faker and random libraries (the data creation itself is outside the scope of this article). The data contains the following attributes:

Year, Latitude, Longitude, Mobile
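The generation script is outside the scope of this article, but a minimal sketch of how rows with these four attributes could be produced with the standard random module is given here. The brand names, the bounding box and the function name make_rows are placeholders for illustration, not the values used to build the actual data set.

```python
import random

# Hypothetical brand names and a rough bounding box; placeholders only
BRANDS = ["BrandA", "BrandB", "BrandC"]
LAT_RANGE = (9.90, 10.00)
LON_RANGE = (76.20, 76.32)


def make_rows(n, seed=0):
    """Return n dummy rows with the attributes Year, Latitude, Longitude, Mobile."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    rows = []
    for _ in range(n):
        rows.append({
            "Year": rng.randint(2015, 2019),  # inclusive on both ends
            "Latitude": rng.uniform(*LAT_RANGE),
            "Longitude": rng.uniform(*LON_RANGE),
            "Mobile": rng.choice(BRANDS),
        })
    return rows


rows = make_rows(5)
print(rows[0])
```

Such a list of dictionaries can be written to a csv file with csv.DictWriter or passed directly to pandas.DataFrame.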

Dependencies

Some of the Python libraries used here are not part of the standard library: Pandas, Shapely and Dash. They need to be installed with pip or conda, depending on the environment we have created.

To make use of interactive custom online maps, we also need to create a Mapbox account and generate an access token. Access tokens help associate API requests with our Mapbox account.

To recapitulate:

1. Install required python libraries — Pandas, Shapely, Dash

2. Create Mapbox account

3. Create API access token
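Before going further, it can help to confirm that the libraries from step 1 are importable. The following is a small sketch using only the standard library; the helper name missing_packages is made up for this example.

```python
import importlib.util

# Import names of the libraries listed in step 1
REQUIRED = ["pandas", "shapely", "dash"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("All required libraries are available.")
```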

Data Preparation

Before plotting the geo scatter plot, one needs to demarcate the geographic boundary of interest. This helps to analyse the correlation (if any) between the scatter points and the geographic location within the demarcated boundary. Scatter points outside the bounded location can be excluded explicitly in two steps: first, draw the polygon boundary on geojson.io and save the geojson file; then check whether each data point falls within that boundary using the Python Shapely library.

Creating geojson file for the required location
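To make the containment check concrete, here is a sketch of the idea in plain Python. The GeoJSON snippet is a made-up rectangle in the FeatureCollection layout that geojson.io typically exports, and point_in_ring is a simple ray-casting test written for illustration. It is not how Shapely implements contains, and it ignores holes and points lying exactly on the boundary.

```python
import json

# Made-up rectangle; coordinates are [longitude, latitude] pairs as in GeoJSON
GEOJSON_TEXT = """
{"type": "FeatureCollection",
 "features": [{"type": "Feature", "properties": {},
   "geometry": {"type": "Polygon",
     "coordinates": [[[76.2, 9.9], [76.3, 9.9], [76.3, 10.0],
                      [76.2, 10.0], [76.2, 9.9]]]}}]}
"""


def exterior_ring(geojson):
    """Return the outer ring of the first feature (assumed to be a Polygon)."""
    return geojson["features"][0]["geometry"]["coordinates"][0]


def point_in_ring(lon, lat, ring):
    """Ray casting: count how many edges a ray going east from the point crosses."""
    inside = False
    n = len(ring)
    for k in range(n):
        x1, y1 = ring[k]
        x2, y2 = ring[(k + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


ring = exterior_ring(json.loads(GEOJSON_TEXT))
print(point_in_ring(76.25, 9.95, ring))
print(point_in_ring(76.35, 9.95, ring))
```

In practice Shapely is the safer choice, since it handles holes, multi-part geometries and edge cases that this sketch does not.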

The data and the geojson file need to be imported. We then filter the rows of the data frame (df) down to those that fall inside the polygon from the geojson file. Once the filtering is done, we save the data in a csv file. This file is later read for plotting the geo scatter.

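import json
import pandas as pd
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon

# Load the boundary drawn on geojson.io
with open('KochiMainland.geojson') as data_file:
    data = json.load(data_file)

# Assumption: the file is a FeatureCollection whose first feature is the drawn polygon
polygon = Polygon(data['features'][0]['geometry']['coordinates'][0])

df = pd.read_csv('KochiMobile.csv')

lon = []
lat = []
mob = []
year = []
for i in range(len(df)):
    # Shapely points are (x, y), that is (longitude, latitude)
    if polygon.contains(Point(df['Longitude'][i], df['Latitude'][i])):
        lon.append(df['Longitude'][i])
        lat.append(df['Latitude'][i])
        year.append(df['Year'][i])
        mob.append(df['Mobile'][i])

newDf = pd.DataFrame({'Year': year, 'Longitude': lon, 'Latitude': lat, 'Mobile': mob})
# Note: this overwrites the input file; index=False avoids writing an extra index column
newDf.to_csv('KochiMobile.csv', index=False)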

Once the data is filtered and ready to be plotted, we can start creating the geo scatter. Let's start by importing the necessary libraries and initialising the Dash application.

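import dash
import json
import random
import numpy as np
import pandas as pd
import plotly.graph_objs as go
# From Dash 2.0 onwards these two are also available via `from dash import dcc, html`
import dash_core_components as dcc
import dash_html_components as html
from faker import Faker
from collections import Counter
from dash.dependencies import Input, Output, State

app = dash.Dash()

# Load the filtered data saved in the previous step
df = pd.read_csv('KochiMobile.csv')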

Since the number of mobiles is large and changes every year, I prefer to allocate a different random colour to each smart phone in the plot. We also need to define a background colour and a colour for the text.

colors = {
    'backgroundMain': '#ffffff',
    'backgroundSub': '#000000',
    'textMain': '#000000',
    'textSub': '#ffffff'
}

# One random (r, g, b) triple per distinct mobile
n_mobiles = len(df['Mobile'].unique())
r = np.rint(np.random.uniform(0, 255, n_mobiles))
g = np.rint(np.random.uniform(0, 255, n_mobiles))
b = np.rint(np.random.uniform(0, 255, n_mobiles))

def rgb(r, g, b):
    # np.rint returns floats, so cast to int for a clean 'rgb(12,34,56)' string
    return 'rgb(' + str(int(r)) + ',' + str(int(g)) + ',' + str(int(b)) + ')'
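Because these colours are drawn at random, they change every time the app restarts. If stable colours are preferred, one option is to seed the generator. The sketch below uses the standard random module rather than NumPy; the function name brand_colours, the brand names and the seed are placeholders.

```python
import random


def brand_colours(brands, seed=42):
    """Map each brand to an 'rgb(r,g,b)' string, reproducibly for a given seed."""
    rng = random.Random(seed)
    colours = {}
    for brand in brands:
        red, green, blue = (rng.randint(0, 255) for _ in range(3))
        colours[brand] = "rgb({},{},{})".format(red, green, blue)
    return colours


palette = brand_colours(["BrandA", "BrandB", "BrandC"])
print(palette)
```

A dictionary keyed by brand also avoids having to keep three parallel arrays aligned with the order of the unique values.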

The (r, g, b) combination is later used to set the marker colour for each smart phone. Now, let's set the layout for the complete dashboard using the Dash HTML components. The script looks something like this:

app.layout = html.Div([
    html.Div([html.H1('Dummy Mobile Usage')],
             style={'textAlign': 'center', 'color': colors['textMain']}),
    html.Div(html.P(id='title-rangeslider-break'), style={'padding-top': '10px'}),
    # Range slider for picking the years
    html.Div([dcc.RangeSlider(
        id='range-slider', min=2015, max=2019, value=[2015, 2019],
        marks={i: {'label': str(i), 'style': {'color': colors['backgroundSub']}}
               for i in range(2015, 2020)})],
        style={'width': '20%', 'padding-left': '80px'}),
    html.Div(html.P(id='rangeslider-graph-break')),
    # Graph on the left, selection details on the right
    html.Div([
        html.Div(dcc.Graph(id='geoscatterplot-graph'), style={'width': '70%', 'float': 'left'}),
        html.Div(html.Pre(id='json-data'),
                 style={'width': '10%', 'float': 'right', 'padding-top': '100px'})],
        style={'display': 'flex', 'justify-content': 'space-between'}),
], style={'height': '70%'})

The front end of the web application has four major components: a header, a range slider, a graph and an output display. The header, as the name says, contains the title of the dashboard. The range slider lets one pick the range of years for which the scatter points are plotted. The graph component holds the geo scatter plot. And finally, if any selection is made in the scatter, the output display shows the details of the selection. We have defined HTML containers to hold these components. Between the major components we place HTML paragraph components, which allocate space before and after each of them. After setting up the front end user interface, we have to provide callback decorators to add interactivity among the components. First, let's build the callback for the geo scatter plot.

@app.callback(Output('geoscatterplot-graph', 'figure'), [Input('range-slider', 'value')])
def chloroplethGraphPlot(value):
    data = []
    # value is [start_year, end_year] from the range slider
    year = [i for i in range(value[0], value[1] + 1)]
    print('year::', year)
    for i, mob in enumerate(df['Mobile'].unique()):
        # Rows for this mobile within the selected years
        subset = df[(df.Mobile == mob) & (df.Year.isin(year))]
        data.append(go.Scattermapbox(
            lat=subset['Latitude'],
            lon=subset['Longitude'],
            mode='markers',
            marker=go.scattermapbox.Marker(
                opacity=0.8, size=4, color=rgb(r[i], g[i], b[i])
            ),
            # Use the same filtered rows so the hover text lines up with the points
            text=subset['Mobile'], name=mob
        ))
    layout = go.Layout(
        #title = 'Dummy Chloropeth',
        autosize=True,
        hovermode='closest',
        height=900,
        width=1400,
        mapbox=go.layout.Mapbox(
            accesstoken=mapbox_access_token,
            bearing=0,
            center=go.layout.mapbox.Center(
                lat=9.934200203046808,
                lon=76.2551855659548),
            pitch=60,
            #style=shaz13_custom_style,
            zoom=15
        ))
    return {'data': data, 'layout': layout}

As explained earlier, mapbox_access_token needs to be generated from the Mapbox account. The token value needs to be either hard-coded or provided as a global variable before the callback runs.
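One way to avoid hard-coding the token is to read it from an environment variable. This is only a sketch: the variable name MAPBOX_ACCESS_TOKEN and the helper load_mapbox_token are choices made for this example, not something Mapbox or Dash requires.

```python
import os


def load_mapbox_token(env=None, name="MAPBOX_ACCESS_TOKEN"):
    """Return the token stored under `name`, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError("Set the {} environment variable first".format(name))
    return token


# Demo with an explicit mapping so the snippet runs without a real token
print(load_mapbox_token({"MAPBOX_ACCESS_TOKEN": "pk.example-token"}))
```

In the app script this would become mapbox_access_token = load_mapbox_token(), placed before the callback is defined.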

The callback decorator takes one output parameter and one input parameter. The geo scatter plot is the output, as discussed, and the range slider is the input. The minimum and maximum values from the range slider are passed to the function as value. Based on these values, a list of data traces is created, one per mobile. The width, height and other layout settings of the graph are set in the figure layout. Once the data traces and the layout are ready, the function returns them.
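The filtering logic can be tried out without starting a Dash server. The sketch below rebuilds it on plain dictionaries, relying on the fact that a figure may also be described as a dict of data and layout. The function name build_traces and the sample rows are invented for illustration, and unlike the callback it skips mobiles that have no rows in the selected years.

```python
def build_traces(rows, year_range):
    """Group rows by mobile for the selected years and return scattermapbox trace dicts."""
    start, end = year_range
    grouped = {}
    for row in rows:
        if start <= row["Year"] <= end:  # both ends of the slider are inclusive
            grouped.setdefault(row["Mobile"], []).append(row)
    traces = []
    for brand, items in grouped.items():
        traces.append({
            "type": "scattermapbox",
            "mode": "markers",
            "name": brand,
            "lat": [item["Latitude"] for item in items],
            "lon": [item["Longitude"] for item in items],
            "text": [brand] * len(items),
        })
    return traces


# Invented sample rows with the same attributes as the data set
sample = [
    {"Year": 2015, "Latitude": 9.93, "Longitude": 76.25, "Mobile": "BrandA"},
    {"Year": 2017, "Latitude": 9.94, "Longitude": 76.26, "Mobile": "BrandB"},
    {"Year": 2019, "Latitude": 9.95, "Longitude": 76.27, "Mobile": "BrandA"},
]
for trace in build_traces(sample, [2015, 2017]):
    print(trace["name"], len(trace["lat"]))
```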

Similarly, a second callback needs to be created to define the interactivity between the geo scatter plot and the output display.

@app.callback(Output('json-data', 'children'), [Input('geoscatterplot-graph', 'selectedData')])
def selectedDataPlot(selectedData):
    # selectedData is None until a box or lasso selection is made
    try:
        SelectedMobile = []
        for i in range(len(selectedData['points'])):
            SelectedMobile.append(selectedData['points'][i]['text'])
        print(Counter(SelectedMobile))
        return json.dumps(Counter(SelectedMobile), indent=2)
    except (TypeError, KeyError):
        print('NoneType Error')
        return

The output display component is the output parameter of this callback decorator. The input is the area selected on the geo scatter plot. Here we count the number of selected smart phones for each smart phone company.
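The counting step can be checked on its own with a hand-written payload. The sketch below assumes the selection arrives as a dictionary with a points list whose entries carry the text set on the trace, which is the shape the callback above relies on. The helper name summarise_selection is made up.

```python
import json
from collections import Counter


def summarise_selection(selected_data):
    """Count selected points per mobile and return the counts as a JSON string."""
    if not selected_data or not selected_data.get("points"):
        return ""  # nothing selected yet
    counts = Counter(point.get("text", "unknown") for point in selected_data["points"])
    return json.dumps(counts, indent=2)


# Hand-written payload; a real one carries more keys per point (lat, lon and so on)
example = {"points": [{"text": "BrandA"}, {"text": "BrandB"}, {"text": "BrandA"}]}
print(summarise_selection(example))
```

Checking for an empty selection explicitly keeps the try/except out of the normal path, at the cost of a slightly longer function.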

We have now added the necessary interactivity, as discussed. Let's start the Dash app server.

if __name__ == '__main__':
    app.run_server(host='0.0.0.0', port=80)

Now check the visualisation application at http://0.0.0.0:80 (or http://localhost:80 when browsing from the same machine). To summarise, we have implemented interactivity based on the range slider selection, which looks like below:

Left: 2015–2019, Right: 2015–2016

Similarly, we have also implemented interactivity based on lasso/box selection from the graph. The interactivity looks as below:

Left: Box selection, Right: Lasso selection

There are many other simple components and interactions that could be added to the application. For instance, instead of displaying the dictionary data, we could plot a pie chart, which would be more legible and appealing. One could also try a date range picker instead of the plain range slider.
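As a rough sketch of the pie chart idea, the counts can be turned into a figure dictionary with a single pie trace. The function name pie_figure and the title text are invented here, and the result would need its own dcc.Graph component in the layout to be displayed.

```python
from collections import Counter


def pie_figure(selected_mobiles):
    """Build a figure dict with one pie trace from a list of selected mobile names."""
    counts = Counter(selected_mobiles)
    return {
        "data": [{
            "type": "pie",
            "labels": list(counts.keys()),
            "values": list(counts.values()),
        }],
        "layout": {"title": {"text": "Selected smart phones"}},
    }


figure = pie_figure(["BrandA", "BrandB", "BrandA"])
print(figure["data"][0]["labels"], figure["data"][0]["values"])
```

A callback returning this dictionary would target the figure property of the new graph, in the same way the geo scatter callback does.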
