Interactive Map-based Visualization using Plotly

Published in Tech@Carnot · Aug 11, 2020 · 7 min read

Introduction (what we’ll create):

Plotly has set the benchmark for interactivity among the map-visualization libraries available for Python. It is built on plotly.js, which in turn is based on the JavaScript library D3.js. There’s hardly anything you can’t do with Plotly: display data on hover, zoom into the map, pan the map, add buttons and sliders, create live animations, and the list goes on.

This tutorial will introduce Plotly. We will make a Choropleth visualization in this tutorial, like the one shown below. However, Plotly can be used equally well for creating scatter visualizations. We will cover scatter visualizations using Plotly in the ‘Plotly + Mapbox’ tutorial and the ‘Plotly + Datashader’ tutorial.

Structure of the tutorial:

The tutorial is structured into the following sections:

Pre-requisites
Installing Plotly
Converting shapefile to GeoJSON
- Method 1: Using OGR
- Method 2: Using GeoPandas
Getting started with the tutorial
When to use this library
References

Pre-requisites:

This tutorial assumes that you are familiar with Python and that you have Python downloaded and installed on your machine. If you are not familiar with Python but have some programming experience in other languages, you may still be able to follow this post, depending on your proficiency.

It is recommended, but not necessary, that you go through the GeoPandas tutorial to get an overall idea of shapefiles.

Installing Plotly:

If you are using Anaconda, you can run:

conda install -c plotly plotly=4.8.1

Otherwise, you can try the pip installer:

pip install plotly==4.8.1

For more information related to the installation, you can see https://plotly.com/python/getting-started/

Converting shapefile to GeoJSON:

Unlike GeoPandas, plotly doesn’t read shapefiles. Instead, it requires a GeoJSON file for reading the geometry and attributes of the relevant shapes. If you have a GeoJSON file available, you can skip this section.

If you don’t have a GeoJSON file directly, you can convert a shapefile to GeoJSON.

We will discuss two methods of converting a shapefile to GeoJSON.

Method 1: Using OGR

This is the method that has been shown in the notebook. It makes use of the Geospatial Data Abstraction Library (GDAL). Actually, to be more specific, it makes use of the OGR library, which comes along with GDAL, to perform manipulations on geospatial vector data. For more information on GDAL, see https://pypi.org/project/GDAL/.

Once you have GDAL installed, import ogr and call the ESRI Shapefile driver.

# On older GDAL releases, a plain "import ogr" also works
from osgeo import ogr

# We used compressed shapefiles obtained from mapshaper.org
driver = ogr.GetDriverByName('ESRI Shapefile')
shp_path = 'shape_files\\India_States_2020_compressed\\India_states.shp'
data_source = driver.Open(shp_path, 0)  # 0 opens the file in read-only mode

Here, you can see that we are using compressed shapefiles. This is because the visualizations generated by plotly tend to be very heavy, and their size is directly proportional to the size of the geospatial data. So if our shapefile size gets reduced by 50%, the visualization size also gets reduced by approximately the same proportion. So we took our original shapefiles and performed compression on them using the online tool https://mapshaper.org/. To know more about how to perform the conversions using mapshaper, click here. The compressed shapefile sizes are approximately 10% of the original.

Once we have the shapefiles opened, we extract the individual features (including geometry and attributes) and store them in the correct JSON format (this is done directly by the ExportToJson method). Once that is done, we dump the JSON file to local storage.

import json

# Empty GeoJSON FeatureCollection to be filled with the shapefile's features
fc = {
    'type': 'FeatureCollection',
    'features': []
}

lyr = data_source.GetLayer(0)
for feature in lyr:
    fc['features'].append(feature.ExportToJson(as_object=True))

with open('json_files\\India_States_2020_compressed.json', 'w') as f:
    json.dump(fc, f)
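For readers who do not have GDAL at hand, the shape of the object assembled above can be sketched with the standard library alone. The feature below is a made-up placeholder (a hypothetical square called 'EXAMPLE STATE'), not real boundary data; it only illustrates the keys a GeoJSON FeatureCollection carries and the JSON round trip.

```python
import json

# Hypothetical feature: a small square standing in for a real state boundary.
example_feature = {
    "type": "Feature",
    "properties": {"state_name": "EXAMPLE STATE"},
    "geometry": {
        "type": "Polygon",
        # One linear ring of [longitude, latitude] positions;
        # the first and last positions are identical, so the ring is closed.
        "coordinates": [[
            [77.0, 28.0], [78.0, 28.0], [78.0, 29.0], [77.0, 29.0], [77.0, 28.0]
        ]],
    },
}

fc_example = {"type": "FeatureCollection", "features": [example_feature]}

# Round-trip through JSON text, as json.dump / json.load would do via a file.
geojson_text = json.dumps(fc_example)
reloaded = json.loads(geojson_text)

print(reloaded["features"][0]["properties"]["state_name"])
```

Each dictionary returned by ExportToJson(as_object=True) is expected to have this same type / properties / geometry layout, with the attribute names taken from the shapefile.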

Method 2: Using GeoPandas

This method is much more straightforward.

import geopandas as gpd

# Set the filepath and load
fp = "shape_files\\India_States_2020_compressed\\India_States.shp"

# Read the file stored in variable fp
map_df = gpd.read_file(fp)

# Export it as GeoJSON
map_df.to_file("json_files\\India_States_2020_compressed_gpd.json", driver='GeoJSON')

You just need to read the shapefile using GeoPandas and then export it to GeoJSON in a single line by specifying the driver as ‘GeoJSON’. However, the GeoJSON files created with this method tend to be slightly larger. For instance, for the same shapefile, the GeoJSON file created using OGR was about 800 KB, while the one created using GeoPandas was about 900 KB, which is about 12.5% larger.
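The 12.5% figure follows directly from the two sizes quoted above. A minimal sketch of the arithmetic, where percent_increase is a hypothetical helper name introduced only for this illustration:

```python
def percent_increase(baseline_kb, other_kb):
    """Return how much larger other_kb is than baseline_kb, in percent."""
    return (other_kb - baseline_kb) / baseline_kb * 100


ogr_size_kb = 800        # size quoted in the text for the OGR output
geopandas_size_kb = 900  # size quoted in the text for the GeoPandas output

increase = percent_increase(ogr_size_kb, geopandas_size_kb)
print(f"{increase:.1f}%")  # prints 12.5%
```

To repeat the comparison on your own files, os.path.getsize returns each file's size in bytes, which can be passed to the same helper.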

Getting started with the tutorial:

GitHub repo: https://github.com/carnot-technologies/MapVisualizations

Relevant notebook: PlotlyChoroplethDemo.ipynb

View notebook on NBViewer: Click Here

Importing relevant packages:

import numpy as np
import pandas as pd
import plotly.express as px
import json
from osgeo import ogr  # on older GDAL releases, a plain "import ogr" also works
# import geopandas as gpd

As can be seen from the import packages, we will be using plotly.express to create the visualization.

Understanding the GeoJSON file:

Now that we have the GeoJSON file, let’s open it.

with open('json_files\\India_States_2020_compressed.json') as f:
  India_states = json.load(f)

You will see that the properties key for each feature holds the name for that feature, which, in our case, are the various states.

India_states["features"][0]['properties']
>> {'dtname': 'North  & Middle Andaman',
 'stcode11': '35',
 'dtcode11': '639',
 'year_stat': '2011_c',
 'Dist_LGD': 632,
 'State_LGD': 35,
 'JID': 178,
 'state_name': 'ANDAMAN & NICOBAR',
 'FID': 0}

Let us dig deeper into the geometry and have a look at one lat-lon pair:

India_states["features"][0]['geometry']['coordinates'][0][0][0]
>> [93.7, 7.22]

As you can see, we have coordinates accurate up to 2 decimal places, or about 1.1 km, which is more than sufficient for us.
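The 1.1 km figure can be checked with a short calculation. This is a rough sketch that assumes a spherical Earth with a mean radius of 6371 km (an approximation chosen for this illustration); the latitude 7.22 is taken from the coordinate pair shown above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def km_per_degree_latitude():
    """Length of one degree of latitude along a meridian."""
    return 2 * math.pi * EARTH_RADIUS_KM / 360


def km_per_degree_longitude(latitude_deg):
    """One degree of longitude shrinks with the cosine of the latitude."""
    return km_per_degree_latitude() * math.cos(math.radians(latitude_deg))


step_deg = 0.01  # two decimal places
lat_res_km = step_deg * km_per_degree_latitude()
lon_res_km = step_deg * km_per_degree_longitude(7.22)

print(round(lat_res_km, 2), round(lon_res_km, 2))
```

The north-south resolution comes out at roughly 1.11 km. The east-west resolution is slightly finer and keeps shrinking as you move away from the equator.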

Loading the data and creating the visualization:

We will use the state_dummy_data_no_null.csv file present in the data folder. We will revisit the ‘no-null’ part shortly.

#Load the csv file and check its contents. 
#Make sure that there is one entry for each state in the geojson
 
df = pd.read_csv('data/state_dummy_data_no_null.csv')
df.head()

Now, with plotly express, generating the visualization boils down to just a couple of lines of code.

max_value = df['count'].max()
fig = px.choropleth(df, geojson=India_states, locations='st_nm',
                    color='count',
                    color_continuous_scale="Viridis",
                    range_color=(0, max_value),
                    featureidkey="properties.state_name",
                    projection="mercator"
                    )
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

Here, we specify df as the dataframe of interest. India_states is the GeoJSON object loaded earlier. The st_nm column in df contains the names of the states for which we have data. We want each shape to be colored according to the count column in df, with the upper bound of the color range set to the maximum value of count. In the GeoJSON, the property that holds the names matching the locations field is state_name, hence featureidkey="properties.state_name". Finally, the map projection will be mercator. You can find a list of map projections along with their interpretations on Wikipedia.
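To make the role of featureidkey more concrete, here is a small sketch of the matching idea using only plain dictionaries. It is an illustration under stated assumptions, not Plotly's internal implementation: feature_id is a hypothetical helper, and the state names and counts are made-up sample values.

```python
def feature_id(feature, featureidkey):
    """Follow a dotted path such as 'properties.state_name' inside one feature."""
    value = feature
    for part in featureidkey.split("."):
        value = value[part]
    return value


# Hypothetical GeoJSON with two shapes (geometry omitted for brevity).
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"state_name": "STATE A"}, "geometry": None},
        {"type": "Feature", "properties": {"state_name": "STATE B"}, "geometry": None},
    ],
}

# Hypothetical data rows; 'State C' has no shape with that exact name.
sample_rows = [
    {"st_nm": "STATE A", "count": 10},
    {"st_nm": "STATE B", "count": 4},
    {"st_nm": "State C", "count": 7},
]

# Index the shapes by the value found at the featureidkey path.
shapes_by_id = {
    feature_id(f, "properties.state_name"): f for f in sample_geojson["features"]
}

# A row can only be drawn when its location value equals one of those ids.
matched = {r["st_nm"]: r["count"] for r in sample_rows if r["st_nm"] in shapes_by_id}
unmatched = [r["st_nm"] for r in sample_rows if r["st_nm"] not in shapes_by_id]

print(matched)
print(unmatched)
```

The comparison is an exact string match in this sketch, which is why the values in the locations column need to be spelled exactly like the names in the GeoJSON.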

Now, we want the visualization to be limited to the locations of interest and not span the lat-lon range of the entire world. That is achieved by setting fitbounds="locations" in the update_geos method. Finally, we set all margins to zero and display our visualization. The full lists of arguments of the update_geos and update_layout methods are given in the Plotly documentation. Do explore the different arguments of both methods. There’s a lot you can do with them. Congratulations on your first interactive visualization. Take your time and play around with it!

Please note that plotly express is a higher-level library. If you need a lower level library with enhanced control, you can switch to plotly graph objects. See the references for more information.

Now, let us revisit the no_null part. We have explicitly made sure that data corresponding to each shape in the GeoJSON is present in the CSV. Where there was no data present, we have added a 0 for the count. Why? Because plotly renders only those shapes for which some data is present. Let us plot only the first 25 rows of the dataframe, using df.head(25). The resulting visualization looks like this:

Now, you certainly don’t want this kind of visualization, especially when there is no background to provide context. We will, however, make good use of the null shapes when we add a background to the visualization using Mapbox. Until then, you had better include data for all shapes. To get a list of all the shape names in the GeoJSON, you can run the following loop:

for i in range(0, len(India_states["features"])):
    print(India_states["features"][i]["properties"]["state_name"])

This will also help you identify any spelling and case differences between the shape names in your data and the shape names in the GeoJSON.
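The comparison can also be automated. The sketch below uses plain Python with made-up sample values (only 'ANDAMAN & NICOBAR' comes from the GeoJSON shown earlier; the other names and all counts are hypothetical). It reports names that differ only in case or spacing and fills shapes without data with 0, mirroring the no-null approach described above.

```python
# Names as they appear in the GeoJSON (sample values for illustration).
geojson_names = ["ANDAMAN & NICOBAR", "EXAMPLE STATE", "ANOTHER STATE"]

# Counts keyed by the names used in the data (hypothetical values).
data_counts = {"Andaman & Nicobar": 12, "EXAMPLE STATE": 7}


def normalise(name):
    """Collapse repeated whitespace and ignore case for a lenient comparison."""
    return " ".join(name.split()).casefold()


# Map each normalised data name back to its original spelling.
data_by_norm = {normalise(name): name for name in data_counts}

case_mismatches = {}  # GeoJSON name -> differently spelled name in the data
filled_counts = {}    # one entry per GeoJSON shape, 0 where no data exists

for shape_name in geojson_names:
    if shape_name in data_counts:
        filled_counts[shape_name] = data_counts[shape_name]
    elif normalise(shape_name) in data_by_norm:
        data_name = data_by_norm[normalise(shape_name)]
        case_mismatches[shape_name] = data_name
        filled_counts[shape_name] = data_counts[data_name]
    else:
        filled_counts[shape_name] = 0

print(case_mismatches)
print(filled_counts)
```

With real data, geojson_names would be collected from the features of India_states and data_counts from the st_nm and count columns of the dataframe.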

Saving the visualization:

Once you are ready with your visualization, you can export it as a standalone HTML file, to share with your friends without sharing the source code, or to include in your website.

fig.write_html('html_files\\plotly_choropleth_demo.html')

You can also push these visualizations to Chart Studio and then get a direct embed link for your blog or website. We’ll discuss that in the Plotly with Mapbox tutorial.

When to use this library:

This library is incredibly powerful. But its power comes at a cost: file size. If you need an interactive visualization, just use this library without a second thought. However, if you are going to add the visualization to a PDF report or a static presentation, you can still use plotly and download the visualization as a PNG, but it will consume more resources than a library like GeoPandas.

References:

Plotly Choropleth Maps in Python: https://plotly.com/python/choropleth-maps/

We are trying to fix some broken benches in the Indian agriculture ecosystem through technology, to improve farmers’ income. If you share the same passion join us in the pursuit, or simply drop us a line on report@carnot.co.in