TopoJSON-based Choropleth Visualization using Folium

Yash Sanghvi
Published in Tech@Carnot, Aug 13, 2020

Introduction (what we’ll create):

As the title suggests, we will be creating a Choropleth visualization using a TopoJSON file in this tutorial. We will use Folium for that purpose. Folium is a library based on Leaflet.js (unlike Plotly, which is based on D3.js). The following image gives an idea of what we will be creating at the end of this tutorial:

This is very similar to the visualization created in the Plotly tutorial. The same data set has been used for creating both the Plotly and the Folium visualizations. However, this visualization has been created using a TopoJSON file. The TopoJSON file, in turn, was created from the same GeoJSON that was used for the Plotly tutorial, using the conversion available on mapshaper.org. While the GeoJSON file is about 800 KB, the corresponding TopoJSON is roughly one third the size, at 269 KB.

The difference also shows up in the notebook sizes. The Jupyter Notebook for the Plotly visualization is more than 8 MB in size, while the Folium notebook is only about 1 MB. Folium also has a base map incorporated in the visualization (just like in the Plotly + Mapbox tutorial). However, Folium doesn’t have the level of interactivity of Plotly. You can zoom into the map, or pan the map, but hovering on any state will have no effect here, whereas Plotly displayed the count on hover. Here, the count has to be inferred manually using the color bar.

Structure of the tutorial:

The tutorial is structured into the following sections:

  1. Pre-requisites
  2. Installing Folium
  3. TopoJSON vs GeoJSON
  4. Getting started with the tutorial
  5. When to use this library
  6. References

Pre-requisites:

This tutorial assumes that you are familiar with Python and that you have Python downloaded and installed on your machine. If you are not familiar with Python but have some programming experience in other languages, you may still be able to follow this tutorial, depending on your proficiency.

Installing Folium:

If you are using Anaconda:

$ conda install folium -c conda-forge

Otherwise,

$ pip install folium

For additional information, see https://python-visualization.github.io/folium/installing.html

TopoJSON vs GeoJSON:

We have already discussed that TopoJSON files have a massive advantage over GeoJSON files when it comes to file-size, without any simplification of the shapes. Let’s first discuss the reason for this, and we will then come to the cost at which we get a size reduction. Quoting from the official TopoJSON GitHub repo, the major reasons why TopoJSON files are much smaller in size than GeoJSON are:

Geometries in TopoJSON files are stitched together from shared line segments called arcs.

What this roughly means is that if two states of India share a common boundary, the TopoJSON will not store the boundary points twice. Rather there will be a single arc that will be shared by both the states.
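
To make the idea of shared arcs concrete, here is a minimal sketch using a made-up toy topology (two unit squares sharing one edge), not the India file. The names topology and ring_from_arcs are hypothetical and exist only for this illustration. The one TopoJSON rule it relies on is that a negative arc index i refers to arc ~i traversed in reverse.

```python
# Toy topology: two unit squares, A on the left and B on the right,
# sharing the vertical edge x = 1. Coordinates are plain (not quantized).
topology = {
    "type": "Topology",
    "arcs": [
        [[1, 0], [1, 1]],                  # arc 0: the shared edge, bottom to top
        [[1, 1], [0, 1], [0, 0], [1, 0]],  # arc 1: the rest of square A
        [[1, 0], [2, 0], [2, 1], [1, 1]],  # arc 2: the rest of square B
    ],
    "objects": {
        "squares": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "arcs": [[0, 1]], "properties": {"name": "A"}},
                {"type": "Polygon", "arcs": [[2, -1]], "properties": {"name": "B"}},
            ],
        }
    },
}


def ring_from_arcs(topology, arc_indexes):
    """Stitch one ring together; a negative index i means arc ~i reversed."""
    ring = []
    for index in arc_indexes:
        if index >= 0:
            arc = topology["arcs"][index]
        else:
            arc = topology["arcs"][~index][::-1]
        # After the first arc, skip each arc's first point:
        # it repeats the end point of the previous arc.
        ring.extend(arc if not ring else arc[1:])
    return ring


rings = {
    g["properties"]["name"]: ring_from_arcs(topology, g["arcs"][0])
    for g in topology["objects"]["squares"]["geometries"]
}
print(rings["A"])
print(rings["B"])
```

Both squares reuse arc 0, so the points of the shared edge are stored only once. In a GeoJSON file, each polygon would carry its own copy of that edge.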

To further reduce file size, TopoJSON can use quantized delta-encoding for integer coordinates. This is similar to rounding coordinate values (e.g., LilJSON), but with greater efficiency and control over loss of information. And like GeoJSON, TopoJSON files are easily modified in a text editor and amenable to gzip compression.

A layman may just focus on the ‘rounding coordinate values’ part. Essentially, TopoJSON performs some sophisticated rounding operations to reduce the size of the stored coordinates.
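
The quantized delta-encoding can be sketched in a few lines. In a quantized TopoJSON file, each arc stores integer offsets from the previous point, and the transform key (with its scale and translate entries) maps the running integer totals back to real coordinates. The function name decode_arc and all the numbers below are made up for illustration; they are not taken from the India file.

```python
def decode_arc(arc, transform):
    """Turn a quantized, delta-encoded arc into absolute coordinates."""
    scale_x, scale_y = transform["scale"]
    translate_x, translate_y = transform["translate"]
    x = y = 0
    points = []
    for dx, dy in arc:
        # Each stored pair is an offset from the previous point.
        x += dx
        y += dy
        # Scale and shift the running integer position to real coordinates.
        points.append((x * scale_x + translate_x, y * scale_y + translate_y))
    return points


# Hypothetical transform and arc, chosen so the arithmetic is easy to follow.
transform = {"scale": [0.5, 0.5], "translate": [70.0, 10.0]}
encoded_arc = [[4, 2], [2, 0], [0, -1]]

decoded = decode_arc(encoded_arc, transform)
print(decoded)
```

Small integers such as 2 or -1 take far fewer characters than full-precision longitude and latitude values, which is where much of the size saving comes from.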

Now, while this has substantial advantages, one major drawback of TopoJSON files is a complicated file structure. Reading a GeoJSON is much more intuitive than reading a TopoJSON. Also, several libraries work strictly with GeoJSON files, and therefore, you need to use an external tool to convert TopoJSON to GeoJSON.

Getting started with the tutorial:

GitHub repo: https://github.com/carnot-technologies/MapVisualizations

Relevant notebook: FoliumChoroplethDemo.ipynb

View notebook on NBViewer: Click Here

Import relevant packages:

import numpy as np
import pandas as pd
import folium
import json

Loading and Understanding the TopoJSON file:

First, let’s open the TopoJSON file, which is present in the json_files folder.

# Forward slashes work on Windows, macOS, and Linux alike
with open('json_files/India_States_2020_compressed_topo.json') as f:
    states_topo = json.load(f)

Now, the way to explore any JSON file is to have a look at the keys. As can be seen in the notebook, we have a successive look at the keys to identify the key which contains the state names.

states_topo.keys()
>> dict_keys(['type', 'arcs', 'transform', 'objects'])
states_topo['objects'].keys()
>> dict_keys(['India_States_2020_compressed'])
states_topo['objects']['India_States_2020_compressed'].keys()
>> dict_keys(['type', 'geometries'])

After this point, you might be interested in having a look at the contents of geometries. However, it turns out that geometries is a list. So let’s look at its first element.

states_topo['objects']['India_States_2020_compressed']['geometries'][0].keys()
>> dict_keys(['arcs', 'type', 'properties', 'id'])
states_topo['objects']['India_States_2020_compressed']['geometries'][0]['properties'].keys()
>> dict_keys(['dtname', 'stcode11', 'dtcode11', 'year_stat', 'Dist_LGD', 'State_LGD', 'JID', 'state_name'])

Voila! We have finally isolated the state_name field. Let’s check that it indeed corresponds to the names of Indian states and Union Territories.

states_topo['objects']['India_States_2020_compressed']['geometries'][0]['properties']['state_name']
>> 'ANDAMAN & NICOBAR'

You can play around with the TopoJSON file to see what is contained in the keys that we haven’t explored in this tutorial.
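
As one way of playing around, the sketch below collects every distinct value of a property across all geometries instead of inspecting only the first element. The helper property_values is a hypothetical name, and the tiny toy_topo dictionary (including its 'GOA' entry) is made up so that the snippet runs on its own; it only mimics the key structure explored above.

```python
def property_values(topo, object_name, prop):
    """Return the sorted, distinct values of one property across all geometries."""
    geometries = topo["objects"][object_name]["geometries"]
    return sorted({g["properties"][prop] for g in geometries})


# Toy stand-in with the same key structure as the real file (values are illustrative).
toy_topo = {
    "type": "Topology",
    "arcs": [],
    "objects": {
        "India_States_2020_compressed": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "arcs": [], "properties": {"state_name": "GOA"}},
                {"type": "Polygon", "arcs": [], "properties": {"state_name": "ANDAMAN & NICOBAR"}},
                {"type": "Polygon", "arcs": [], "properties": {"state_name": "GOA"}},
            ],
        }
    },
}

names = property_values(toy_topo, "India_States_2020_compressed", "state_name")
print(names)
```

With the real file loaded, the equivalent call would be property_values(states_topo, 'India_States_2020_compressed', 'state_name').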

Loading the data and creating the visualization:

We will use the same CSV that we used in the PlotlyChoroplethDemo notebook.

df = pd.read_csv('data/state_dummy_data_no_null.csv')

Now, creating the visualization in Folium is a two-step process. We first create the map and then add the choropleth on top of it.

# Step 1: create the base map, centred roughly on India
folium_map = folium.Map(location=[19, 80],
                        zoom_start=4,
                        tiles="CartoDB dark_matter")

# Step 2: add the choropleth layer on top of the base map
folium.Choropleth(geo_data=states_topo,
                  topojson='objects.India_States_2020_compressed',
                  key_on='feature.properties.state_name',
                  data=df,  # my dataset
                  columns=['st_nm', 'count'],
                  fill_color='GnBu',
                  fill_opacity=0.7,
                  line_opacity=0.5).add_to(folium_map)

# Display the map (in a Jupyter notebook)
folium_map

I know there’s a lot to unpack here. So let’s get started. First, when creating the Folium map, we need to provide the coordinates of the centroid and the initial zoom level. Since our region of interest is India, I’ve provided an appropriate centroid location. A zoom level of 4 should be sufficient. I would have used a higher zoom level if my region of interest were an Indian state instead of the whole country. For tiles, I’ve chosen “CartoDB dark_matter”. Just like with Mapbox (discussed in the Plotly + Mapbox Choropleth tutorial), several free tile options are available here, without any API key requirements. At the time of writing, they are:

  • “OpenStreetMap”
  • “Mapbox Bright” (Limited levels of zoom for free tiles)
  • “Mapbox Control Room” (Limited levels of zoom for free tiles)
  • “Stamen” (Terrain, Toner, and Watercolor)
  • “CartoDB” (positron and dark_matter)

Coming to the Choropleth visualization, we have indicated the TopoJSON in the geo_data field. Whenever we pass a TopoJSON file, we must also pass an object path in the topojson field. This is because Folium converts the TopoJSON into GeoJSON internally and therefore requires you to tell it which key of objects contains the geometries. We know that the geometries and features are stored in the India_States_2020_compressed key of objects and hence pass that value to the topojson field.

Once that is done, we can assume that the internal conversion to GeoJSON will happen properly and that the properties will all be stored under the feature key. Therefore, when passing the argument to the key_on field, we specify the path to the key holding the state names as feature.properties.state_name.

The dataframe is passed to the data argument. The columns argument takes two column names: first the column in df whose values correspond to the key_on field in the JSON (st_nm here), and then the column containing the value for each state (count here). The other arguments, I presume, are self-explanatory. You can get the list of all arguments, along with their explanations, in the Folium documentation (reference 1). Congratulations!! Your visualization is ready!
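
Since the choropleth links each shape to a data row purely by name, it can be worth checking that the two sets of names agree before plotting. The sketch below is one possible check, not part of the original notebook. The helper unmatched_names and the toy inputs are hypothetical, and plain Python collections are used so that the snippet runs on its own.

```python
def unmatched_names(topo, object_name, prop, data_names):
    """Compare names in the TopoJSON with names in the data.

    Returns two sorted lists: names found only in the TopoJSON,
    and names found only in the data.
    """
    geometries = topo["objects"][object_name]["geometries"]
    topo_names = {g["properties"][prop] for g in geometries}
    data_names = set(data_names)
    return sorted(topo_names - data_names), sorted(data_names - topo_names)


# Toy inputs (illustrative values only): note the differing case of 'GOA' vs 'Goa'.
toy_topo = {
    "objects": {
        "India_States_2020_compressed": {
            "geometries": [
                {"properties": {"state_name": "ANDAMAN & NICOBAR"}},
                {"properties": {"state_name": "GOA"}},
            ]
        }
    }
}
toy_data_names = ["ANDAMAN & NICOBAR", "Goa"]

only_in_topo, only_in_data = unmatched_names(
    toy_topo, "India_States_2020_compressed", "state_name", toy_data_names
)
print(only_in_topo, only_in_data)
```

With the real objects, the last argument would be df['st_nm'] and the first would be states_topo. Two empty lists mean every shape has a matching row and vice versa.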

Finally, we can save it to HTML using

folium_map.save("html_files/folium_demo.html")

When to use this library:

This is your choice of library when you’re dealing with TopoJSON files and you don’t want to perform the TopoJSON to GeoJSON conversion yourself. In fact, the notebook also contains a bonus section showing how to visualize any TopoJSON file without a corresponding dataframe, using Folium. This library can also be used when your interactivity requirements are low. If you want to save an image for a PPT or a report, and require a base map below your choropleth, Folium can help you create the required visualization very quickly. However, if you need a highly interactive visualization, or need to create a time-lapse animation, Folium won’t help. In that case, you need to shift to a library like Plotly.

References:

  1. https://python-visualization.github.io/folium/
  2. https://github.com/topojson/topojson/blob/master/README.md

We are trying to fix some broken benches in the Indian agriculture ecosystem through technology, to improve farmers’ income. If you share the same passion, join us in the pursuit, or simply drop us a line at report@carnot.co.in

Follow Tech@Carnot for more such blogs on topics like Data Science and Visualization, Cloud Engineering, Firmware Development, and many more.
