Time-lapse Choropleth Map-visualization using GeoPandas

Published in Tech@Carnot, Aug 11, 2020

Introduction (what we’ll create):

GeoPandas lets you create static, non-interactive choropleth maps for any region of your choice, as long as you have the shapefile for that region. The video below gives an idea of what is possible with GeoPandas. We will be making this video in this tutorial.

Structure of the tutorial:

The tutorial is structured into the following sections:

Pre-requisites
Installing GeoPandas
All about Shapefiles
Getting started with the tutorial
When to use this library
References

Pre-requisites:

This tutorial assumes that you are familiar with Python and that you have it installed on your machine. If you are not familiar with Python but have programming experience in another language, you may still be able to follow this post, depending on your proficiency.

Installing GeoPandas:

If you are using Anaconda,

conda install geopandas

If you aren’t using Anaconda, you need to ensure that the required dependencies are installed, before using the pip installer. The following dependencies are required by GeoPandas:

numpy
pandas (version 0.23.4 or later)
shapely (interface to GEOS)
fiona (interface to GDAL)
pyproj (interface to PROJ; version 2.2.0 or later)
six

Once you have these dependencies, you can go ahead with the pip install

pip install geopandas

For more information related to installation, refer to https://geopandas.org/install.html

If you are a Windows user, pip install is not straightforward. Please see How to install geopandas on Windows.
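
Whichever route you take, it can help to confirm that the packages are importable before moving on. The snippet below is a minimal sketch using only the standard library. The helper name missing_packages is our own and is not part of GeoPandas, and the package list simply mirrors the dependencies named in this post (newer GeoPandas releases may depend on a different set).

```python
from importlib.util import find_spec

# Dependencies named in this post, plus geopandas itself
REQUIRED = ["numpy", "pandas", "shapely", "fiona", "pyproj", "six", "geopandas"]


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

An empty result only tells you the packages can be found. It does not check their versions.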

All about Shapefiles:

A shapefile, as the name suggests, stores the information that describes a shape. The shape can be a point, a line, or a polygon. The shapefiles that we will use in this series of tutorials represent the geographical boundaries of the states and districts of India. There can be more detailed shapefiles containing taluka- or ward-level shapes, or broader shapefiles containing country- or continent-level shapes.

Downloading shapefiles:

There are several sources from where you can get the desired shapefiles. Some of them are listed below:

For this and other tutorials requiring shapefiles, we have provided the state and district level shapefiles for India (as per the latest map of India, 2020) in the shape_files folder in the helper repo.

Files constituting the shapefiles folder:

When you download a shapefile from any source, you will see that there are a lot of files with different extensions in the downloaded folder. Of these, 3 are mandatory and others are optional. The mandatory file extensions are:

.shp file — contains the feature geometries (coordinates describing the shape)
.dbf file — contains additional attributes of the shape (like name, type, etc.) in dBase format
.shx file — contains index of the feature geometry, allowing a processor to search forward and backward quickly

You may find several other files present in the downloaded folder. You can get a detailed interpretation of all the extensions here. We will only require the mandatory .shp, .dbf and the .shx files.
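
Since a missing mandatory file only shows up later as an error or as missing attributes, a quick check of the folder can save some confusion. The following is a small sketch using only the standard library. The function name missing_shapefile_parts is hypothetical, and the check assumes lowercase extensions, which matters on case-sensitive file systems.

```python
from pathlib import Path

MANDATORY_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the mandatory extensions that have no file next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in MANDATORY_EXTENSIONS
            if not base.with_suffix(ext).exists()]


# The district shapefile used in this tutorial, built in an OS-independent way
fp = Path("shape_files") / "India_Districts_2020" / "India_Districts.shp"
print(missing_shapefile_parts(fp))  # an empty list means all three files are present
```

Building the path with pathlib also avoids hard-coding Windows-style backslashes, so the same line works on Linux and macOS.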

Getting started with the tutorial

GitHub repo: https://github.com/carnot-technologies/MapVisualizations

Relevant notebook: GeoPandasDemo.ipynb

View notebook on NBViewer: Click Here

Importing relevant packages:

import numpy as np
import pandas as pd
import shapefile as shp
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import geopandas as gpd

Opening and Visualizing the Shapefiles:

# set the filepath
fp = "shape_files\\India_Districts_2020\\India_Districts.shp"
# read the file stored in variable fp
map_df = gpd.read_file(fp)
# preview the first rows of the resulting GeoDataFrame
map_df.head()

As you can see, this dataframe contains the geometries of the various shapes as well as their attributes, like district and state names. Here you can try a few experiments to understand why all three mandatory extensions are required. If you delete the .shx file from the folder, Python will raise an error. If you delete the .dbf file from the folder, Python will load a dataframe containing just the shape polygons and no attribute information. You will not be able to make out which shape belongs to which district.

The head of the geopandas dataframe without the .dbf file

Now, we are interested only in the districts of the state of Uttar Pradesh (UP). Therefore, we will filter the geopandas dataframe, just like we filter a normal dataframe:

#Isolate the UP districts
map_df_up = map_df[map_df['stname'] == 'UTTAR PRADESH']
#Check the resulting UP Plot
map_df_up.plot()

As you can see from the output plot, we’ve isolated the UP districts. This is a really cool feature of geopandas: you can plot all the shapes contained in the dataframe just by calling .plot().

Loading the data and making sense of it:

The UP_dummy_data.csv in the ‘data’ folder contains the relevant data for this tutorial.

#Get the data CSV file
df = pd.read_csv('data\\UP_dummy_data.csv')
df.head()

As you can see, it has some dummy data related to tractor installations. It contains the tractor model, the installation date, and the installation district. As the video preview suggests, we are interested only in the installation date and district.

Creating a stand-alone visualization:

For now, let’s sideline the installation date field and only look at the aggregate count for each district and try to visualize it.

#Get district wise installation count
df_district = df['installation_district'].value_counts().to_frame()
df_district.reset_index(inplace=True)
df_district.columns = ['district','count']
df_district.head()

Let’s merge this dataframe with the geopandas dataframe

#Merge the districts df with the geopandas df
merged = map_df_up.set_index('dtname').join(df_district.set_index('district'))
merged.head()

Here, we are setting the district columns as the index in both the geopandas as well as the data dataframes, so that we can get the count of installations against each district in the merged dataframe.
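
One thing to watch with this pattern is that a district whose name is spelled or capitalized differently in the two sources will silently end up with no count. The sketch below illustrates this on plain pandas dataframes standing in for map_df_up and df_district. The counts are made up, and 'Lucknow' is deliberately written in a different case from the shapefile-style spelling.

```python
import pandas as pd

# Stand-in for map_df_up (the geometry column is omitted here)
shapes = pd.DataFrame({"dtname": ["AGRA", "LUCKNOW", "VARANASI"]})

# Stand-in for df_district with made-up counts
counts = pd.DataFrame({"district": ["AGRA", "Lucknow"], "count": [12, 7]})

# Same pattern as above: index both frames on the district name, then join.
# DataFrame.join is a left join by default, so every shape is kept.
merged_demo = shapes.set_index("dtname").join(counts.set_index("district"))

# Districts left without a count: either no data, or a name that did not match
unmatched = merged_demo.index[merged_demo["count"].isna()].tolist()
print(unmatched)
```

Here VARANASI is unmatched because it has no data, while LUCKNOW is unmatched only because of the case difference. Printing the unmatched names before filling NaN values makes that kind of mismatch easy to spot.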

Now let’s just fill the NaN values and generate the plot.

#Fill NA values
merged['count'] = merged['count'].fillna(0)
#Get max count
max_installs = merged['count'].max()
#Generate the choropleth map
fig, ax = plt.subplots(1, figsize=(20, 12))
merged.plot(column='count', cmap='Blues', linewidth=0.8, ax=ax, edgecolor='0.8')
# remove the axis
ax.axis('off')
# add a title
ax.set_title('District-wise Dummy Data', fontdict={'fontsize': '25', 'fontweight' : '3'})
# Create colorbar as a legend
sm = plt.cm.ScalarMappable(cmap='Blues', norm=plt.Normalize(vmin=0, vmax=max_installs))
# add the colorbar to the figure (ax tells matplotlib which axes to attach it to)
cbar = fig.colorbar(sm, ax=ax)

There we go. We have our first geopandas visualization ready!

Here, by setting column = ‘count’ in the .plot() arguments, we told geopandas to use the ‘count’ column to decide the color of each individual district. We used the ‘Blues’ colormap for this visualization. You can use the color map of your choice out of the several available options listed here.
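
To see how a count becomes a color, it helps to look at the two steps separately: the value is first scaled to the range 0 to 1, and the colormap then turns that number into a color. The sketch below reproduces these steps with the same Normalize and ‘Blues’ colormap used for the colorbar. The helper district_color and the value max_installs_demo are made up for illustration.

```python
from matplotlib import pyplot as plt
from matplotlib.colors import Normalize

max_installs_demo = 40  # hypothetical maximum count
norm = Normalize(vmin=0, vmax=max_installs_demo)
cmap = plt.get_cmap("Blues")


def district_color(count):
    """Return the RGBA color the 'Blues' colormap assigns to a count."""
    return cmap(norm(count))


light = district_color(0)                  # lowest count: lightest shade
dark = district_color(max_installs_demo)   # highest count: darkest shade
print(light)
print(dark)
```

Swapping "Blues" for another colormap name changes only the second step. The scaling stays the same.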

Creating date-wise images:

First, we’ll modify the ‘Installed On’ field to remove the time and convert the resulting column type to datetime.

df['Installed On'] = df['Installed On'].apply(lambda x: x.split('T')[0])
df['Installed On'] = pd.to_datetime(df['Installed On'],format="%Y-%m-%d")
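
To make the two steps concrete, here is what they do to a single value, using only the standard library. The timestamp is a hypothetical example of the format the split on 'T' assumes. The real file may differ.

```python
from datetime import datetime

raw = "2020-03-14T09:26:53"  # hypothetical timestamp

# Step 1: keep only the part before the 'T'
date_part = raw.split("T")[0]

# Step 2: parse the remaining text as a date
installed_on = datetime.strptime(date_part, "%Y-%m-%d")
print(date_part, installed_on)

# A value that has no time part passes through step 1 unchanged
assert "2020-03-14".split("T")[0] == "2020-03-14"
```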

Now, we will generate progressively larger ‘slices’ of the dataframe we used for the standalone visualization. If the standalone visualization contains data for 110 days, our first visualization will contain data for one day, the second will contain data for two days, and the 110th visualization will contain data for all 110 days. At any point, the visualization will narrate the cumulative story up to that point.
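
The idea of cumulative slices can be sketched without pandas on a handful of made-up records. The helper cumulative_counts and the sample data are hypothetical. The filter `installed <= day` is the same condition the dataframe code uses.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical (installation date, district) records
records = [
    (date(2020, 1, 1), "AGRA"),
    (date(2020, 1, 2), "AGRA"),
    (date(2020, 1, 2), "LUCKNOW"),
    (date(2020, 1, 3), "LUCKNOW"),
]


def cumulative_counts(records, day):
    """Count installations per district up to and including `day`."""
    return Counter(district for installed, district in records
                   if installed <= day)


first_day = min(installed for installed, _ in records)
last_day = max(installed for installed, _ in records)
n_days = (last_day - first_day).days + 1

# One cumulative snapshot per day, i.e. one per video frame
frames = [cumulative_counts(records, first_day + timedelta(days=i))
          for i in range(n_days)]
for i, frame in enumerate(frames):
    print(i, dict(frame))
```

Note that this sketch derives the number of days from the span between the first and last date, so days without any installation still get a frame. That is a choice made for the sketch, and it gives the same number as counting unique dates only when no day is missing from the data.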

date_min = df['Installed On'].min()
# one frame per day; nunique() equals the number of days only if no day is missing from the data
n_days = df['Installed On'].nunique()

for i in range(0,n_days):
    date = date_min+timedelta(days=i)
    
    #Get cumulative df till that date
    df_c = df[df['Installed On'] <= date]
    
    #Generate the temporary df
    df_t = df_c['installation_district'].value_counts().to_frame()
    df_t.reset_index(inplace=True)
    df_t.columns = ['dist','count']
    
    #Get the merged df
    df_m= map_df_up.set_index('dtname').join(df_t.set_index('dist'))
    df_m['count'] = df_m['count'].fillna(0)
    fig, ax = plt.subplots(1, figsize=(20, 12))
    df_m.plot(column='count',
                cmap='Blues', linewidth=0.8, ax=ax, edgecolor='0.8')
    
    # remove the axis
    ax.axis('off')
    # add a title
    ax.set_title('District-wise Dummy Data', 
                 fontdict={'fontsize': '25', 'fontweight' : '3'})
    
    # Create colorbar as a legend
    sm = plt.cm.ScalarMappable(cmap='Blues', 
            norm=plt.Normalize(vmin=0, vmax=df_t['count'].iloc[0]))
    # add the colorbar to the figure
    cbar = fig.colorbar(sm, ax=ax)
    fontsize = 36
    
    # Positions for the date
    date_x = 82
    date_y = 29
    ax.text(date_x, date_y, 
            f"{date.strftime('%b %d, %Y')}", 
            color='black',
            fontsize=fontsize)
    # save the frame (the frames_gpd folder must already exist)
    fig.savefig(f"frames_gpd/frame_{i:03d}.png", 
                dpi=100, bbox_inches='tight')
    plt.close(fig)
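
Saving a figure into a folder that does not exist fails, so it is worth creating the frames folder once before running the loop. The two helpers below are a small sketch with hypothetical names. The file naming matches the pattern used in the loop.

```python
import os


def ensure_frame_dir(folder="frames_gpd"):
    """Create the output folder if it does not exist yet and return its name."""
    os.makedirs(folder, exist_ok=True)
    return folder


def frame_name(i):
    """File name for frame i, zero-padded to three digits."""
    return f"frame_{i:03d}.png"


print(frame_name(0))    # frame_000.png
print(frame_name(109))  # frame_109.png
```

Calling ensure_frame_dir() once before the loop is enough. The zero-padding keeps the files in the right order when sorted by name, which the stitching step relies on.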

Stitching the images to form a video:

Just like in the Cartopy tutorial, we will now stitch the images to form the time-lapse video. To do that, open your terminal or command prompt, navigate to the frames_gpd folder, and run the following command:

ffmpeg -framerate 5 -i frame_%3d.png -c:v h264 -r 30 -s 1920x1080 ./district_video.mp4

Our input frame rate is 5 frames per second. Our files are named frame_000.png, frame_001.png, and so on. Therefore, we are asking ffmpeg to look for 3 digits after frame_ by using the pattern frame_%3d.png. The -r 30 option sets the frame rate of the output file, the resolution is specified as 1920x1080, and the name of the video is district_video.mp4.
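
If you would rather trigger the stitching from Python, the same command can be assembled as an argument list. This is a sketch under the assumption that ffmpeg is installed and available on the PATH. The function names are our own, and the options are exactly the ones from the command above.

```python
import subprocess


def build_ffmpeg_command(framerate=5, pattern="frame_%3d.png",
                         output="./district_video.mp4"):
    """Assemble the ffmpeg call from this section as an argument list."""
    return [
        "ffmpeg",
        "-framerate", str(framerate),  # input rate: images shown per second
        "-i", pattern,                 # numbered input images
        "-c:v", "h264",                # video codec
        "-r", "30",                    # frame rate of the output file
        "-s", "1920x1080",             # output resolution
        output,
    ]


def video_seconds(n_frames, framerate=5):
    """Approximate video length: each image is shown for 1/framerate seconds."""
    return n_frames / framerate


def make_video(frames_folder="frames_gpd"):
    """Run ffmpeg inside the frames folder (requires ffmpeg on the PATH)."""
    subprocess.run(build_ffmpeg_command(), check=True, cwd=frames_folder)


print(" ".join(build_ffmpeg_command()))
print(video_seconds(110))  # 110 frames at 5 per second
```

For the 110-day example, 110 images at 5 per second give a video of about 22 seconds. Raising the input frame rate shortens the video, and lowering it gives viewers more time per day.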

To know more about the different options related to video-encoding using ffmpeg, you can visit https://trac.ffmpeg.org/wiki/Slideshow

When to use this library:

Just like Cartopy, this library should be used when creating static, non-interactive choropleth maps for use in presentations or for hosting on a website. However, if you wish to have an interactive visualization with the ability to zoom, pan, hover, and click, you need to switch to a library like plotly. Also, geopandas comes in handy when you want to quickly visualize your shapefile before moving on to the actual visualization.

References:

Official GeoPandas documentation: https://geopandas.org/

We are trying to fix some broken benches in the Indian agriculture ecosystem through technology, to improve farmers’ income. If you share the same passion, join us in the pursuit, or simply drop us a line at report@carnot.co.in