Use Python geopandas to make a US map with Alaska and Hawaii

Alex Rich, PhD
10 min readJan 16, 2023

--

Here’s one way to make a static choropleth map using geopandas and matplotlib.

Turn tabular data on household food insecurity into a static choropleth map

Why:

I run into a lot of projects where geospatial data visualization adds to the quality of the work. It’s useful in data cleaning, exploratory analysis, and explanatory data visualization. It’s often important to be able to do this kind of work on my own without sending it out to the GIS team or using any license-based software. Maintaining these skills in open source software keeps me from dependence on a present or future employer’s choice of proprietary software solutions. Even federal government data teams are now catching up to industry and beginning to use R and Python where they once were dependent on expensive licenses to SAS or Stata.

I like to work in Python because of the flexibility it provides, the large developer community, and the error handling built into the language. With that in mind, there are a lot of Python packages that offer relatively easy mapping capabilities. I recommend checking out plotly, leaflet, and keplergl. For my purposes today, geopandas and matplotlib offer the level of customization that I’m looking for.

For this specific project, I wanted to explore visual communication about the kind of economic data you don’t hear on the news every day. The Dow Jones Industrial Average is a very limited measure of a country’s economic health. The commitment of a nation to supplying adequate food to its most vulnerable citizens should probably get more attention than it does.

This exercise takes tabular data:

household food insecurity data downloaded from the US Dept of Agriculture

…and turns it into something a little more engaging.

What:

In this article, I’ll walk through how to use Python to download a data file on household food insecurity from the USDA website, obtain geospatial data called a shapefile from the US census website, and merge them together using a few simple libraries to produce a static choropleth map.

How:

My full code for this exercise can be accessed on github. If you’re new to Python and want to follow through the code in a Jupyter Notebook click here.

I’ll use several Python libraries to work through this example



import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import matplotlib.colors as mcolors
import seaborn as sns
import geopandas as gpd
from shapely.geometry import Polygon
import missingno as msno
import os
import wget
import openpyxl
import math

US household food insecurity rates by state can be downloaded from the USDA with wget()


# download the data excel file from the USDA website to local folder
filename = wget.download("https://www.ers.usda.gov/media/rbmpu1zi/mapdata2021.xlsx")
# load the excel file into a pandas dataframe & skip header rows
df = pd.read_excel(os.getcwd()+'/mapdata2021.xlsx',skiprows=4)
# rename the columns of interest
df = df.rename(columns={'Unnamed: 0':'state',
'Percent':'pct_food_insecure'})
# retain only the columns of interest
df = df[['state','pct_food_insecure']]

# take a look at the dataframe for missing values
msno.matrix(df)

The data file contains some rows we’re not interested in.

I’m going to retain only the rows with 2-letter state abbreviations in the “state” column and take a look at the distribution of food insecurity data.

df = df[df.state.str.len()==2]
df.pct_food_insecure.hist()

That all looks reasonable. Now it’s time to get the geospatial files.

The geopandas library works with shapefiles. Shapefiles are a standardized format for geospatial data and can be pulled from a number of sources including the US Census website. Here, I’ll pull a shapefile of US state outlines using wget().

wget.download("https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_500k.zip")

On a Mac, I unzip the resulting download by double clicking on it. Windows may require you to use a file utility to do so. Once unzipped, the shapefile can be loaded with geopandas by referencing the shapefile folder (geopandas knows where to find everything inside this standardized format).

gdf = gpd.read_file(os.getcwd()+'/cb_2018_us_state_500k')
gdf.head()

The geopandas dataframe that results contains a column of all the geometries we need, a column of state abbreviations, and a wealth of other data we don’t need.

Geopandas makes it easy to merge data. I’ll merge our USDA data with the gdf on the columns in each that correspond to state abbreviation.

gdf = gdf.merge(df,left_on='STUSPS',right_on='state')

When we plot what we have, things look a little wonky.

gdf.plot()

As you can see, plotting Alaska’s western islands causes us to display a great deal more longitudinal territory than we’d like. I’m going to clip both Alaska and Hawaii down to their larger land masses and present them below the continental US. To do that, I’ll use the shapely package’s Polygon() function to define a box around what I want to keep and the geopandas clip() function to retain only the parts of each state that fall inside the polygon.

Here’s a quick illustration of how that works:

world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))

# NOTE: the convention for polygon points is (Long, Lat)....counterintuitive
polygon = Polygon([(-175,50),(-175,72),(-140, 72),(-140,50)])
# polygon = Polygon([(-180,0),(-180,90),(-120,90),(-120,0)])

# polygon=hipolygon
poly_gdf = gpd.GeoDataFrame( geometry=[polygon], crs=world.crs)

fig, ax1 = plt.subplots(1, figsize=(8, 18))
world.plot(ax=ax1)
poly_gdf.boundary.plot(ax = ax1, color="red")
ax1.set_title("The red polygon can be used to clip Alaska's western islands", fontsize=20)
ax1.set_axis_off()
plt.show()

And now to demonstrate how to clip Alaska:

polygon = Polygon([(-170,50),(-170,72),(-140, 72),(-140,50)])
# apply1(alaska_gdf,0,36)
alaska_gdf.clip(polygon).plot( color='lightblue', linewidth=0.8, edgecolor='0.8')

I’ll do the same with Hawaii in the final code block.

Merging the Geopandas Dataframe with USDA Food Insecurity Data

Geopandas makes it easy to merge data. I’ll merge on the columns in each dataframe that correspond to state abbreviations.

gdf = gdf.merge(df,left_on='STUSPS',right_on='state')A Brief Intro to Projections:

A Brief Intro to Projections:

Because the earth is round, projecting geography onto 2 dimensional maps requires some fancy math. Fortunately, cartographers have helped take a lot of the pain out of this process.

Coordinate Reference Systems (CRS) help project lattitude and longitude coordinates onto 2-dimensional maps. There are many different systems and each correspond to a different method of projection.

Geopandas makes it easy to experiment with different CRSs https://geopandas.org/en/stable/docs/user_guide/projections.html

The European Petrolium Survey Group (EPSG)(https://epsg.org/home.html) maintains an extensive set of projections for cartographic applications. The best projection for use in a particular visualization depends on the part of the world being visualized and the preferences of the developer.

A helpful tool for choosing a CRS can be found here: https://epsg.io/

CRS selection can cause some land masses to look larger or smaller than they are relative to others. There’s always going to be a bit of distortion in a map. Your job is to manage that.

Here’s a good look at why this matters from Open News: https://source.opennews.org/articles/choosing-right-map-projection/

Open News illustration of the appearances of different CRS projections of CONUS

We can re-project coordinates for any of the components of our map using the geopandas command .to_crs()

gdf.to_crs({'init':'epsg:2163'})

Managing Alaska and Hawaii

I’m going to create new boxes to map them in underneath and to the left of the continental US. Confusingly, these boxes are referred to as “axes” in matplotlib.



# Create a "copy" of gdf for re-projecting
visframe = gdf.to_crs({'init':'epsg:2163'})

# create figure and axes for with Matplotlib for main map
fig, ax = plt.subplots(1, figsize=(18, 14))
# remove the axis box from the main map
ax.axis('off')


# create map of all states except AK and HI in the main map axis
visframe[~visframe.state.isin(['HI','AK'])].plot(color='lightblue', linewidth=0.8, ax=ax, edgecolor='0.8')


# Add Alaska Axis (x, y, width, height)
akax = fig.add_axes([0.1, 0.17, 0.17, 0.16])


# Add Hawaii Axis(x, y, width, height)
hiax = fig.add_axes([.28, 0.20, 0.1, 0.1])

# We'll later map Alaska in "akax" and Hawaii in "hiax"
Map with axes established for Alaska and Hawaii below the continental US

Bringing it All Together

I’m going to use a matplotlib continuous color scale (https://matplotlib.org/stable/tutorials/colors/colormaps.html) to set the color for each state. Because I’m treating Alaska and Hawaii separately from CONUS, I’m going to create a column of the color each state should be and then map each state separately with the color pulled from that column.

# Apply this to the gdf to ensure all states are assigned colors by the same func
def makeColorColumn(gdf,variable,vmin,vmax):
# apply a function to a column to create a new column of assigned colors & return full frame
norm = mcolors.Normalize(vmin=vmin, vmax=vmax, clip=True)
mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.YlOrBr)
gdf['value_determined_color'] = gdf[variable].apply(lambda x: mcolors.to_hex(mapper.to_rgba(x)))
return gdf

Now it’s time to plot everything

# **************************
# set the value column that will be visualised
variable = 'pct_food_insecure'

# make a column for value_determined_color in gdf
# set the range for the choropleth values with the upper bound the rounded up maximum value
vmin, vmax = gdf.pct_food_insecure.min(), gdf.pct_food_insecure.max() #math.ceil(gdf.pct_food_insecure.max())
# Choose the continuous colorscale "YlOrBr" from https://matplotlib.org/stable/tutorials/colors/colormaps.html
colormap = "YlOrBr"
gdf = makeColorColumn(gdf,variable,vmin,vmax)

# create "visframe" as a re-projected gdf using EPSG 2163 for CONUS
visframe = gdf.to_crs({'init':'epsg:2163'})



# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(18, 14))
# remove the axis box around the vis
ax.axis('off')

# set the font for the visualization to Helvetica
hfont = {'fontname':'Helvetica'}

# add a title and annotation
ax.set_title('Food Insecurity by Percentage of State Households\n2019-2021', **hfont, fontdict={'fontsize': '42', 'fontweight' : '1'})

# Create colorbar legend
fig = ax.get_figure()
# add colorbar axes to the figure
# This will take some iterating to get it where you want it [l,b,w,h] right
# l:left, b:bottom, w:width, h:height; in normalized unit (0-1)
cbax = fig.add_axes([0.89, 0.21, 0.03, 0.31])

cbax.set_title('Percentage of state households\nexperiencing food insecurity\n', **hfont, fontdict={'fontsize': '15', 'fontweight' : '0'})

# add color scale
sm = plt.cm.ScalarMappable(cmap=colormap, \
norm=plt.Normalize(vmin=vmin, vmax=vmax))
# reformat tick labels on legend
sm._A = []
comma_fmt = FuncFormatter(lambda x, p: format(x/100, '.0%'))
fig.colorbar(sm, cax=cbax, format=comma_fmt)
tick_font_size = 16
cbax.tick_params(labelsize=tick_font_size)
# annotate the data source, date of access, and hyperlink
ax.annotate("Data: USDA Economic Research Service, accessed 15 Jan 23\nhttps://www.ers.usda.gov/topics/food-nutrition-assistance/food-security-in-the-u-s/key-statistics-graphics/#map", xy=(0.22, .085), xycoords='figure fraction', fontsize=14, color='#555555')


# create map
# Note: we're going state by state here because of unusual coloring behavior when trying to plot the entire dataframe using the "value_determined_color" column
for row in visframe.itertuples():
if row.state not in ['AK','HI']:
vf = visframe[visframe.state==row.state]
c = gdf[gdf.state==row.state][0:1].value_determined_color.item()
vf.plot(color=c, linewidth=0.8, ax=ax, edgecolor='0.8')



# add Alaska
akax = fig.add_axes([0.1, 0.17, 0.2, 0.19])
akax.axis('off')
# polygon to clip western islands
polygon = Polygon([(-170,50),(-170,72),(-140, 72),(-140,50)])
alaska_gdf = gdf[gdf.state=='AK']
alaska_gdf.clip(polygon).plot(color=gdf[gdf.state=='AK'].value_determined_color, linewidth=0.8,ax=akax, edgecolor='0.8')


# add Hawaii
hiax = fig.add_axes([.28, 0.20, 0.1, 0.1])
hiax.axis('off')
# polygon to clip western islands
hipolygon = Polygon([(-160,0),(-160,90),(-120,90),(-120,0)])
hawaii_gdf = gdf[gdf.state=='HI']
hawaii_gdf.clip(hipolygon).plot(column=variable, color=hawaii_gdf['value_determined_color'], linewidth=0.8,ax=hiax, edgecolor='0.8')



fig.savefig(os.getcwd()+'/food_insecurity_by_state_2019_2021.png',dpi=400, bbox_inches="tight")
# bbox_inches="tight" keeps the vis from getting cut off at the edges in the saved png
# dip is "dots per inch" and controls image quality. Many scientific journals have specifications for this
# https://stackoverflow.com/questions/16183462/saving-images-in-python-at-a-very-high-quality
The chhoropleth results of the code with annotation for data and a continuous color bar legend

Checking the Results:

The map looks pretty good, but I’ve had issues in splitting Alaska and Hawaii off with continuous color scales before. Did the coloration go as expected?

Let’s choose a state with similar food insecurity to that of Alaska and make sure they are the same color.

# Make a horizontal bar chart of state food insecurity rates using Seaborn
visFrame = gdf.sort_values('pct_food_insecure',ascending=False)[:100]


#Horizontal barchart
#Create subplot
sns.set_style('whitegrid') #set theme
fig,ax=plt.subplots(figsize=(16,30))
#Create barplot
################################################################ ***** "palette" for colors...
chart1 = sns.barplot(x=visFrame['pct_food_insecure'],y=visFrame['state'],color='lightblue')
chart1.set_title('Percent of population food insecure by sate\n',weight='bold',fontsize=16)

chart1.set_xlabel('Percentage',weight='bold',fontsize=13)
chart1.set_ylabel('State', weight='bold',fontsize=13)
#Value number on chart: from "max" on stackoverflow https://stackoverflow.com/questions/49820549/labeling-horizontal-barplot-with-values-in-seaborn
for p in ax.patches:
width = p.get_width() # get bar length
ax.text(width + .5, # set the text at .5 unit right of the bar
p.get_y() + p.get_height() / 2, # get Y coordinate + X coordinate / 2
'{:}%'.format(round(width,1)), # set variable to display
fontsize=15,
ha = 'left', # horizontal alignment
va = 'center') # vertical alignment

Maine and Alaska are both 9.5%…. are they the same color on the map?

Great! We’ve successfully made a choropleth of USDA data with a continuous color scale and both Alaska and Hawaii projected underneath the continental US.

My full code for this exercise can be accessed on github. If you’re new to Python and want to follow through the code in a Jupyter Notebook click here.

Acknowledgements

This notebook is the product of a lot of time reading documentation, googling, and exploring. I haven’t been able to record all of the articles, posts, and stack overflow handles that provided useful inputs or perspectives.

Abdishakur on Medium https://medium.com/@shakasom has been a big influence on my work. I highly recommend following him.

I got my start in Python programming almost a decade ago when I finally couldn’t stand manualy scheduling all of the flights for my Air Force squadron anymore. Sentdex’s youtube channel made Python acccessible, practical, and fun for me when I new just about nothing about coding (https://www.youtube.com/@sentdex). Harrison was among the most influential educators in my life, though I’ve never met him or spoken with him.

Both US Census and USDA make an amazing amount of data publicly available and have created an environment where open source tools can thrive. Both websites are worth a deep dive when you have the time.

Questions/Comments/Feedback?

alexrich@duck.com

--

--

Alex Rich, PhD

Data scientiest, startup founder, crash investigator, and passionate advocate of antifragile system design