Intro to Basemap for mapping values

In this post I want to do a simple walkthrough of using the matplotlib toolkit Basemap to create maps with overlaid data. I’ve been looking for ways to overlay continuous values onto maps for a while now (for example, using color to denote average house price in an area, or temperature or other weather data), and have been surprised at how difficult it is. Previously I used Folium, which can handily produce choropleths, that is, maps where different districts or states are given different colors to represent some value. But it doesn’t seem to have an easy way to map continuous values when the values you want to represent don’t line up with available geographic groupings. You could make a choropleth with weather data by, say, averaging the temperature across each district to determine the color applied to the district, but that’s not really how temperatures work.

Basemap is an alternative that can create choropleths as well, but also gives you the ability to plot continuous values, with some constraints. I thought it might be useful to present a bit of an introduction to Basemap here.

The first step is installing Basemap, as you would any other package. One quirk of using Basemap in a notebook like Jupyter is that you need to make sure Basemap is installed in the environment Jupyter is actually running in. If you’re using Jupyter, go into the Environments section of Anaconda Navigator and make sure Basemap is installed:

Making sure Basemap is active in the Jupyter environment

If it isn’t currently installed, switch the pulldown that currently says ‘Installed’ to ‘Not installed’ and find Basemap in that list. Once you have it running you can start making maps.
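A quick way to double-check from inside the notebook itself is to ask Python whether it can find the package. This is a minimal sketch; the helper name basemap_available is my own, and it only reports whether the import would succeed in the interpreter the notebook kernel is using.

```python
import importlib.util
import sys


def basemap_available():
    """Return True if mpl_toolkits.basemap can be imported by this interpreter."""
    try:
        return importlib.util.find_spec("mpl_toolkits.basemap") is not None
    except ModuleNotFoundError:
        # Raised when the parent package (mpl_toolkits) is itself missing.
        return False


# sys.executable shows which environment the notebook kernel is really using.
print("Interpreter:", sys.executable)
print("Basemap importable:", basemap_available())
```

If this prints False even though Basemap shows as installed in Navigator, the notebook is most likely running in a different environment from the one you installed into.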

The major advantage of Basemap is its wide range of built-in maps and projections. It offers several kinds of background maps, detail layers for things like rivers and coastlines, and a variety of projections for visualizations that cover large fractions of the globe. Many of these come in several resolutions, so there are two things to bear in mind: 1) higher-resolution borders are frequently computationally intensive, so don’t default to high resolution if you’re making a map of a larger area, and 2) not every map or projection will produce good results at every size. Compare this shaded relief fill of the northeastern US:

An ok map of the eastern seaboard

With the same fill focused on New York City:

A not great map of NYC
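A comparison like the one above could be reproduced with something along these lines. Treat it as a sketch under assumptions: the two bounding boxes, the Mercator projection, and the helper name draw_relief are illustrative choices of mine, not the settings used for the images in this post.

```python
# Assumed bounding boxes (lower-left and upper-right corners, in degrees).
NORTHEAST_US = dict(llcrnrlon=-80.0, llcrnrlat=37.0, urcrnrlon=-66.0, urcrnrlat=48.0)
NYC = dict(llcrnrlon=-74.30, llcrnrlat=40.45, urcrnrlon=-73.65, urcrnrlat=40.95)


def draw_relief(bounds, resolution="l"):
    """Draw Basemap's shaded relief background for one bounding box."""
    # Imported here so the bounding boxes above can be reused without Basemap.
    import matplotlib.pyplot as plt
    from mpl_toolkits.basemap import Basemap

    fig, ax = plt.subplots(figsize=(8, 8))
    m = Basemap(projection="merc", resolution=resolution, ax=ax, **bounds)
    # The relief comes from a single global image, so it blurs when zoomed far in.
    m.shadedrelief()
    # 'resolution' only affects vector boundaries such as these coastlines.
    m.drawcoastlines()
    return m
```

Calling draw_relief(NORTHEAST_US) and then draw_relief(NYC, resolution="h") should show the difference: the coastlines sharpen with the higher resolution setting, but the relief image underneath does not.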

Now, to build up a map. I start with a basic outline of New York City, using some of Basemap’s built in drawing functions, for coastlines, counties and states. This code:

Produces this map:

Outline of NYC

This simple map already demonstrates both some of the power of Basemap and some of its quirks. In particular, you can see that some of the waterfront is in high detail and some isn’t. A quarter of the way up the west side of Manhattan the Hudson seems to disappear and is replaced with a single gray line: the detailed lower portion is drawn by drawcoastlines, but this ends and drawrivers takes over, which simply charts the path of the river with a line.
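An outline of this kind can be sketched roughly as follows. This is not the exact listing: the bounding-box corners, the Mercator projection, the 'f' (full) resolution, and the function name draw_nyc_outline are assumptions for illustration. The drawing methods themselves (drawcoastlines, drawrivers, drawstates, drawcounties) are Basemap's own.

```python
# Assumed corners of the mapped area, in degrees.
NYC_BOUNDS = dict(llcrnrlon=-74.30, llcrnrlat=40.45, urcrnrlon=-73.65, urcrnrlat=40.95)


def draw_nyc_outline(ax=None, resolution="f"):
    """Draw coastlines, rivers, state and county borders around New York City."""
    # Imported here so NYC_BOUNDS can be reused without Basemap installed.
    from mpl_toolkits.basemap import Basemap

    m = Basemap(projection="merc", resolution=resolution, ax=ax, **NYC_BOUNDS)
    m.drawcoastlines()           # detailed shoreline polygons
    m.drawrivers(color="gray")   # rivers as simple center lines
    m.drawstates()
    m.drawcounties()
    return m
```

In a notebook, m = draw_nyc_outline() followed by matplotlib's plt.show() would render the outline, and the returned m can be reused for later layers.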

On top of this base map, you can scatter information that comes with latitudes and longitudes rather than x and y coordinates. For instance, I found a list of subway stations from an MTA data site and mapped them onto this outline by adding just these two lines of code:

Which results in this:

Outline of NYC with subway stations scattered

I noticed once I plotted this that this data is missing half of the subway stations, but since this is simply for demonstration purposes, I didn’t go out hunting to complete the data set.
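The scatter step comes down to two calls: converting longitudes and latitudes into the map's own x and y coordinates, and then plotting them. Here is a minimal sketch, assuming m is an existing Basemap instance. The three coordinates are approximate placeholders for illustration, not rows from the MTA file, and the helper name scatter_stations is my own.

```python
import numpy as np

# Placeholder station coordinates in degrees (approximate, for illustration only).
station_lons = np.array([-73.9855, -73.9772, -73.9903])
station_lats = np.array([40.7580, 40.7527, 40.7359])


def scatter_stations(m, lons, lats):
    """Plot lon/lat points on an existing Basemap instance m."""
    # Calling the Basemap instance converts lon/lat into projection x/y.
    x, y = m(lons, lats)
    # zorder keeps the points above the boundary lines.
    return m.scatter(x, y, s=10, color="red", zorder=5)
```

Note that Basemap expects longitude first and latitude second when it is called this way.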

Now we can get to the interesting part, which is mapping continuous data as a colored overlay on top of this. You can do this with a function called pcolormesh, which takes in a grid of latitudes/longitudes and associated z values, and fills in the space between the points of the grid. For this example, I’ll use pcolormesh to show how far each point is from the nearest subway stop. In order for it to interpolate properly for points in between the grid points you’ve given it, it needs to be fed a well-ordered set of points. To do this, you might have code like this:
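What follows is a minimal sketch in three blocks rather than the exact listing. The area bounds and the three station coordinates are placeholders, the function name nearest_station_km is my own, and the haversine (great-circle) formula is one reasonable choice of distance measure that I am assuming here.

```python
import numpy as np

# Block 1: evenly spaced lon/lat values over the mapped area, combined into a grid.
GRID_SIZE = 420
lons = np.linspace(-74.30, -73.65, GRID_SIZE)  # assumed west/east edges
lats = np.linspace(40.45, 40.95, GRID_SIZE)    # assumed south/north edges
lon_grid, lat_grid = np.meshgrid(lons, lats)

# Placeholder station coordinates in degrees (approximate, for illustration only).
station_lons = np.array([-73.9855, -73.9772, -73.9903])
station_lats = np.array([40.7580, 40.7527, 40.7359])
EARTH_RADIUS_KM = 6371.0


# Block 2: distance from one point to the nearest station.
def nearest_station_km(lon, lat):
    """Haversine distance in km from (lon, lat) to the closest station."""
    lon1, lat1 = np.radians(lon), np.radians(lat)
    lon2, lat2 = np.radians(station_lons), np.radians(station_lats)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return float(np.min(2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))))


# Block 3: a z array shaped like the grid, filled in point by point.
z = np.zeros(lon_grid.shape)
for i in range(lon_grid.shape[0]):
    for j in range(lon_grid.shape[1]):
        z[i, j] = nearest_station_km(lon_grid[i, j], lat_grid[i, j])
```

The double loop is slow but easy to follow. With a full station list it would be worth vectorizing the distance calculation instead.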

In the first block I’ve used numpy’s linspace to create evenly spaced latitude and longitude points that cover the area I’m mapping, and then combined them into a meshgrid. The second block is a function that finds the distance to the nearest subway station and returns it. The last block creates an array for z values which is the same shape as our lat/lon meshgrid and then fills it in using the function. Then we can add pcolormesh to our map code like this:

Which creates a map like this:

Map now with heatmap added

One nice thing about this process is that it allows you to control the resolution of your heatmap. In this example you can see that I created a 420 by 420 grid of latitude and longitude points to create this colormesh. I could have increased or decreased the overall resolution of this map by simply changing that number.
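To make the resolution knob concrete, here is a sketch of how the grid size might be turned into a parameter and the mesh added to the map. The helper names make_grid and add_distance_overlay, the bounds, and the colormap are assumptions for illustration. m stands for an existing Basemap instance and z for an array of values with the same shape as the grid.

```python
import numpy as np

# Assumed corners of the mapped area, in degrees.
NYC_BOUNDS = dict(llcrnrlon=-74.30, llcrnrlat=40.45, urcrnrlon=-73.65, urcrnrlat=40.95)


def make_grid(n, bounds=NYC_BOUNDS):
    """Return an n-by-n lon/lat meshgrid covering the bounds."""
    lons = np.linspace(bounds["llcrnrlon"], bounds["urcrnrlon"], n)
    lats = np.linspace(bounds["llcrnrlat"], bounds["urcrnrlat"], n)
    return np.meshgrid(lons, lats)


def add_distance_overlay(m, lon_grid, lat_grid, z, cmap="viridis"):
    """Draw z as a colored mesh on an existing Basemap instance m."""
    x, y = m(lon_grid, lat_grid)         # lon/lat -> projection coordinates
    mesh = m.pcolormesh(x, y, z, cmap=cmap)
    m.colorbar(mesh)                     # legend for the color scale
    return mesh
```

Swapping make_grid(420) for make_grid(100) gives a coarser and much faster overlay, as long as z is recomputed on the new grid.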

One problem with this sort of map is that pcolormesh lays its colors over a rectangular area and doesn’t have a way of deciding whether or not to put color at a given spot: it maps color everywhere. If you have data from an irregularly shaped array or are trying to map data onto an irregular shape, there’s no easy solution. In this case, you can see, for instance, that I’ve mapped data onto the rivers, which doesn’t make a whole lot of sense (unless, I suppose, you’re swimming in the middle of the East River and want to know which way to swim to get to the nearest station on land).

There doesn’t seem to be an easy solution for creating a colormesh for an irregular shape or masking certain parts of the mesh. In this case, a lot of internet searching led me to something of a solution: creating a mask over the water and coloring it white. This code:
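One way to get this effect, and I am assuming this approach rather than quoting the exact listing, is Basemap's drawlsmask method, which paints a land/sea mask over the map. Making the land fully transparent and the water white leaves the heatmap visible only over land. The helper name whiten_water and the zorder value are my own choices.

```python
def whiten_water(m, zorder=3):
    """Paint water white on an existing Basemap instance m, leaving land untouched."""
    return m.drawlsmask(
        land_color=(0, 0, 0, 0),  # fully transparent RGBA, so the heatmap shows through
        ocean_color="white",
        lakes=True,               # also cover inland water
        resolution="f",           # finest coastline data used to build the mask
        grid=1.25,                # finest built-in mask spacing, in arc-minutes
        zorder=zorder,            # passed on to imshow; draws the mask above the mesh
    )
```

Even at the finest setting, the mask grid is coarse compared with a city-scale map, so the edges of the water come out blocky.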

Produces a map like this:

Which is an improvement although, honestly, a far cry from what I’d like to be able to make. If anyone wants to recommend a library which can do these tasks better or, better yet, write such a library and send it my way, I’d be appreciative.