Making Colored Country Maps with Real Data Using Matplotlib and Geopandas

Andrew Roman
Analytics Vidhya
Published in
13 min readJul 17, 2021
Choropleth maps of New Zealand made with Matplotlib and New Zealand Police district data

We’re going to expand upon Xiao Wang’s tutorial on how to map police districts in New Zealand using Pandas, Geopandas, and Matplotlib — a great starter if you’re new to creating maps in Python. In this tutorial, you’ll go further and make a map that shows the total number of firearm licenses across New Zealand using data from the New Zealand Police’s website.

These colored maps are known as “choropleth maps (Greek for “many regions”). They’re like heatmaps, except that they show actual states, countries, and so on.

Note: If you want to do the same thing in Plotly, which shows the map in your browser instead of as a lone image, Wang’s tutorial has some good Plotly examples.

What are Pandas, Matplotlib, and Geopandas?

Pandas is a Python library that helps you mess with data sets, like those in CSVs and Excel sheets. If you’ve ever tried to put a CSV into huge 2D array into a Python list or dictionary and have found it impossible to get a single column out of it, Pandas fixes that. For example, you read your CSV into a Pandas “dataframe” then docolumn = dataframe["column"]).

Matplotlib is the most popular way to get graphs and charts out of Pandas, though there are also Plotly, Seaborn, and more.

Geopandas is Pandas with geographical/GIS functions added. Dealing with the coordinates of states and countries would be impossible without it.

TL;DR What are the actual steps?

We’re going to find some coordinate data about New Zealand, find some regular data about New Zealand (e.g. how many people live in each area), combine those data sets, then make a map with the combined data.

The steps break down like this:

  1. Find some GIS data of New Zealand that has the borders of NZ’s districts (We’ll use this data from Kaggle)
  2. Feed the GIS data into Geopandas
  3. Find some data about New Zealand’s police districts (We’ll use police.govt.nz)
  4. Feed that data into regular Pandas
  5. Combine the Geopandas and Pandas dataframes wherever their District columns match
  6. Give the combined dataframe to Matplotlib to make the actual map
  7. Polish the map
  8. Bonus: What does this tell us about New Zealand?

Step 0: Getting Python ready to go

We’ll assume you already know Python and have it installed.

To work with Pandas, Geopandas, and Matplotlib, we need to both download them from the internet then importthem into our code. They don’t come with basic Python. To download them, type the following Pip command into your Terminal (Pip is a tool to download and manage Python libraries that will make your life easier):

pip install pandas geopandas matplotlib

This will install all three at once. Or you can write each one individually on its own pip install <library> line. Installation will take a minute or two.

Installation of Pandas, Geopandas, and Matplotlib on macOS using Terminal. We only have to enter what comes after % on the first line. The rest is the output from the installer.

Step 1: Get GIS Data of New Zealand

2023 Update: Koordinates removed the dataset below. You can download the data from Kaggle here.

We’re going to use the same GIS data from Xiao Wang’s original tutorial, which she found on Koordinates.com. The data set is linked here, or you can go to Koordinates and search for “New Zealand police district.”

This data is in the format of a shapefile, which is a standard for sharing maps and coordinate data among GIS programmers.

Info page for the New Zealand Police District Boundaries data set from Koordinates.com

We’ll have to create a free account with Koordinates. After we have an account, we can get the data by clicking the green Export button. The download will contain a ZIP file, which we can unzip to get the .shx file that has our coordinate data.

Zipped and unzipped files of New Zealand Police District data from Koordinates.com

Let’s move the entire kx-nz-police-district-boundaries-SHP folder to the same folder as our main.py file. That way our script can find our data more easily. I renamed the folder to data to avoid having to type out the long folder name later in our code.

Our project folder with `main.py` and the recently downloaded NZ Police District data from Koordinates.com

Important: Keep all 7 of these nz-police-station-boundaries files together. We’ll only tell our code about the .shx file, but Geopandas needs several of these files to work (and keeping the .txt documentation is just good practice). Otherwise, you’ll get an error like this:

Error caused by the code not being able to find the shapefile

Step 2: Load the coordinates into Geopandas

We’re going to load the coordinates from the nz-police-district-boundaries.shx file into Geopandas in our code.

Create a new Python file, main.py, with the following:

import geopandas as gpdmap_df = gpd.read_file("data/nz-police-district-boundaries.shx")

Two lines was all it took. We imported Geopandas then used the Geopandas read_file() function to load the .shx file. Now, we have a dataframe (specifically a GeoDataFrame) called map_df that holds our coordinate data, which we can manipulate with Geopandas functions and eventually turn into a map.

If we want to see what data we have so far, we can add print(map_df) to the end of main.py. When we run it, we get this:

Contents of the map_df GeoDataFrame, which holds coordinates for the police districts of New Zealand. Source: Koordinates.com

This shows the following columns:

  • Index (0–11) — AKA the row number
  • DISTRICT_I — an index for each district
  • DISTRICT_N — the name of each district
  • geometry — the coordinates of that district’s boundaries. The data is cut off with a ... , but each row in the geometry column actually has 1000 or so coordinates. To see just how many, add print(map_df["geometry"][1]) to main.py and run it again.

Step 3: Find Some Real Data About New Zealand

Wang’s original tutorial had some arbitrary data in a 12-row CSV file that assigned a crime rate to each of the 12 districts — basic integers like 66 or 118 that could really mean anything (e.g. 66 robberies). That tutorial got us a map with 12 colored districts.

New Zealand map created using Xiao Wang’s tutorial on choropleth maps

But what about real data? How do we say something tangible about the police districts in New Zealand? Does data about these districts even exist on the internet? We have to find out for ourselves.

Looking for the right data is not always straight forward, but it is a huge chunk of being a good data scientist.

@sethrosen on Twitter

Let’s start Googling. If we search for new zealand police district data , a few resources pop up.

Screenshot from Google Search

These all look like solid resources for data, especially since they’re directly from the country in question. Ideally, we want data that is broken up by district already or that we can combine into districts with a reasonable amount of work. Let’s see if the first link has anything like this.

New Zealand Police’s data repository with open data sets

After poking around, we find a few data sets about firearm ownership in New Zealand.

Let’s take a look at the Firearms data doc. Click its name under the blue Document Name column, then download the XLSX file on the next page. (If you don’t have Excel to open the XLSX file, you can copy and paste the data you need from this project’s Github repo.)

Excel sheet of firearm data from the New Zealand Police.

When we open the Excel file, we see that one of the tabs, Active standard licenses by District, has a clean table of data by district. These are the same 12 districts that our coordinate data uses. This is convenient. We’ll use this data for the sake of brevity.

Let’s save this tiny table with the 12 districts into a CSV file (we’ll name ours nz_firearm_licenses.csv), so that we can use it in our project. (Or copy/paste the CSV from here.) Put this CSV in the data folder (or whatever you named it) also.

CSV of firearm license data from the New Zealand Police

A few differences between this CSV and the original Excel table:

  • We put “quotes” around each of the district names, so that the CSV knows that districts with spaces in them are all part of one name.
  • We made the district names match the format of the GeoDataFrame (ex: Counties/Manukau vs. COUNTIES/MANUKAU).
  • We removed the commas from the numbers. (We could do this in our code later, but let’s just do it now.)
  • The second column was changed from Licences with two Cs to Licenses with two Ss. The “licences” spelling is used in New Zealand. You can use whichever spelling, just make sure it is consistent between your CSV file and your code.

Why did we choose this data set? The other data sets on the site are just as interesting, but this Excel file is simple and the other data is presented as Tableau dashboards where the option to download is disabled. We could hand-copy the data we see or find some way to scrape the dashboard data. But talking in depth about Tableau would derail the article and nobody needs a tutorial on how to copy and add numbers together, so we will skip it for now.

A chart of police district data from a New Zealand Police “Victimisations (demographics)” Tableau dashboard. Perfectly good data for other projects if you want to try!

The big lesson from this step is that we have to find the data ourselves. It’s easy to click a download button on a website, but the hard part is finding that data, deciding if it is right for your project, and changing it into a format that works for you.

Step 4: Load the Data We Just Found into Pandas

Remember that main.py file we wrote 2 lines in? Let’s open that again and add our newly found firearm data from Step 3.

Add the following lines to main.py. All we have to do is tell it the path of our new CSV file (which we also put in the data folder) and use read_csv(file), just like we did with Geopandas above.

import pandas as pd
...
licenses_df = pd.read_csv("data/nz_firearm_licenses.csv")

The wholemain.py file should now look like this:

import pandas as pd
import geopandas as gpd
map_df = gpd.read_file("data/nz-police-district-boundaries.shx")
licenses_df = pd.read_csv("data/nz_firearm_licenses.csv")

All we have done so far is find data and read it into Python. It’s not much, but the industry does say that half of data analysis is spent preparing your data. Low line count does not mean low work. Now, let’s actually work with it and make some maps.

Step 5: Combine the Data into One Dataframe

We have two dataframes, one with district coordinates and one with firearm data for each district that those coordinates draw. Let’s put them together so that we can assign a number of firearm licenses to each district.

The merge() function will combine the dataframes along whatever columns contain the district names and output a single dataframe (merged_df).

merged_df = map_df.merge(licenses_df, left_on=["DISTRICT_N"], right_on=["Residence District"])

The arguments left_on and right_on tell merge() which columns are supposed to be matched up between the two dataframes, which are the district columns (DISTRICT_N and Residence District). To remember which column is supposed to be left_on or right_on, notice that in the statement map_df.merge(licenses_df...) , map_df is on the left and licenses_df is on the right. Thus the district column from map_df is what we’ll give to left_on and vice versa.

Once we add that to main.py, our data looks like this.

The final combined dataframe. There are two duplicate columns with district names, but this won’t affect our final map.

Here is main.py so far:

import pandas as pd
import geopandas as gpd
map_df = gpd.read_file("data/nz-police-district-boundaries.shx")
licenses_df = pd.read_csv("data/nz_firearm_licenses.csv")
merged_df = map_df.merge(licenses_df, left_on=["DISTRICT_N"], right_on=["Residence District"])

That is all we needed to do to prepare the data: load the data and merge it.

Step 6: Make the Actual Map

Here we are! Time to make the map. It will only take 3 more lines.

import matplotlib.pyplot as plt
...
merged_df.plot(column="Licenses", cmap="Blues", legend=True)
plt.show()

What we’re doing is:

  • Importing Matplotlib (you will see the exact line import matplotlib.pyplot as plt a lot in your career)
  • Running plot() directly on the dataframe merged_df (which is still a GeoDataframe and thus able to handle mapping coordinates) and telling it to do the following: use the numbers from the “Licenses” column to decide the colors, make the colors blue, and show the color scale AKA the legend.
  • Telling Matplotlib to show us the result with plt.show()

This is now main.py in its entirety:

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
map_df = gpd.read_file("data/nz-police-district-boundaries.shx")
licenses_df = pd.read_csv("data/nz_firearm_licenses.csv")
merged_df = map_df.merge(licenses_df, left_on=["DISTRICT_N"], right_on=["Residence District"])merged_df.plot(column="Licenses", cmap="Blues", legend=True)
plt.show()

Go back to your Terminal and run python main.py with these new lines of code added.

The map of New Zealand should pop up on your desktop!

Map of firearm licenses for each police district in New Zealand. Data source: New Zealand Police

If it pops up then disappears, or doesn’t show at all, try these troubleshooting tips with plt.ion() and plt.ioff().

That is it! A map has been made.

Step 7: Polish the Map Up

We have our basic map. What about the title? What about all the clutter?

After merged_df.plot(…) but before plt.show(), we will add all of the code that follows.

Let’s add the title “Firearm Licenses by Police District in New Zealand (July 2021)”, since this data was true as of July 2021, per the New Zealand Police’s website.

plt.title("Firearm Licenses by Police District in New Zealand (July 2021)")

The tick marks and labels on the X and Y axes are both useless and in the way of the title, so let’s get rid of those by setting everything to false in tick_params().

plt.tick_params(
axis="both", # affect both the X and Y axes
which="both", # get rid of both major and minor ticks
top=False, # get rid of ticks on top/bottom/left/right
bottom=False,
left=False,
right=False,
labeltop=False, # get rid of labels on top/bottom/left/right
labelbottom=False,
labelleft=False,
labelright=False)
New Zealand Matplotlib map without tick marks or labels

To clean it up further, we will:

  • Get rid of the border around New Zealand with plt.axis("off")
  • Nudge the country to the left a bit (paradoxically by using the right option in plt.subplots_adjust(right=0.85) )
  • Nudge the title up by adding y=1.04 to plt.title(...)

From top to bottom, all of our plotting code now looks like this (and main.py looks like this):

merged_df.plot(column="Licenses", cmap="Blues", legend=True)plt.title("Firearm Licenses by Police District in New Zealand (July 2021)", y=1.04)
plt.tick_params(
axis="both", # affect both the X and Y
which="both", # get rid of both major and minor ticks
top=False, # get rid of ticks top/bottom/left/right
bottom=False,
left=False,
right=False,
labeltop=False, # get rid of labels top/bottom/left/right
labelbottom=False,
labelleft=False,
labelright=False)
plt.axis("off")
plt.subplots_adjust(right=0.85)
plt.show()

And this is the final result!

Our result is a solid 2D map of New Zealand with minimal frills and distractions. Your main.py file should look like this (sans comments).

If you want to use the map colors shown at the top of this article change merged_df.plot(...cmap="Blues"...) to cmap="plasma_r" or cmap="YlGnBu" instead. Check out the list of Matplotlib colors here.

We could adjust this even further with margins, padding, and other args and functions and options in Matplotlib. There is more to be done. If you’re looking for ideas for cool Matplotlib charts to make, here is a gallery of 100s of Matplotlib charts and how to create them.

To save this image, you can add plt.savefig("nz_map.png") right above plt.show(), and it will save to the same folder as everything else. If it comes out too pixelated, add the argument dpi=300 to plt.show(...). A higher DPI means a more detailed image.

Bonus Step 8: What Does This Tell Us About New Zealand?

What does our new firearm map mean? It looks like there are more firearm licenses issued to southern districts than northern districts. There are half, or even a third, the number of licenses issued in northern areas, especially around major cities like Auckland and Wellington.

The population density shown here does seem to be dark in many areas where the amount of firearm licenses were shown to be low in the map we just made. This chart is from Chartsbin.com and Statistics New Zealand and uses data from the 2006 New Zealand census.

This could lead us to follow-up projects to answer questions such as:

  • Who tends to have more firearm licenses in New Zealand, people closer to urban areas or rural areas?
  • Are the areas that have high firearm licenses also areas with large swaths of public land, such as hunting grounds or other areas for sport shooting?
  • What are New Zealand’s rates of ownership for illegal/unlicensed firearms? Does presence of licenses tell us about the true presence of firearms in each district?

The New Zealand Police and other organizations provide some data here if you‘re interested in investigating these questions:

If you do a project like this, message me a link here or on LinkedIn, and I’ll add a link to your project to this article.

Summary

You should now be able to plug-and-play other shapefiles and data sets to get some cool maps. Just remember that for each row/shape you have in your dataframe of geo shapes, you need a column with some number representing a total of something (total crimes, total cats stuck in trees, etc.). Then .plot() that sucker, and you’re good to go!

If you have questions or you think this is the worst tutorial you’ve ever read, message me here or on LinkedIn. The code is also up on Github.

Glossary

Pandas — a Python library that makes working with 2D data sets easy. Python lists and dicts are a data nightmare.

Geopandas —Literally Pandas with geographical/GIS functions tacked on. We used this to handle all the coordinates we downloaded from the New Zealand Police.

Matplotlib — Make charts and graphs. Works smoothly with Pandas and Geopandas.

choropleth map — A map of a city/country/area with a bunch of colors to express data (e.g. population density)

shapefileA file or group of files for storing location data, coordinates, etc.

DataFrame —The Pandas object that holds all your data

GeoDataFrame — Like a Pandas DataFrame, except it knows about all the cool Geopandas functions and how to do them

--

--

Andrew Roman
Analytics Vidhya

Data Analyst specializing in security and neurobiology