A Python Tutorial on Geomapping using Folium and GeoPandas

Jade Adams
Oct 17, 2021

Maps and geography have long been a passion of mine, especially given my background in International Relations. A side goal of mine as I grow as a data scientist is building more Geographic Information System (GIS) skills in Python. Competency in Folium and GeoPandas is key to basic geographic mapping, so I have focused my learning on those libraries. In this article, we will build a map and gradually add more complex features together.

Using a dataset of real estate sales in King County, Washington, I built maps of the mean home price per zipcode, gradually improving their features and usability. For this demonstration, I used Python 3.8 in a Jupyter Notebook and imported the Folium, GeoPandas and pandas libraries. The full code can be found on my GitHub.

Sourcing the Data

The data on home sales comes from a publicly available CSV of King County, Washington home sales from 2014–2015. The data can be found in the data folder of the GitHub repository mentioned above. For each home there are 21 columns, including price, latitude, longitude and zipcode. Latitude, longitude and zipcode are what make the data mappable for our GIS libraries. Below is the .info() printout of the DataFrame loaded from the CSV.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21597 entries, 0 to 21596
Data columns (total 21 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   id             21597 non-null  int64
 1   date           21597 non-null  object
 2   price          21597 non-null  float64
 3   bedrooms       21597 non-null  int64
 4   bathrooms      21597 non-null  float64
 5   sqft_living    21597 non-null  int64
 6   sqft_lot       21597 non-null  int64
 7   floors         21597 non-null  float64
 8   waterfront     19221 non-null  object
 9   view           21534 non-null  object
 10  condition      21597 non-null  object
 11  grade          21597 non-null  object
 12  sqft_above     21597 non-null  int64
 13  sqft_basement  21597 non-null  object
 14  yr_built       21597 non-null  int64
 15  yr_renovated   17755 non-null  float64
 16  zipcode        21597 non-null  int64
 17  lat            21597 non-null  float64
 18  long           21597 non-null  float64
 19  sqft_living15  21597 non-null  int64
 20  sqft_lot15     21597 non-null  int64
dtypes: float64(6), int64(9), object(6)
memory usage: 3.5+ MB
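The loading step itself is not shown above. A minimal sketch, assuming the data is read with pd.read_csv, is below. To keep it self-contained it reads three made-up rows (illustrative values, not rows from the dataset) containing only the columns the maps use, from an in-memory string; with the real data you would pass the path to your copy of the CSV instead.

```python
import io

import pandas as pd

# Made-up stand-in for the King County CSV, limited to the columns the maps need.
# With the real file, replace io.StringIO(csv_text) with the CSV path.
csv_text = """id,price,zipcode,lat,long
1,250000.0,98001,47.30,-122.26
2,1200000.0,98004,47.61,-122.20
3,540000.0,98004,47.62,-122.21
"""
df = pd.read_csv(io.StringIO(csv_text))

# Check that the mapping columns exist and have sensible dtypes
required = ["price", "zipcode", "lat", "long"]
missing = [col for col in required if col not in df.columns]
print("missing columns:", missing)
print(df[required].dtypes)
```

Note that zipcode is read as an integer column, which matters later when it has to match a string field in the GeoJSON.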

Building a GeoJSON File

Our first goal is to group the data by zipcode and map it. Geospatial data comes in many different formats online. I was able to find a KML file of King County zipcodes at this link and then convert it to a GeoJSON using this website.

It’s important to inspect the GeoJSON file so we understand how Folium will read it. We can open it in Visual Studio Code or any other text editor. It is a nested dictionary:

{
  "type": "FeatureCollection",
  "name": "Zipcodes_for_King_County_and_Surrounding_Area___zipcode_area",
  "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "Name": null,
        "description": null,
        "OBJECTID": 1,
        "ZIP": 98001,
        "ZIPCODE": "98001",
        "COUNTY": "033",
        "ZIP_TYPE": "Standard",
        "COUNTY_NAME": "King County",
        "PREFERRED_CITY": "AUBURN",
        "Shape_Length": 148134.77097613501,
        "Shape_Area": 526121353.97049099
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [ [ [ -122.290611739526994, 47.355392635431102 ], [ -122.290611557838005, 47.355415072565997 ], [ -122.290611292570006, 47.355441749861903 ], [ -122.290610079941999, 47.3555774792625 ], [ -122.290606350762005, 47.355994133924803 ], [ -122.290606055731999, 47.356027417009798 ], [ -122.290603589477001 ...
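Instead of scrolling through the raw file, Python's built-in json module can walk the same structure. The sketch below uses a trimmed, made-up two-feature FeatureCollection with the same nesting (geometry left out for brevity); the path it follows is the one that Folium's key_on="feature.properties.ZIPCODE" string refers to.

```python
import json

# Trimmed, made-up FeatureCollection with the same nesting as kc_zipcodes.geojson.
# With the real file: with open("King_County_Data/kc_zipcodes.geojson") as f: gj = json.load(f)
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"ZIP": 98001, "ZIPCODE": "98001"}, "geometry": null},
    {"type": "Feature", "properties": {"ZIP": 98004, "ZIPCODE": "98004"}, "geometry": null}
  ]
}
"""
gj = json.loads(geojson_text)

# features -> properties -> ZIPCODE gives strings; features -> properties -> ZIP gives ints
zip_strings = [feature["properties"]["ZIPCODE"] for feature in gj["features"]]
zip_ints = [feature["properties"]["ZIP"] for feature in gj["features"]]
print(zip_strings)  # ['98001', '98004']
print(zip_ints)     # [98001, 98004]
```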

So the path through the dictionary keys is features->properties->ZIPCODE for the zipcode as a string, or features->properties->ZIP for the zipcode as an integer. Now we can load the map and plot the zipcode data. We center the map on King County, at [47.5480, -121.9836]. Setting the tiles to “Stamen Toner” simplifies the map compared with Folium's default, more detailed OpenStreetMap tiles.

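import folium
# zoom_start (not default_zoom_start) sets the initial zoom; "Stamen Toner" may need a different tiles value on newer folium versions
map = folium.Map(location=[47.5480, -121.9836], zoom_start=10, tiles="Stamen Toner")
map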

Fitting Pandas Data

Now we can add our data. We will use the pandas groupby method so we can plot values by zipcode, starting with the mean sale price. We will be using Folium’s Choropleth mapping method; you can find its documentation on GitHub.

# Mean sale price per zipcode, with zipcodes cast to str to match ZIPCODE in the GeoJSON
avg_price_zipcode = df.groupby("zipcode")["price"].mean().round(2)
df_zip_price = pd.DataFrame()
df_zip_price["ZIPCODE"] = avg_price_zipcode.index
df_zip_price["mean_price"] = avg_price_zipcode.values
df_zip_price["ZIPCODE"] = df_zip_price["ZIPCODE"].astype(str)
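To see what these steps produce, here they are run on three made-up sales (illustrative numbers, not from the dataset). The sketch also shows a more compact alternative using reset_index and rename, which is my own swap-in rather than the approach used above; both give the same table.

```python
import pandas as pd

# Made-up sales: two zipcodes, three sales
df = pd.DataFrame({
    "zipcode": [98004, 98004, 98001],
    "price": [1_000_000.0, 1_500_000.0, 300_000.0],
})

# Same steps as above: mean price per zipcode, zipcodes as strings
avg_price_zipcode = df.groupby("zipcode")["price"].mean().round(2)
df_zip_price = pd.DataFrame()
df_zip_price["ZIPCODE"] = avg_price_zipcode.index
df_zip_price["mean_price"] = avg_price_zipcode.values
df_zip_price["ZIPCODE"] = df_zip_price["ZIPCODE"].astype(str)

# Compact alternative: reset_index turns the group keys back into a column
compact = (
    df.groupby("zipcode")["price"].mean().round(2)
    .reset_index()
    .rename(columns={"zipcode": "ZIPCODE", "price": "mean_price"})
    .astype({"ZIPCODE": str})
)
print(df_zip_price)
print(compact)
```

groupby sorts the zipcodes, so 98001 comes first with a mean of 300,000 and 98004 second with a mean of 1,250,000.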

Mapping with Folium

Above, we created a new DataFrame of each zipcode’s average price. Since we are matching against ZIPCODE, a string field in the GeoJSON file, we converted the zipcodes to strings. Now we can plot the basic map with Folium’s choropleth function and add more features from there. The input and output should look like this:

# folium.Choropleth replaces the deprecated map.choropleth method
folium.Choropleth(geo_data="King_County_Data/kc_zipcodes.geojson", data=df_zip_price,
                  columns=["ZIPCODE", "mean_price"], key_on="feature.properties.ZIPCODE",
                  fill_color="YlOrBr", fill_opacity=0.6, line_opacity=0.2,
                  legend_name="MEAN SALE PRICE").add_to(map)
map
A choropleth map of mean price per zipcode, set with the map style “Stamen Toner”.

Improving our Choropleth

Choropleths have several built-in parameters that help readers understand the data. For one, several color schemes are available, including ‘YlGnBu’ and ‘BuPu’. The Folium documentation on GitHub, referenced above, is helpful here. I’m going to go over some major upgrades we can make to our map.

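First, we can change the values at which the legend's colors change. The legend in the top right shows the range of mean prices covered by each color. By default, Folium splits the range between the lowest and highest value into six equal-width bins, so a single very expensive zipcode can push most zipcodes into the lowest colors. We can instead pass our own bin edges, here quantiles of the data, through the threshold_scale parameter (replaced by bins in current Folium versions). I’m going to start a fresh map and add the thresholds.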

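# Bin edges at the minimum, median, 90th percentile and maximum of the zipcode means
my_thresh = df_zip_price["mean_price"].quantile((0, .5, .9, 1)).tolist()
map = folium.Map(location=[47.5480, -121.9836], zoom_start=10, tiles="Stamen Toner")
# columns must match df_zip_price ("ZIPCODE"); bins replaces the deprecated threshold_scale parameter
folium.Choropleth(geo_data="King_County_Data/kc_zipcodes.geojson", data=df_zip_price,
                  columns=["ZIPCODE", "mean_price"], key_on="feature.properties.ZIPCODE",
                  fill_color="YlOrBr", fill_opacity=0.6, line_opacity=0.2,
                  legend_name="MEAN SALE PRICE", overlay=True, bins=my_thresh).add_to(map)
map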
The same choropleth with custom bin thresholds. The legend colors now correspond to different value ranges.

The legend, scaled by price, shows how large the gap is between the 90th-percentile zipcode and the most expensive one. With an average sale price of over $2,000,000, 98039 (the city of Medina) is a true outlier. You can change which quantiles are displayed by editing the values in my_thresh.
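Why quantile thresholds help with an outlier like Medina can be seen on five made-up zipcode means (illustrative values only). The sketch compares six equal-width bins, which, per the Folium documentation, is what the default integer bins value of 6 produces, with the quantile edges built the same way as my_thresh.

```python
import numpy as np
import pandas as pd

# Made-up zipcode means with one Medina-style outlier
mean_price = pd.Series([300_000.0, 400_000.0, 500_000.0, 600_000.0, 2_000_000.0])

# Six equal-width bins between the min and the max
counts, edges = np.histogram(mean_price, bins=6)

# Quantile-based edges, as in my_thresh: min, median, 90th percentile, max
my_thresh = mean_price.quantile((0, .5, .9, 1)).tolist()

print("equal-width counts:", counts.tolist())
print("quantile edges:", my_thresh)
```

With equal-width bins, three of the five zipcodes share the lowest bin and three bins stay empty, while the quantile edges spread the colors across the ordinary zipcodes and give the outlier its own top range.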

Making the Map Interactive

Next, we are going to make the map more interactive. Two more options are available to us: highlighting, which changes a zipcode's style when we hover the mouse over it, and tooltips, which display that zipcode's details. The choropleth we built does not show our price data on hover, so we add these features as a separate layer on top of the map.

folium.features.GeoJson doesn’t allow for multiple data sources, i.e. one for the data and one for the geojson, so we have to merge the price data with the locational data for the mapping. We do this by using GeoPandas to read the geojson file as a dataframe and then merge the price dataframe onto it:

import geopandas as gpd
# Read the zipcode polygons as a GeoDataFrame and attach the mean prices by ZIPCODE
geo_data = gpd.read_file("King_County_Data/kc_zipcodes.geojson")
geodata_price = geo_data.merge(df_zip_price, on="ZIPCODE")
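GeoDataFrame.merge follows the same rules as pandas' DataFrame.merge, so the key behavior can be checked with plain pandas. The sketch below uses small made-up stand-ins for the two tables. It shows that the default inner merge drops zipcodes that have no sales data, which matters here because the GeoJSON also covers the surrounding area; a left merge keeps every polygon instead.

```python
import pandas as pd

# Made-up stand-ins: zipcode attributes from the GeoJSON (ZIPCODE is a string)
# and the mean prices computed earlier (ZIPCODE already cast to str)
zip_attrs = pd.DataFrame({"ZIPCODE": ["98001", "98004", "98039"],
                          "PREFERRED_CITY": ["AUBURN", "BELLEVUE", "MEDINA"]})
df_zip_price = pd.DataFrame({"ZIPCODE": ["98001", "98004"],
                             "mean_price": [300_000.0, 1_250_000.0]})

# Inner merge (the default): 98039 has no price row, so it is dropped
merged = zip_attrs.merge(df_zip_price, on="ZIPCODE")

# Left merge: every zipcode is kept, with NaN where there is no price data
merged_left = zip_attrs.merge(df_zip_price, on="ZIPCODE", how="left")
print(merged)
print(merged_left)
```

If the ZIPCODE columns had different types (int on one side, str on the other), the keys would not line up, which is why the zipcodes were converted to strings earlier.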

We can now plot the feature using geodata_price. The folium.features.GeoJson parameters that matter most here are style_function (how each zipcode is drawn normally), highlight_function (how it is drawn on hover) and tooltip (what is shown on hover). We create the feature as a class instance and then add that instance to the map:

# Transparent layer over the choropleth that adds hover highlighting and a tooltip
highlights = folium.features.GeoJson(
    geodata_price,
    style_function=lambda x: {"color": "transparent", "fillColor": "transparent", "weight": 0},
    highlight_function=lambda x: {"fillColor": "#000000", "color": "#000000", "fillOpacity": 0.50, "weight": 0.1},
    tooltip=folium.features.GeoJsonTooltip(fields=["ZIPCODE", "mean_price"], aliases=["Zipcode: ", "Mean price in USD: "], labels=True, sticky=False),
)
map.add_child(highlights)
map.keep_in_front(highlights)
folium.LayerControl().add_to(map)
map
We can now hover over each individual zipcode to highlight it. For example, 98004 includes Bellevue and has a mean sale price of $1,356,523.99.
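Folium calls style_function and highlight_function once per GeoJSON feature, passing the feature as a dictionary, and uses the returned dictionary as the Leaflet style for that shape. Because of that, these functions can be written and tested in plain Python. The sketch below shows the transparent style used above as a named function, plus a hypothetical data-driven variant, price_outline_style; its 1,000,000 cutoff and colors are arbitrary illustrative choices, not part of the original map.

```python
# The style used above: every zipcode is fully transparent until hovered
def transparent_style(feature):
    return {"color": "transparent", "fillColor": "transparent", "weight": 0}


# Hypothetical variant: outline zipcodes whose mean price exceeds a cutoff
def price_outline_style(feature, cutoff=1_000_000):
    price = feature["properties"].get("mean_price")
    expensive = price is not None and price > cutoff
    return {
        "color": "#000000" if expensive else "transparent",
        "fillColor": "transparent",
        "weight": 2 if expensive else 0,
    }


# A feature shaped like the ones Folium passes in (values from the caption above)
feature = {"type": "Feature",
           "properties": {"ZIPCODE": "98004", "mean_price": 1356523.99},
           "geometry": None}
print(price_outline_style(feature))
```

Either function could be passed as style_function= in place of the lambda; the cutoff has a default value, so Folium can still call it with a single feature argument.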

Adding Markers

Labels on mapped data help readers understand additional details and draw attention to the most important data points. I am going to add a marker using folium.Marker and folium.Icon. We can find different icon styles at this link and pass the icon's name to the icon parameter of folium.Icon. For the location parameter, I looked up the latitude and longitude of downtown Seattle. The code and output will look like this:

folium.Marker(location=[47.6097, -122.3422], popup="Downtown Seattle home sales were not included in the data set.", icon=folium.Icon(color="red", icon="arrow-down")).add_to(map)
map
Multiple markers can be added to the map; for this demonstration I only added one.

Conclusion

You can see the power of Folium with just a couple of functions. We learned basic tools in Folium and GeoPandas for using DataFrames and GeoJSON files to display data on maps. We learned how to create a folium.Map instance and then build data on top of that map with Folium's choropleth. We then modified the choropleth's legend with custom bin thresholds (the threshold_scale parameter, replaced by bins in current Folium versions). Next, we read the GeoJSON file into a GeoPandas GeoDataFrame and merged in our price data, so that folium.features.GeoJson could add highlighting and tooltips to each zipcode. Lastly, we added individual markers to the map with folium.Marker and folium.Icon. To learn more about mapping, you can check out the references below.

Please follow me on Medium and LinkedIn for more tutorials, posts about my experience transitioning from International Affairs to Data Science, and advocacy for trans people in the workplace.

Jade Adams

She/Her. UCLA graduate and data scientist based in NYC. Passionate about social science research and all things trans and queer.