Interactive Map visualization with Folium in Python
When working with datasets, we often encounter property sales, rental or other housing-related data. At first glance, we carry on with descriptive statistics to find the patterns in the dataset. We, as humans, get better insights when we are presented with visualizations such as scatter plots, box plots, pie charts, bar charts and more. However, if we are dealing with datasets that have addresses, zipcodes or geographical locations, we do not have to limit our data analysis to line graphs. In fact, we should explore how to make efficient use of the location points present in the dataset. In this article, I explore the Folium library for interactive map visualizations. The Jupyter Notebook code for this article is available on my Kaggle page.
While doing an exploratory data analysis of New York City Property Sales, I tried to understand the huge variations in property sale prices, whose many outliers were distorting the bar charts. Well, it turns out I should have aggregated property sales by zipcode. A zipcode or postal code might be a better representation of how house prices vary across regions of the city. Even then, a line graph or bar chart may not help us understand how property prices vary across 167 different zip codes in NYC. This is where the Folium library comes to help us. Folium is an easy-to-use interactive map visualization tool.
Getting Started
Before we start, I assume that the reader has some familiarity with Python and has Jupyter Notebook installed on their system. First, we have to install the Folium library from within Jupyter Notebook (or from the command line, without the leading exclamation mark):
!pip install folium
Make sure you have an internet connection to install the library.
I did my analysis at the zip code level, but you can do map analysis at the country or continent level as well. Folium has a solid list of great examples on its page if you want to explore more. Today we will concentrate on creating a zipcode layer, adding markers that show some information about the location to each of the 167 zip codes, and clustering the markers together so the map does not become overwhelming to inspect.
Let us take the above image as an illustration of what we want to accomplish:
- Zipcode layers are the boundaries of each zip code, each covering some territory in the city. As we can see, some zipcode layers are purple, meaning their average property sale prices are high, whereas others are light blue or nearly transparent white, meaning property sale prices are not as high. Dark regions mean no property sales occurred in that region in 2017.
- Markers are blue and located in the center of each zip code. When we hover the mouse over a marker, it shows a few details of the region, and when we click on it, it shows more details of the place.
- Marker clusters are the green circles with a number on them. The number indicates how many markers are present in that location. This is a very convenient way of displaying markers in groups. Otherwise, the entire map of NYC would be filled with markers.
If we did not use a marker cluster, this is how our map would look:
When working with maps, we will be relying heavily on longitudes and latitudes. These are geographic coordinates that help us pinpoint any location on Earth. For example, [40.693943, -73.985880] are the coordinates of New York City (NYC), given as latitude first and longitude second.
Now it is high time for us to get familiar with Folium objects.
Base map
We start with creating a base map, which will instantiate a map object for a given location:
import folium
map = folium.Map(location=[40.693943, -73.985880], zoom_start=15)
The location parameter accepts coordinates, and zoom_start sets the initial zoom level so that we are neither too far from nor too close to the place.
We can save the result as an HTML file with map.save('mymap.html'). However, since we are using Jupyter Notebook, we do not have to save the object as HTML. We can display it within the notebook by referring to its name, map.
Marker
Since we created the base map, now we can add more objects to it. We can add markers as follows:
folium.Marker(
    location=[40.693943, -73.985880],
    popup='Welcome to <b>NEW YORK CITY</b>',
    tooltip='Click for more'
).add_to(map)
The location parameter accepts coordinates to pinpoint the marker.
The popup parameter accepts a plain string or a string with HTML formatting. The popup text shows up only when you click on the marker.
The tooltip parameter also accepts a string, optionally with HTML formatting, and displays its text whenever you hover over the marker with the mouse.
Note that we add this object to our map object with the add_to() method. This way we make sure that the new marker is attached to map. Later we will see how to add a marker to marker_cluster instead.
Choropleth map
A choropleth map is a map whose areas are shaded in different colors according to some value. This is our main tool for drawing zipcode boundaries on the map and coloring each zip code area based on its average property sale price.
Before we get started on the code, there are two points we have to remember:
- First, to create each zipcode’s boundaries on the map, we need coordinates that define the area of the zip code. The Folium documentation says that choropleth objects accept geoJSON files for boundary creation. It might be hard to find zip code boundary coordinates for lesser-known locations, but since I am working with an NYC dataset, I was able to find a geoJSON file that has the NYC zip code boundary coordinates.
- Second, we need to have a dataset that has some data e.g. population, the unemployment rate or sale prices for the corresponding zip code areas. The dataset I am working with has a column ‘ZIP CODE’, which conveniently has records on each house’s zipcode.
Without those two data points, we would not be able to build our choropleth map.
Now, here is the code to create a choropleth map:
geo_data is the path to the geoJSON file, which has the coordinates of the zipcode areas.
data is the dataset I am working with.
columns accepts two columns: first, the location data; second, the sale prices. It is important to note that if the ZIP column is in numeric format, it has to be changed to string, because the geo_data zip codes are of str type and Folium matches the two sources by their zip code. To change the ZIP column data type, use: df['ZIP']=df['ZIP'].astype(str).
key_on is the parameter that accepts the location of the zip code within the geoJSON file hierarchy. How do we figure out this path? Let us open the geoJSON file. This is how my geoJSON is structured:
Treat the file’s structure as a Python dictionary and keep in mind that the key_on argument should always start with feature, e.g. feature.path_to_address. From the above screenshot, we can see that the path to the postal code is feature.properties.PostalCode. Note that the path is followed for each entry of the "features" list, so the geoJSON file is expected to be a feature collection, with "type": "FeatureCollection" and a "features" list at its top level.
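To make the path lookup concrete, here is a small sketch. The sample feature is made up (only the PostalCode property name comes from the path above), and resolve_key_on is a hypothetical helper that mimics how a dotted key_on path is followed through one feature. It is an illustration of the idea, not Folium's own code.

```python
# A made-up GeoJSON document with a single zip code area.
# Only the "PostalCode" property name is taken from the article.
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"PostalCode": "10001"},
            "geometry": {
                "type": "Polygon",
                # GeoJSON stores each point as [longitude, latitude].
                "coordinates": [[
                    [-74.00, 40.74], [-73.98, 40.74],
                    [-73.98, 40.76], [-74.00, 40.76],
                    [-74.00, 40.74],
                ]],
            },
        }
    ],
}


def resolve_key_on(feature, key_on):
    """Follow a dotted path such as 'feature.properties.PostalCode'.

    Hypothetical helper for illustration only; not part of Folium.
    """
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on must start with 'feature'")
    value = feature
    for key in parts[1:]:
        value = value[key]
    return value


zip_codes = [
    resolve_key_on(feature, "feature.properties.PostalCode")
    for feature in sample_geojson["features"]
]
print(zip_codes)  # prints ['10001']
```

The zip code comes back as a string, which is why a numeric ZIP column in the dataset would not match it.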
fill_color accepts the color scheme for displaying the areas. BuPu stands for Blue Purple.
fill_opacity sets the level of transparency of the areas when filling in the colors. 0 means complete transparency, 1 means solid colors with no background visibility. I found that 0.7 works well.
line_opacity sets the opacity of the borderlines. 1 gives dark, distinct borders and 0 gives no visible borders.
legend_name sets the caption of the color legend at the top right.
Now we are almost ready to build our interactive map. However, we need to learn about one more item.
Marker Cluster
This is the easiest part of all. We need to instantiate a single marker cluster on map, and that is it. After that, we only have to direct the marker objects to this marker cluster:
from folium.plugins import MarkerCluster
marker_cluster = MarkerCluster().add_to(map)
# now, all the markers should be directed to this object as follows:
folium.Marker(...).add_to(marker_cluster)
Coding part
We now know the fundamental building blocks for creating interactive maps with Folium. While working with NYC property sales, I cleaned and aggregated the dataset for visualization purposes. One more detail is that I did not have the coordinates of each zip code's center for placing the markers. So I downloaded zip code coordinates from this repo and merged them into my dataset. My final dataset is shown below.
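As a rough illustration of that aggregation and merge, here is a minimal pandas sketch. All values are made up, and the column names SALE PRICE, LAT and LNG are assumptions for this example (only ZIP CODE and ZIP appear earlier in the article).

```python
import pandas as pd

# Made-up individual sales, one row per property.
sales = pd.DataFrame({
    "ZIP CODE": [10001, 10001, 11201],
    "SALE PRICE": [1_000_000, 2_000_000, 900_000],
})

# Made-up zip code centers, standing in for the downloaded coordinates.
centers = pd.DataFrame({
    "ZIP": ["10001", "11201"],
    "LAT": [40.7506, 40.6943],
    "LNG": [-73.9972, -73.9900],
})

# Average sale price per zip code.
avg_price = (
    sales.groupby("ZIP CODE", as_index=False)["SALE PRICE"]
    .mean()
    .rename(columns={"ZIP CODE": "ZIP"})
)
avg_price["ZIP"] = avg_price["ZIP"].astype(str)  # same type as the key in centers

# Attach the center coordinates of each zip code.
final = avg_price.merge(centers, on="ZIP", how="left")
print(final)
```

A left merge keeps every zip code that has sales, even if its center coordinates are missing, so such rows can be spotted and handled before placing markers.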
This is the final code:
The link for the project is on my Kaggle page. Check out the link if you have any difficulties along the way.
Today we learned how to visualize our findings with interactive maps. Along the way, we learned how to create a base map, marker, choropleth map and marker clusters. There are more features in Folium, which we can explore in the future. The available tools and documentation make our data exploration fun and I hope you will also enjoy doing data analysis on your dataset.
Please, do not hesitate to give me 50 or more applauses. I appreciate your time and thank you for reading!