Interactive Map visualization with Folium in Python

Saidakbar P
7 min read · Sep 23, 2019

When working with datasets, more often than not, we encounter property sales, rental or housing-related datasets. At first glance, we carry on doing the analysis with descriptive statistics to find the patterns in the dataset. We, as humans, get better insights when we are presented with visualizations such as scatter plots, box plots, pie charts, bar charts and more. However, if we are dealing with datasets that have addresses, zipcodes or geographical locations, we do not have to limit our data analysis to line graphs. In fact, we should explore how to make efficient use of the location points present in the dataset. In this article, I explore the Folium library for interactive map visualizations. The Jupyter Notebook code for this article is available on my Kaggle page.

NYC property sales in 2017 by zipcode. The Medium platform does not allow JavaScript in articles. Check my Kaggle page for the interactive map version.

While doing an exploratory data analysis of New York City Property Sales, I tried to understand the huge variations in property sale prices, which were distorting the bar charts with many outliers. Well, it turns out I should have aggregated property sales by zipcode. A zipcode or postal code might be a better representation of how house prices vary across different regions of the city. Even then, a line graph or bar chart may not help us understand how property prices vary across 167 different zip codes in NYC. This is where the Folium library comes in. Folium is an easy-to-use interactive map visualization tool.

Getting Started

Before we start, I assume that the reader has some familiarity with Python and has Jupyter Notebook installed on their system. First, we have to install the Folium library from within Jupyter Notebook (or from the command line, without the leading exclamation mark):

!pip install folium

Make sure you have an internet connection to install the library.

I did my analysis at the zip code level, but you can do map analysis at the country or continent level as well. Folium has a solid list of examples on their page if you want to explore more. Today we will concentrate on creating a zipcode layer, adding markers that show some information about the location to each of the 167 zip codes, and clustering the markers together so the map does not become overwhelming to inspect.

Markers for each zip code

Let us take the above image as an illustration of what we want to accomplish:

  • Zipcode layers are the boundaries of each zip code, each covering some territory in the city. As we can see, some zipcode layers are purple, meaning their average property sale prices are high, whereas others are light blue or a transparent white, meaning property sale prices are not as high. Dark regions mean no property sales occurred in that region in 2017.
  • Markers are blue and located in the center of each zip code. When we hover the mouse over a marker, it shows a few details of the region, and when we click on it, it shows more details of the place.
  • Marker clusters are the green circles with a number on them. The number indicates how many markers are present in that location. This is a very convenient way of displaying markers in groups. Otherwise, the entire map of NYC would be filled with markers.

If we did not use a marker cluster, this is how our map would look:

It is hard to see which zip codes have high sale prices of properties

When working with maps, we will be relying on longitudes and latitudes heavily. These are geographic coordinates that help us pinpoint any location on Earth. For example, [40.693943, -73.985880] are the coordinates of New York City (NYC).

Now it is high time for us to get familiar with Folium objects.

Base map

We start with creating a base map, which will instantiate a map object for a given location:

import folium
# Note: the name `map` shadows Python's built-in map() function.
map = folium.Map(location=[40.693943, -73.985880], zoom_start=15)

The location parameter accepts coordinates, and zoom_start sets the initial zoom level so that we are neither too far from nor too close to the place.

This is the result with NYC in the center

We can save the result as an HTML file: map.save('mymap.html') . However, we are using Jupyter Notebook and we do not have to save the object in HTML. We can access the object within the notebook by referring to its name: map .

Marker

Since we created the base map, now we can add more objects to it. We can add markers as follows:

folium.Marker(
    location=[40.693943, -73.985880],
    popup='Welcome to <b>NEW YORK CITY</b>',
    tooltip="Click for more"
).add_to(map)

The result of adding a marker to the map object

The location parameter accepts coordinates to pinpoint the marker.

The popup parameter accepts a plain string or a string with HTML formatting. The popup content shows up only when you click on the marker.

The tooltip parameter also accepts a string, with or without HTML formatting, and displays its text whenever you hover over the marker with the mouse.

Note that we are adding this object to our map object with the add_to() method. This way we make sure that we pass our new marker to map . Later we will see how we can add a marker to marker_cluster .

Choropleth map

A choropleth map is a map whose areas are shaded in different colors according to some value. This is our main tool for drawing zipcode boundaries on the map and coloring each zip code area based on its average property sale price.

Choropleth map of NYC for zipcodes

Before we get started on the code, there are two points we have to remember:

  • First, to create each zipcode’s boundaries on the map, we need coordinates that define the area of a zip code. The Folium documentation says that choropleth objects accept geoJSON files for boundary creation. It might be hard to find zip code boundary coordinates for lesser-known locations, but since I am working with an NYC dataset, I was able to find a geoJSON file that has NYC zip code boundary coordinates.
  • Second, we need a dataset that has some data, e.g. population, the unemployment rate or sale prices, for the corresponding zip code areas. The dataset I am working with has a column ‘ZIP CODE’, which conveniently records each house’s zipcode.

Having location information is important

Without those two data points, we would not be able to build our choropleth map.

Now, here is the code to create a choropleth map:

Choropleth object

geo_data is the path to the geoJSON file that has the coordinates of the zipcode areas.

data is the dataset I am working with.

columns accepts two columns: first, the location data; second, the sale prices. It is important to note that if the ZIP column is in a numeric format, you should change it to string, because the geo_data zip codes are of str type and Folium matches these two sources by their zip code. To change the ZIP column data type, use: df['ZIP']=df['ZIP'].astype(str) .
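To see why the conversion matters, here is a small check on made-up data. The SALE PRICE column name, the prices and the three zip codes are assumptions chosen for illustration, not values from the NYC dataset. It counts how many zip codes in the data have a counterpart among the geoJSON zip codes, before and after the conversion.

```python
import pandas as pd

# Hypothetical sample: zip codes read as integers, as often happens with CSV files.
df = pd.DataFrame({
    "ZIP": [10001, 10002, 11201],
    "SALE PRICE": [950000, 720000, 1100000],
})

# Zip codes as they might appear (as strings) in the geoJSON properties.
geojson_zips = {"10001", "10002", "11201"}

# Before the conversion nothing matches, because the integer 10001 != "10001".
matches_before = len(set(df["ZIP"]) & geojson_zips)

# After the conversion every zip code finds its counterpart.
df["ZIP"] = df["ZIP"].astype(str)
matches_after = len(set(df["ZIP"]) & geojson_zips)

print(matches_before, matches_after)  # 0 3
```

A check like this is a cheap way to catch a type mismatch before it shows up as an uncolored map.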

key_on is the parameter that accepts the location of the zip code from the geoJSON file hierarchy. How to figure out this property? Let us open the geoJSON file. This is how my geoJSON is structured:

geoJSON file opened with a text editor

Treat the file’s structure as a Python dictionary and keep in mind that the key_on argument should always start with feature, e.g. feature.path_to_address . From the above screenshot, we can see that the path to the postal code is feature.properties.PostalCode . Note that the geoJSON file should be a feature collection, that is, its top level should contain "type": "FeatureCollection" and a "features" list, for Folium to find the path.
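The dotted path can be tried out by hand before passing it to Folium. The sketch below uses a tiny, made-up geoJSON document with the same shape as the one described above. The resolve_key_on helper is hypothetical and written only for this illustration; it is not part of Folium, and the borough property and the point coordinates are likewise invented.

```python
import json

# A tiny, made-up geoJSON document; the real NYC file is much larger
# and stores polygons rather than a single point.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"PostalCode": "10001", "borough": "Manhattan"},
      "geometry": {"type": "Point", "coordinates": [-73.9967, 40.7484]}
    }
  ]
}
"""


def resolve_key_on(feature, key_on):
    """Follow a dotted key_on path inside one feature (hypothetical helper)."""
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on must start with 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]  # descend one dictionary level per path segment
    return value


geo = json.loads(geojson_text)
codes = [resolve_key_on(f, "feature.properties.PostalCode") for f in geo["features"]]
print(codes)  # ['10001']
```

If the path is wrong, the lookup raises a KeyError right away, which is easier to debug than a map with no colors.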

fill_color accepts the color scheme for displaying the areas. BuPu stands for Blue Purple.

fill_opacity sets the level of transparency of the boundary areas when filling in the colors. 0 means complete transparency, 1 means solid colors with no background visibility. A value of 0.7 worked well for me.

line_opacity sets the borderline opacity: 1 for dark, distinct borders and 0 for no borders.

legend_name sets the caption of the color legend on the top right.

Now we are almost ready to build our interactive map. However, we need to learn about one more item.

Marker Cluster

This is the easiest part of all. We need to instantiate a single marker cluster, add it to the map, and that is it. After that, we only have to direct our marker objects to this marker cluster instead of the map:

from folium.plugins import MarkerCluster

marker_cluster = MarkerCluster().add_to(map)

# now, all the markers should be directed to this object as follows:
folium.Marker(...).add_to(marker_cluster)

Coding part

We now know the fundamental building blocks for creating interactive maps with Folium. While working with NYC property sales, I cleaned and aggregated the dataset for visualization purposes. One more detail is that I did not have the coordinates of each zip code's center for placing the markers. So I downloaded zip code coordinates from this repo and merged them into my dataset. My final dataset is shown below.

LAT, LNG are included in the dataset for markers

This is the final code:

Final Result

The link for the project is on my Kaggle page. Check out the link if you have any difficulties along the way.

Today we learned how to visualize our findings with interactive maps. Along the way, we learned how to create a base map, marker, choropleth map and marker clusters. There are more features in Folium, which we can explore in the future. The available tools and documentation make our data exploration fun and I hope you will also enjoy doing data analysis on your dataset.

Please, do not hesitate to give me 50 or more applauses. I appreciate your time and thank you for reading!
