Folium and Choropleth. Weird Names, Cool Graph.
Let’s just make something cool and call it a day.
I’m a data scientist. I’m a data scientist who is currently working on a time series anomaly detection tool. I’m a data scientist who also misses messing around with datasets for sh!ts and giggles. So, I found a relatively small dataset on Kaggle that features (hehe, machine learning pun) some Airbnb listings and metrics in New York City (NYC) from 2019. I had full intentions to spend hours upon hours analyzing the dataset and provide excruciatingly detailed answers to the questions Kaggle posed. About an hour later, I hadn’t gotten around to answering a single question… INSTEAD, I MADE A COOL LOOKING GRAPH!
For some of you, a choropleth graph might seem a little ‘meh’. But I personally think it’s a pretty cool thing. So cool, in fact, that I decided to postpone my analysis of the Airbnb dataset and create a tutorial on how I made a choropleth graph using a Python module called Folium.
I know what you’re thinking. Choropleth and Folium? What the heck? Despite sounding like grass-type Pokémon, they are actually Python/data science-related. A choropleth is a type of map that divides a geographical space into sub-areas and shades each one in proportion to some statistic, like shading in different areas of the San Francisco Bay Area based on how many people are playing Animal Crossing: New Horizons (feel free to send me some Ironwood Furniture). Folium is a Python package that lets you easily create interactive maps, and it just so happens to have built-in choropleth capabilities.
The Kaggle page asks, “what can we learn about different areas…?”. Well, let’s see what a choropleth of the average price/night of an Airbnb in different NYC neighborhoods can tell us!
Small talk is over. Let’s begin, shall we?
Download the ‘AB_NYC_2019.csv’ from this Kaggle link.
Download the GeoJSON file from BetaNYC’s website.
Let’s import our packages:
import folium
import numpy as np
import pandas as pd
Then import and take a sneak peek at our csv file:
path = '<path_to_csv_file>'  # replace with the location of AB_NYC_2019.csv
raw_df = pd.read_csv(path)
raw_df.head()
What we want is a dataframe that gives each neighborhood’s average price, which we can easily achieve using the ‘groupby’ command:
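choro_df = raw_df[['price', 'neighbourhood']].groupby('neighbourhood').mean().reset_index()
# Rename the averaged column so the rest of the code can refer to 'avg_price'
choro_df = choro_df.rename(columns={'price': 'avg_price'})
choro_df = choro_df.dropna(subset=['avg_price'])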
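To make the groupby step concrete, here’s a minimal sketch on a made-up, three-row dataframe. The neighborhoods and prices are invented for illustration (they are not taken from the Kaggle file), and the averaged column is renamed to ‘avg_price’, the name the rest of this post uses:

```python
import pandas as pd

# Toy stand-in for raw_df; the values are invented for illustration.
toy_df = pd.DataFrame({
    'neighbourhood': ['Harlem', 'Harlem', 'Chelsea'],
    'price': [100, 200, 250],
})

# Collapse to one row per neighbourhood holding the mean nightly price.
toy_avg = (
    toy_df[['price', 'neighbourhood']]
    .groupby('neighbourhood')
    .mean()
    .reset_index()
    .rename(columns={'price': 'avg_price'})
)
print(toy_avg)
```

The two Harlem listings collapse into a single row with their mean price, and Chelsea keeps its lone value.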
Perfect. Now before we start graphing, we’ll have to bin our ‘avg_price’ column. Each of these bins represents a different color that will be featured on the choropleth! Also, at the time of this post (4/24/2020), Folium only accepts at most 6 bin values.
bins = list(choro_df['avg_price'].quantile([0, 0.25, 0.50, 0.75, 1.0]))
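If you want to sanity check the bins, here’s a small sketch on made-up prices. It shows that quantile-based edges start at the column’s minimum and end at its maximum, so every neighborhood lands in some bin. As far as I can tell, Folium raises an error when a value falls outside the bins, so this seems worth verifying. The numbers below are invented:

```python
import pandas as pd

# Invented average prices standing in for choro_df['avg_price'].
toy_prices = pd.Series([50, 80, 120, 200, 400], name='avg_price')

# Same quantile call as above: 5 edges, which define 4 color bins.
toy_bins = list(toy_prices.quantile([0, 0.25, 0.50, 0.75, 1.0]))

# The outer edges coincide with the smallest and largest values...
assert toy_bins[0] == toy_prices.min()
assert toy_bins[-1] == toy_prices.max()
# ...and the edges are strictly increasing for this toy data.
assert all(a < b for a, b in zip(toy_bins, toy_bins[1:]))
print(toy_bins)
```

On real data with lots of repeated prices, two quantiles could come out identical, so it doesn’t hurt to eyeball the list before passing it along.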
Simple enough, right? Now we can start coding up the graphics portion of the tutorial. First define a simple function that creates a default Folium Map object:
def get_base_map(default_location, default_zoom_start=11):
    # default_location: [latitude, longitude]
    base_map = folium.Map(width=800,
                          location=default_location,
                          zoom_start=default_zoom_start)
    return base_map
We need to calculate a default location for our map to start at, so let’s simply calculate the centroid of all the coordinates of the Airbnbs given to us in the dataset:
avg_lat = raw_df['latitude'].mean()
avg_long = raw_df['longitude'].mean()
Finally, we can create our default Map object and add a choropleth layer on top of it!
base_map = get_base_map(default_location=[avg_lat, avg_long])
The perfectionist in me immediately realized the color scale of the legend isn’t spaced out in a very user-friendly way. I haven’t quite figured out how to change the scale of the legend, but let me know if you have in the comments (please… it’s killing me…).
But that’s it! Pretty easy, eh?
Feel free to mention other cool graphs that exist outside the realm of vanilla EDA like box-plots and histograms in the comments section… I’d love to hear about the ones you use when you dissect a dataset!
Until next time, peace!