Mapping Divvy Data Using Python

The best part of any project is naming it. My project: Hot Bikes. The goal: make a heat map of where people are biking in Chicago, using the amazing Divvy data set.

NOTE: this is a project that I’m working on as part of the ChiPy mentorship program. Max is my mentor.

After outlining the project with Max, it was determined that the first step was just putting Divvy stations on a map. Alright Max, challenge accepted. Oh yeah, and do it all with PYTHON. Very exciting.

So I Googled “making maps with Python” and lo and behold, someone has done almost exactly what I wanted to do with the bike data available from the bike share program in New York City!

The tutorial, Interactive Maps with Python (a very excellent three-parter with some delightful visualizations of net arrivals and departures that shows the beating heart of New York City—you’ll never guess where it is!!) got me a long way towards understanding how to do what I wanted to be doing, so, thanks for that, Vincent Lonij! Seriously, it’s amazing.

In this article, I’ll be talking about how I created a map, using Python, to show the departures and arrivals at Divvy stations here in Chicago.

You can follow along in this blog, in the code on GitHub (which may have changed significantly in the time before you read this), in this poorly-organized Jupyter notebook, or on the astral plane where I am currently projecting myself.

For this project, I’m using the Folium library, which so far has been an awesome tool for making maps using Python. Folium is a wrapper for the leaflet.js library.

Get the Trips

Divvy provides data sets that list trips by quarter. To process the trip data from the CSV file and get it into the format I wanted to work with, I created the following function:

import pandas as pd

def read_and_format_trip_data(file_name):
    '''
    input: file name
    output: pandas data frame
    '''
    # Load one quarter of trip data from the downloaded CSV.
    bike_trips = pd.read_csv(file_name)
    return bike_trips

This function takes a file name and uses the pandas read_csv function to create a data frame from the CSV file I downloaded with trip data from one quarter. It returns that data frame.
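
To see what read_csv hands back without downloading anything, here is a tiny sketch that feeds it an in-memory CSV instead of a file. The three rows are made up, and the only column names borrowed from the real data are the ones that show up in the code later in this post (trip_id, from_station_id, to_station_id).

```python
import io

import pandas as pd

# Made-up sample rows; a real Divvy file has more columns and far more rows.
sample_csv = io.StringIO(
    "trip_id,from_station_id,to_station_id\n"
    "1,35,90\n"
    "2,35,76\n"
    "3,90,35\n"
)

# read_csv accepts a path or any file-like object and returns a DataFrame.
sample_trips = pd.read_csv(sample_csv)

print(type(sample_trips).__name__)
print(sample_trips.shape)
print(list(sample_trips.columns))
```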

So I had my trip data (I would be able to aggregate trip counts from here in order to get departures and arrivals for each station, which I wanted to put on the map).

These data include:

  • Trip start day and time
  • Trip end day and time
  • Trip start station
  • Trip end station
  • Rider type (Member, Single Ride, and Explore Pass)
  • If it’s a member trip, it will also include the member’s gender and year of birth

Perfect, right!? Wrong. The data include the start and end station names, but they do not include the latitude and the longitude of said stations. That data had to come from another place, the live station info from the Divvy JSON feed, which includes all of the current working stations as well as their latitudes and longitudes and other fun nuggets like how many available docks there are.

Get the Stations

The function below pings the Divvy API to grab the list of stations and makes a data frame from them using the pandas from_records method. This method takes an “index” parameter, which lets you use one of the fields as the index. I set it to the id, so each station’s id becomes its index in the station data frame that this function returns.

Sidebar: I used the python requests library to ping the API.

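import pandas as pd
import requests

def get_station_list():
    '''output: pandas data frame (or None if the request fails)'''
    station_api_key = 'stationBeanList'
    try:
        response = requests.get('https://feeds.divvybikes.com/stations/stations.json')
        response.raise_for_status()
        station_data = response.json()
        station_list = station_data[station_api_key]
        stations = pd.DataFrame.from_records(station_list, index='id')
    except (requests.RequestException, ValueError, KeyError) as error:
        # Return early: falling through would hit an unassigned `stations`.
        print('something went wrong:', error)
        return None
    return stations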

So now I have two data frames: one with the trip data, which has the id numbers of the start and end stations but not their latitude and longitude, and one with all of the station ids and their latitudes and longitudes.
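
If the from_records index trick is hard to picture, this small sketch runs it on a hand-written payload instead of the live feed. The payload is an assumption: it copies the shape the function above expects (a dict with a 'stationBeanList' key holding a list of station dicts), and the station names and coordinates are invented.

```python
import pandas as pd

# Hypothetical payload shaped like the feed; every value here is invented.
station_data = {
    'stationBeanList': [
        {'id': 35, 'stationName': 'Station A', 'latitude': 41.89, 'longitude': -87.61},
        {'id': 90, 'stationName': 'Station B', 'latitude': 41.88, 'longitude': -87.62},
    ]
}

stations = pd.DataFrame.from_records(station_data['stationBeanList'], index='id')

# The 'id' field is now the index rather than a regular column.
print(stations.index.name)
print(list(stations.columns))

# Rows can be looked up directly by station id.
print(stations.loc[35, 'stationName'])
```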

From Two (Data Frames), One (Data Frame)

To get the map that I wanted, these two data sets needed to become one. The magic was going to happen in the following function:

def add_trip_counts_to_stations(stations, trips):
    '''
    input: data frames
    output: data frame
    '''
    departure_counts = trips.groupby('from_station_id').count()
    departure_counts = departure_counts.iloc[:, [0]]
    departure_counts.columns = ['Departure Count']

    arrival_counts = trips.groupby('to_station_id').count()
    arrival_counts = arrival_counts.iloc[:, [0]]
    arrival_counts.columns = ['Arrival Count']

    stations = pd.merge(departure_counts, stations,
                        right_on='id',
                        left_index=True).merge(arrival_counts,
                                               left_on='id',
                                               right_index=True)
    return stations

So, what’s happening in this function? Great question. In order to write it, I relied pretty heavily on the aforementioned mapping tutorial, and there were several things that I did not quite understand in it.

I needed the total counts of arrivals and departures from each station. To do that, I used the pandas groupby method. It took me a while to understand exactly what this was doing.

I did some digging in the documentation, where it says that “the abstract definition of grouping is to provide a mapping of labels to group names.” Wow, so helpful.

After even more digging, I discovered the following:

Calling groupby in pandas returns a groupby object, which has a number of aggregate methods attached to it. Among other things, groupby can take a string that names a column in a dataframe. It returns an object that acts like a collection of smaller dataframes, one per distinct value in that column, each holding the rows of the original dataframe that share that value.

departure_counts = trips.groupby('from_station_id').count()

So the above line of code essentially says: take all of my trip data and split it into a bunch of dataframes, one for each from_station_id. Then, count the non-null values in every column of each of those dataframes. If you print out departure_counts right now, you get a dataframe with one row per start station, where each column holds the number of trips that started there (columns with missing values, like gender and birth year on non-member trips, will come up short).
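
Here is a minimal sketch of that split-then-count behavior on four invented trips. The birthyear column name is a stand-in I made up for the member birth year field. It has one missing value so you can see count() skip it.

```python
import pandas as pd

# Invented trips; 'birthyear' is a hypothetical column name with one gap,
# as there would be for a non-member trip.
trips = pd.DataFrame({
    'trip_id': [1, 2, 3, 4],
    'from_station_id': [35, 35, 90, 35],
    'to_station_id': [90, 76, 35, 90],
    'birthyear': [1985.0, None, 1990.0, 1979.0],
})

grouped = trips.groupby('from_station_id')

# Each group is the subset of rows sharing one from_station_id.
for station_id, rows in grouped:
    print(station_id, len(rows))

# count() tallies the non-null values in each column, per group.
departure_counts = grouped.count()
print(departure_counts)

# size() counts rows per group, whether or not values are missing.
group_sizes = grouped.size()
print(group_sizes)
```

If all you need is the number of trips per station, size() gets there in one step. The function in this post uses count() and then keeps the first column, which works as long as that column has no missing values.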

departure_counts = departure_counts.iloc[:, [0]]

This line of code replaces the departure_counts dataframe with just its first column, trip_id, which now holds not the actual trip id but the total number of trips that started at that station. The iloc indexer works purely on integer positions, so something like:

departure_counts = departure_counts.iloc[:, ['trip_id']]

and:

departure_counts = departure_counts.iloc[:, 'trip_id']

will both throw errors. Also, if you’re wondering about the double brackets around the zero and why it exists, here is what I know. If you do this:

departure_counts = departure_counts.iloc[:, 0]

The object that is returned is not a dataframe, but a series. If you do this:

departure_counts = departure_counts.iloc[:, [0]]

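The object is a dataframe. As far as I can tell, a bare integer asks for one column, so pandas hands that column back as a one-dimensional series, while a list asks for a set of columns (even a set of one), so the result stays two-dimensional. The rest of the code was written assuming a dataframe, so I needed to keep the brackets. It’s not quite magic, but almost. My mentor Max says, “There’s no such thing as magic.” But we can still have fun, can’t we?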
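
A quick sketch to check the series-versus-dataframe behavior for yourself. The counts dataframe is invented and only mimics the shape of departure_counts.

```python
import pandas as pd

# Invented stand-in for departure_counts right after groupby(...).count().
counts = pd.DataFrame(
    {'trip_id': [3, 1], 'to_station_id': [3, 1]},
    index=pd.Index([35, 90], name='from_station_id'),
)

as_series = counts.iloc[:, 0]    # single position: one column, one-dimensional
as_frame = counts.iloc[:, [0]]   # list of positions: still two-dimensional

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)

# Selecting by label goes through loc (or plain brackets) instead of iloc.
by_label = counts.loc[:, ['trip_id']]
print(by_label.equals(as_frame))

# iloc rejects labels outright.
label_failed = False
try:
    counts.iloc[:, 'trip_id']
except (ValueError, IndexError, TypeError) as error:
    label_failed = True
    print('iloc with a label failed:', type(error).__name__)
```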

The next line of code:

departure_counts.columns = ['Departure Count']

Renames that single column to “Departure Count,” leaving the station id (from_station_id) as the index.

Then, to get the arrival counts, I do the exact same thing, but instead grouping by the arrival station id and getting a dataframe that has one column “Arrival Count” and an index of station id:

arrival_counts = trips.groupby('to_station_id').count()
arrival_counts = arrival_counts.iloc[:, [0]]
arrival_counts.columns = ['Arrival Count']

Finally, I merge it all into one big happy family with these few lines of code:

stations = pd.merge(departure_counts, stations,
                    right_on='id',
                    left_index=True).merge(arrival_counts,
                                           left_on='id',
                                           right_index=True)

The merge method returns a dataframe, which is why I can chain two calls in order to do two merges. The first one merges departure counts with stations. The pandas merge method takes two dataframes as well as some other parameters. The ones that are relevant to us here are right_on and left_index.

The right_on parameter, in addition to being super chill (right on, dude), tells pandas to use the id of the right table (stations) as the key when merging with departure_counts, and left_index=True says to use the index of the left table (departure_counts) as the key, so this means pandas will match up the station id with the index of departure counts, which is what we want.

For the chained merge, we use the left_on and right_index parameters because the now-merged stations data is our left dataframe and arrival_counts is the right one, so we’ll use the index of arrival_counts to match to the ids in the merged stations dataframe.

Then, we return the now-merged station data which has our happy latitude and longitude for all of the stations. This is what we wanted.
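
The two chained merges can be sketched on toy data like this. Everything in the three dataframes is invented, and to keep the sketch simple the station id here is an ordinary column named id, which is an assumption of this example rather than a copy of the real station data frame.

```python
import pandas as pd

# Invented stations; in this sketch 'id' is a regular column.
stations = pd.DataFrame({
    'id': [35, 90, 76],
    'stationName': ['Station A', 'Station B', 'Station C'],
})
departure_counts = pd.DataFrame(
    {'Departure Count': [3, 1]},
    index=pd.Index([35, 90], name='from_station_id'),
)
arrival_counts = pd.DataFrame(
    {'Arrival Count': [1, 2]},
    index=pd.Index([35, 90], name='to_station_id'),
)

# First merge: index of the left frame against the 'id' column of the right.
with_departures = pd.merge(departure_counts, stations,
                           left_index=True, right_on='id')

# Second merge: 'id' column of the left frame against the index of the right.
merged = with_departures.merge(arrival_counts,
                               left_on='id', right_index=True)

print(merged[['id', 'stationName', 'Departure Count', 'Arrival Count']])

# merge defaults to how='inner', so station 76 (no trips at all) drops out.
print(sorted(merged['id'].tolist()))
```

One thing this makes visible: because merge does an inner join by default, any station with no departures or no arrivals in the trip data will not appear in the result at all.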

Now, to put it all on a map!

Map That Ish

import folium

def put_stations_on_map(stations):
    '''
    input: data frame
    output: folium map
    '''
    # Named station_map so the built-in map() is not shadowed.
    station_map = folium.Map(location=[41.88, -87.62],
                             zoom_start=13,
                             tiles="CartoDB dark_matter")
    for index, station in stations.iterrows():
        popup_text = "{}<br> Total departures: {}<br> Total arrivals: {}<br>"
        # Argument order matches the labels: departures first, then arrivals.
        popup_text = popup_text.format(stations.at[index, "stationName"],
                                       stations.at[index, "Departure Count"],
                                       stations.at[index, "Arrival Count"])
        folium.CircleMarker(location=[stations.at[index, 'latitude'],
                                      stations.at[index, 'longitude']],
                            fill=True,
                            popup=popup_text).add_to(station_map)
    return station_map

This code creates a map (using folium) centered on Chicago, at latitude 41.88 and longitude -87.62. Then, it iterates through each row in the stations DataFrame and builds the popup text with the departure and arrival counts. I used the pandas at accessor, which accesses a single value in a DataFrame or Series, to pull each field out of the row.

Folium circle markers are delightful little markers that look like circles, and when you click on them, the popup text is revealed! So, I create one of those for every row too, add each one to the map, and return the map.

And that’s how I put little circles with arrival and departure counts from each Divvy station on a map of Chicago.

Each circle is a station

If you made it to the end, wow! WOW. wow. w . o . w . WOWOWOWOWOWOWOWOWOWOWOWOWOWOWOWOWOW.

I ❤ you.