NYC Green Taxi Ride Location Data Analysis using Python

Mohit Singh
Published in datascape · Feb 14, 2017 · 3 min read

1 Million NYC Taxi Rides taken in June 2016

GIF showing interactive NYC taxi pickups on June 6 (only 500 records) using Python. Refer to GitHub for the full code and output.
NYC Green Taxi Dropoff locations — June 2016

Last summer, when I was returning from New York to Syracuse, I missed my bus. The reason: as they say, “the New York traffic”. I was coming from Queens to Manhattan, and it took me 2 hours in the taxi.

And to my surprise, it was the traffic getting from Queens to Manhattan that delayed me, not the traffic within Manhattan on the way to my destination. I wondered how much taxis contribute to the traffic and what the pickup and dropoff counts look like: which areas get the most taxi traffic, which get the least, where people are picked up most often, and where they are dropped off most often.

So first off, I found the latest available trip data on the nyc.gov website. I used the green taxi trip data for June 2016, as it also gives an overview of the trips across the boroughs of New York.

Number of trips during this time: 1 million (1,048,576)
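A minimal sketch of how the location columns could be loaded and the trip count checked with pandas. The pickup column names are the ones used in the plotting code later in this post; the dropoff column names and the helper `load_locations` are assumptions for illustration, and the tiny inline sample stands in for the real file.

```python
import io

import pandas as pd

# Pickup column names come from the plotting code in this post; the dropoff
# names are assumed to follow the same pattern.
LOCATION_COLUMNS = [
    "Pickup_latitude", "Pickup_longitude",
    "Dropoff_latitude", "Dropoff_longitude",
]


def load_locations(csv_source):
    """Read only the location columns from a trip CSV (path or file-like)."""
    return pd.read_csv(csv_source, usecols=LOCATION_COLUMNS)


# Tiny made-up sample standing in for the real file, for illustration only.
sample_csv = io.StringIO(
    "VendorID,Pickup_latitude,Pickup_longitude,Dropoff_latitude,Dropoff_longitude\n"
    "1,40.75,-73.90,40.76,-73.95\n"
    "2,40.82,-73.92,40.70,-73.80\n"
)
sample_trips = load_locations(sample_csv)
print(len(sample_trips))  # number of trip records in the sample
```

With the real download, the same helper could be called as `load_locations('green_tripdata.csv')`, and `len(...)` of the result would give the number of trips.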

This was great for the overall picture of the trips, both pickups and dropoffs. One thing that can clearly be observed is that there are significantly more dropoffs than pickups in Queens and the Bronx.

NYC Green Taxi Pickup locations — June 2016

There are more taxi dropoffs than pickups in the New York boroughs.
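The maps show this visually; a rough way to put numbers on it is to count pickups and dropoffs that fall inside a rectangle around an area of interest. This is only a sketch: the helper `count_in_box`, the rectangle `EXAMPLE_BOX`, the dropoff column names, and the sample rows are all assumptions for illustration, and a rectangle does not follow real borough borders.

```python
import pandas as pd


def count_in_box(trips, lat_col, lon_col, box):
    """Count rows whose (lat, lon) fall inside box = (south, north, west, east)."""
    south, north, west, east = box
    inside = (
        trips[lat_col].between(south, north)
        & trips[lon_col].between(west, east)
    )
    return int(inside.sum())


# Very rough, hypothetical rectangle; it does not follow real borough borders.
EXAMPLE_BOX = (40.54, 40.80, -73.96, -73.70)

# Made-up sample rows for illustration only.
sample = pd.DataFrame({
    "Pickup_latitude": [40.75, 40.85, 40.70],
    "Pickup_longitude": [-73.90, -73.90, -74.00],
    "Dropoff_latitude": [40.72, 40.60, 40.90],
    "Dropoff_longitude": [-73.80, -73.75, -73.85],
})

pickups = count_in_box(sample, "Pickup_latitude", "Pickup_longitude", EXAMPLE_BOX)
dropoffs = count_in_box(sample, "Dropoff_latitude", "Dropoff_longitude", EXAMPLE_BOX)
print(pickups, dropoffs)
```

For a proper per-borough comparison, the rectangle would need to be replaced by real borough boundaries, which this sketch does not attempt.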

I wanted to dig deeper into the areas with the most pickups and dropoffs. I also wanted to create something interactive to see in which areas people actually get picked up and dropped off more. After a bit of research, I found the Folium Python library really useful and wanted to try it, and I was glad it worked out. I was able to cluster the locations too, to see how many dropoffs and pickups happened in an area, and then zoom in to the exact locations.

Interactive NYC taxi pickups on June 6, created using the Python code below (only 15 days of data are used here)
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from datetime import datetime
import os
import folium
from folium.plugins import MarkerCluster

# Approximate map center as (latitude, longitude)
NY_COORDINATES = (40, -73)
gdata = pd.read_csv('green_tripdata.csv')
MAX_RECORDS = 1048576

map_nyctaxi = folium.Map(location=NY_COORDINATES, zoom_start=9)
# MarkerCluster is imported from folium.plugins, so it is called directly
marker_cluster = MarkerCluster().add_to(map_nyctaxi)

# Add one marker per pickup location to the cluster layer
for each in gdata[0:MAX_RECORDS].iterrows():
    folium.Marker(
        location=[each[1]['Pickup_latitude'], each[1]['Pickup_longitude']],
        popup='picked here').add_to(marker_cluster)

# In a notebook, evaluating the map object displays it
map_nyctaxi
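Plotting every record makes the map heavy, which is why the GIF at the top uses only 500 pickups from a single day. A minimal sketch of how such a subset could be selected before the plotting loop is shown here. The helper `select_pickups`, the timestamp column name `lpep_pickup_datetime`, the rule of skipping rows with missing or zero coordinates, and the sample rows are assumptions for illustration, not taken from the original code.

```python
import pandas as pd


def select_pickups(trips, day, max_records, time_col="lpep_pickup_datetime"):
    """Return up to max_records pickups on the given day with usable coordinates."""
    # Strip the time of day so each pickup can be compared against a calendar day.
    pickup_days = pd.to_datetime(trips[time_col]).dt.normalize()
    on_day = pickup_days == pd.Timestamp(day).normalize()
    # Assumption: rows with missing or zero coordinates are not worth plotting.
    has_coords = (
        trips["Pickup_latitude"].notna()
        & trips["Pickup_longitude"].notna()
        & (trips["Pickup_latitude"] != 0)
        & (trips["Pickup_longitude"] != 0)
    )
    return trips[on_day & has_coords].head(max_records)


# Made-up sample rows for illustration only.
sample = pd.DataFrame({
    "lpep_pickup_datetime": [
        "2016-06-06 08:15:00",
        "2016-06-06 09:00:00",
        "2016-06-07 10:00:00",
        "2016-06-06 11:30:00",
    ],
    "Pickup_latitude": [40.75, 0.0, 40.70, 40.80],
    "Pickup_longitude": [-73.90, 0.0, -73.95, -73.92],
})

subset = select_pickups(sample, "2016-06-06", max_records=500)
print(len(subset))
```

The returned frame could then take the place of `gdata[0:MAX_RECORDS]` in the loop above, so that only the selected pickups are added to the marker cluster.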

Full code on GitHub.
