TLC Mentors Students Using Big Data

When you get in a yellow or green taxi in New York City, the cab is outfitted with equipment that automatically records the time and location of every pickup and drop-off. Since 2009, the Taxi and Limousine Commission has used this information extensively to help create data-driven policies, find items forgotten in taxicabs, and investigate passenger complaints.

Originally, the public could request a redacted version of this information through the Freedom of Information Law. Due to the volume of requests the TLC received, we began to proactively publish the datasets online on a 6-month cycle. To protect passenger privacy, the TLC removes vehicle and driver identifiers and aggregates the pickup and drop-off locations to larger taxi zones. This data is a treasure trove of information for journalists, startups, urban planners, and academics, and the TLC loves to see it being used in novel ways to improve the city.
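The two privacy steps described above (dropping identifiers and coarsening locations to zones) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TLC's actual pipeline: the field names are hypothetical, the sample trip is made up, and zones are approximated by simple bounding boxes, whereas real taxi zones are irregular polygons.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical field names; the real trip record schema is not given in this post.
IDENTIFIER_FIELDS = {"vehicle_id", "driver_id"}
COORDINATE_FIELDS = {"pickup_lat", "pickup_lon", "dropoff_lat", "dropoff_lon"}


@dataclass(frozen=True)
class Zone:
    """A taxi zone approximated by a bounding box (an assumption for brevity)."""
    zone_id: int
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat < self.max_lat
                and self.min_lon <= lon < self.max_lon)


def zone_for(lat: float, lon: float, zones: list) -> Optional[int]:
    """Return the id of the first zone containing the point, or None."""
    for zone in zones:
        if zone.contains(lat, lon):
            return zone.zone_id
    return None


def redact_trip(trip: dict, zones: list) -> dict:
    """Drop identifiers and exact coordinates; keep only zone-level locations."""
    public = {key: value for key, value in trip.items()
              if key not in IDENTIFIER_FIELDS and key not in COORDINATE_FIELDS}
    public["pickup_zone"] = zone_for(trip["pickup_lat"], trip["pickup_lon"], zones)
    public["dropoff_zone"] = zone_for(trip["dropoff_lat"], trip["dropoff_lon"], zones)
    return public


# Made-up zones and trip, for illustration only.
zones = [
    Zone(1, 40.70, 40.75, -74.02, -73.97),
    Zone(2, 40.75, 40.80, -74.02, -73.93),
]
raw_trip = {
    "vehicle_id": "V-0001",
    "driver_id": "D-0001",
    "pickup_time": "2015-06-01T08:15:00",
    "pickup_lat": 40.72, "pickup_lon": -74.00,
    "dropoff_lat": 40.78, "dropoff_lon": -73.95,
}
public_trip = redact_trip(raw_trip, zones)
print(public_trip)
```

The published record keeps the trip time and the two zone ids but no longer says which vehicle or driver made the trip, or exactly where it started and ended.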

One notable user of taxi data is New York University’s Center for Urban Science and Progress (CUSP). CUSP is a data science school focusing on urban informatics, a field that uses data to better understand how cities work. This data can range from the quality of the air you breathe to the time you spend commuting. The school allows city agencies and industry partners to sponsor capstone projects in which students use their new skills to analyze or solve problems with a data-driven approach. Over the last 4 months, the TLC mentored two teams of data science students. Each team was made up of 4 to 5 students who applied advanced data science techniques to TLC data to answer important and complex city planning questions. As mentors, TLC staff provided the teams with historical taxi trip data and helped to define the project scopes.

The first team used historical trip data to estimate unmet demand from passengers for taxis in 2015. This project was developed because historical trip records are often used as a stand-in for taxi demand, i.e., more trips in an area are taken to mean higher demand for taxis there. While that approach may work in areas heavily supplied by taxis like Midtown Manhattan, it is not a good indicator of demand in areas where taxis are hard to find. (A future blog post will talk more about the limitations of that approach and their implications for using historical taxi data to plan routes for taxi drivers.)

The team used trip data, land use data, and demographic information to identify areas of the city where one would expect to see taxi usage. Indicators of expected usage include low rates of car ownership and limited access to public transportation. Then, by comparing these areas with areas that do have a healthy taxi supply, the students estimated the number of taxi trips per day that would happen if passengers were able to find a taxi. The findings showed that the areas with the greatest potential for growth in taxi trips were northeastern Queens, south Brooklyn, and the northwestern Bronx. These areas include the neighborhoods of Whitestone, Borough Park, and Riverdale.
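To make the comparison idea concrete, here is a deliberately simplified sketch in Python. It is not the students' model, whose details are not described in this post. It assumes a single indicator (households without a car), computes a benchmark trip rate from zones with a healthy taxi supply, and applies that rate to an under-served zone. All zone names, field names, and numbers are made up for illustration.

```python
def car_free_households(zone: dict) -> float:
    """Households without a car, used here as the only demand indicator."""
    return zone["households"] * (1 - zone["car_ownership_rate"])


def benchmark_rate(well_served: list) -> float:
    """Daily taxi trips per car-free household across well-served zones."""
    total_trips = sum(zone["daily_trips"] for zone in well_served)
    total_car_free = sum(car_free_households(zone) for zone in well_served)
    return total_trips / total_car_free


def unmet_demand(zone: dict, rate: float) -> float:
    """Expected daily trips at the benchmark rate minus observed trips.

    Clamped at zero so a zone that already exceeds the benchmark
    is not reported as having negative unmet demand.
    """
    expected = rate * car_free_households(zone)
    return max(0.0, expected - zone["daily_trips"])


# Made-up zones with a healthy taxi supply (the comparison group).
well_served = [
    {"name": "zone A", "households": 10_000, "car_ownership_rate": 0.2, "daily_trips": 4_000},
    {"name": "zone B", "households": 5_000, "car_ownership_rate": 0.4, "daily_trips": 1_500},
]
# A made-up zone where taxis are hard to find.
under_served = {"name": "zone C", "households": 8_000, "car_ownership_rate": 0.5, "daily_trips": 300}

rate = benchmark_rate(well_served)
gap = unmet_demand(under_served, rate)
print(f"benchmark rate: {rate:.2f} trips per car-free household per day")
print(f"estimated unmet demand in {under_served['name']}: {gap:.0f} trips per day")
```

A fuller analysis would combine several indicators, such as transit access and land use, rather than relying on car ownership alone.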

The TLC began to access trip data in the black and livery car industry sectors only in 2014, so the team could not take into consideration the more recent rapid growth of black cars. As a result, the large growth in app-based companies makes the results difficult to apply to today’s market. However, these students showed how other data sources can be used to fill in the gaps left by taxi records to estimate unobservable taxi demand.

Unmet demand team. From left to right. TLC Staff: Ben Kurland, Chair Meera Joshi, Jeff Garber. Student team: Alexey Kalinin, Anita Ahmed, Pooneh Famili, Xin Tang, Ziman Zhou. Faculty advisors: Huy Vo, Kaan Ozbay.

The second CUSP team used the TLC’s trip data to identify ideal locations for taxi and For-Hire Vehicle (FHV) relief stands throughout the city. Taxi and FHV relief stands are parking spots reserved for drivers to take a short break. Currently, stands are created ad hoc. When an industry member requests a relief stand at a specific location, the TLC sends that request to the Department of Transportation, which evaluates whether the location can accommodate a stand.

Taxi relief stand team. From left to right. TLC Staff: Ben Kurland, Fausto Lopez, Jeff Garber, Chair Meera Joshi. Student team: Vishwajeet Shelar, Cheng Hou, Yao Wang, Le Xu. Faculty advisors: Huy Vo, Kaan Ozbay.

While a ground-up approach lets us respond directly to the needs of industry stakeholders, there has never been a systematic, data-driven review of relief stands throughout the city. This team used taxi data, parking violation data, and the locations of publicly accessible restrooms to identify ideal locations for taxi and FHV relief stands. They identified 8 areas in Manhattan that have strong taxi activity nearby, have publicly accessible restrooms, and have higher than average parking violations by TLC-licensed drivers. The lack of adequate places to rest is a growing concern among drivers that the TLC is attempting to tackle. Among other uses, these results give the TLC a map of candidate locations to validate with drivers as it works to expand relief opportunities.
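The three criteria above can be expressed as a simple filter over map cells. The Python sketch below is an illustration only, not the students' method: the cell ids, field names, activity threshold, and counts are all hypothetical, and "higher than average" is read as above the mean across all cells considered.

```python
from statistics import mean


def candidate_cells(cells: list, activity_threshold: int) -> list:
    """Return ids of cells that meet all three relief stand criteria.

    A cell qualifies if it has strong taxi activity (pickups at or above
    the threshold), at least one publicly accessible restroom, and more
    parking violations than the average cell.
    """
    average_violations = mean(cell["violations"] for cell in cells)
    return [
        cell["cell_id"]
        for cell in cells
        if cell["pickups"] >= activity_threshold
        and cell["restrooms"] > 0
        and cell["violations"] > average_violations
    ]


# Made-up cells, for illustration only.
cells = [
    {"cell_id": "h1", "pickups": 900, "restrooms": 2, "violations": 40},
    {"cell_id": "h2", "pickups": 1200, "restrooms": 0, "violations": 55},
    {"cell_id": "h3", "pickups": 100, "restrooms": 1, "violations": 5},
    {"cell_id": "h4", "pickups": 800, "restrooms": 1, "violations": 10},
]
candidates = candidate_cells(cells, activity_threshold=500)
print(candidates)
```

In this toy data only h1 passes all three tests: h2 has no restroom, h3 has little taxi activity, and h4 has below-average violations.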

Map of ideal taxi relief stand locations. Darker blue hexagons represent areas with high parking violations, taxi activity, and access to public restrooms.

Jeff Garber is the Director of Technology and Innovation at the Taxi & Limousine Commission. He manages initiatives to improve the passenger and driver experience through the use of data and technology. He holds a Master of Science from the Tufts University Friedman School of Nutrition Science and Policy, and his interests include travel and circus arts.