The Experiment: Which Taxi Should You Pick in NYC?

If an individual ran an unlicensed “taxi” on their own, they would probably face legal sanctions for being a “pirate” taxi driver. My grandfather, 86, used to be one, and still is from time to time, as he enjoys driving his friends around the country (Greece). Besides the pleasure of driving, it earns him some extra income to buy chocolate for his grandchildren. He always was, and still is, illegal.

It was the end of July 2014 when I became aware of the existence of the largest (in terms of the number of points per temporal unit) urban mobility dataset I had ever encountered in my life! And it was about taxi movement in New York City! The NYC Taxi dataset was a revelation to me, not only because it was one of the coolest mobility datasets I had come across during seven years of working in the field, but also because it was obtained under the Freedom of Information Law. As a consequence, these data, describing taxi trips in New York throughout 2013, became publicly available for anyone to download. And I did download them ☺. Below you can see a pictorial representation of the data: for every taxi journey in a sample, I’ve drawn a black point at the geographic coordinates of its origin (pick-up) and of its destination (drop-off).
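For readers who want to reproduce the point map, here is a minimal sketch of how pick-up and drop-off coordinates could be pulled out of the raw CSV files. It assumes the columns are named `pickup_longitude`, `pickup_latitude`, `dropoff_longitude` and `dropoff_latitude` (check this against the header of the files you download), and it assumes that a zero coordinate means a missing GPS fix. The function name `extract_points` is hypothetical, not taken from the original script.

```python
import csv
import io

# Assumed column names, modelled on the 2013 trip data CSV files;
# check them against the header of the file you actually download.
COORD_COLUMNS = (
    ("pickup_longitude", "pickup_latitude"),
    ("dropoff_longitude", "dropoff_latitude"),
)


def extract_points(csv_file):
    """Yield (longitude, latitude) for every pick-up and drop-off in a CSV file.

    Rows with missing, unparsable or zero coordinates are skipped.
    """
    # skipinitialspace=True tolerates headers and fields written as "a, b, c".
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    for row in reader:
        for lon_key, lat_key in COORD_COLUMNS:
            try:
                lon, lat = float(row[lon_key]), float(row[lat_key])
            except (KeyError, TypeError, ValueError):
                continue  # column absent, row too short, or not a number
            if lon == 0.0 or lat == 0.0:
                continue  # assumption: a zero coordinate means a missing GPS fix
            yield lon, lat


if __name__ == "__main__":
    sample = io.StringIO(
        "pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude\n"
        "-73.98,40.75,-73.96,40.77\n"
        "0,0,-73.99,40.73\n"
    )
    for lon, lat in extract_points(sample):
        print(lon, lat)
```

The resulting pairs can be handed to any scatter-plot routine (for example matplotlib's `scatter` with a tiny marker size) to draw one black dot per point.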

On August 20th, Uber announced its API for developers. After all, every data-driven company in Silicon Valley that respects itself has one. Not only does an API allow a community of innovators to engage with your product, but at times like this that community can also be a means of political pressure, implicit or explicit. Later on, you can also charge for access in order to increase revenue. So from a business point of view, this move is win-win-win.

I started fiddling around with the API, and the endpoints that caught my attention were those concerning Uber pricing. You could enter two geographic locations as longitude and latitude coordinates, say the origin and destination of a trip, and get an estimate of how much an Uber would charge for it. At this point, I had an idea: why not go through the trips recorded in the 2013 NYC Yellow Taxi dataset, take the origin and destination coordinates of each, check the fare charged, and then see how much Uber would charge me for the same trip? Uber had persistently been advertising that its cheapest service, UberX, was cheaper than the iconic yellow taxi. At least that was the case according to its blog on July 7th.
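A price-estimate query of this kind can be sketched as follows. The endpoint (`/v1/estimates/price`), its `start_latitude`/`start_longitude`/`end_latitude`/`end_longitude` parameters, the `Token` authorization header and the `prices`/`display_name`/`low_estimate`/`high_estimate` response fields reflect our understanding of Uber's 2014-era v1 API and should be treated as assumptions; the API has changed since. The sketch only builds the request and parses a response body, so it runs without network access or a real token.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the 2014-era Uber API v1 price-estimate endpoint.
PRICE_ENDPOINT = "https://api.uber.com/v1/estimates/price"


def build_price_request(server_token, start, end):
    """Build a GET request for a price estimate between two (lat, lon) points."""
    params = urllib.parse.urlencode({
        "start_latitude": start[0],
        "start_longitude": start[1],
        "end_latitude": end[0],
        "end_longitude": end[1],
    })
    return urllib.request.Request(
        f"{PRICE_ENDPOINT}?{params}",
        headers={"Authorization": f"Token {server_token}"},  # assumed auth scheme
    )


def product_estimate(response_body, product="uberX"):
    """Return (low, high) for `product` from a JSON response body, or None."""
    data = json.loads(response_body)
    for price in data.get("prices", []):
        if price.get("display_name", "").lower() == product.lower():
            return price.get("low_estimate"), price.get("high_estimate")
    return None


# Sending the request needs a valid token and network access, e.g.:
# req = build_price_request("YOUR_SERVER_TOKEN", (40.75, -73.98), (40.77, -73.96))
# with urllib.request.urlopen(req) as resp:
#     print(product_estimate(resp.read()))
```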

In early September, two months later, I had to prepare an informal presentation for my research group in Cambridge, and I decided to do so by running the “Uber vs Yellow Taxi Price Comparison Experiment”, otherwise known as “Which taxi shall I take in NYC?”, which I describe below.

I built a Python script that went through the New York City taxi dataset, picked records by uniform random sampling, and then queried the Uber API using the origin and destination of each sampled yellow taxi trip as input. It was that simple: for every yellow taxi trip, ask Uber how much it would charge for the same journey. Then compare the results.
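The sampling-and-querying loop might look like the sketch below. It is not the original script: uniform sampling is done here with reservoir sampling (Algorithm R), which draws k records uniformly at random in a single pass over a file too large to load into memory, and the Uber call is abstracted as a hypothetical `quote_fn(origin, destination)` so it can wrap a real API call or a stub. The trip keys `origin`, `destination` and `fare` are likewise assumptions.

```python
import random


def reservoir_sample(records, k, rng=random):
    """Uniformly sample k items from an iterable of unknown length (Algorithm R)."""
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive bounds: 0..i
            if j < k:
                sample[j] = record  # keep record i with probability k / (i + 1)
    return sample


def compare_prices(trips, quote_fn):
    """Pair each trip's recorded taxi fare with a quote from `quote_fn`.

    `trips` yields dicts with hypothetical keys 'origin', 'destination', 'fare'.
    `quote_fn(origin, destination)` is a hypothetical callable returning a price,
    or None when no quote is available (such trips are skipped).
    """
    results = []
    for trip in trips:
        quote = quote_fn(trip["origin"], trip["destination"])
        if quote is not None:
            results.append((trip["fare"], quote))
    return results
```

If the quote comes back as a low/high range, collapsing it to a single number (for example its midpoint) is a further choice the reader has to make.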

This was nowhere near perfect from a scientific viewpoint, as all sorts of biases were involved in the experiment. The NYC taxi data corresponded to 2013, whereas the Uber quotes were from 2014. Note, however, that yellow taxi fares in the city had last changed in 2012, after eight years without a change, so the 2013 fares should offer a good approximation of today’s prices. Further, there was no control for the time of day or day of the week of the API queries, but to me it was really the process that mattered: the process of comparing two different companies that provide the same service in the same geographic area, just as we have long been able to do for airfares, allowing for transparency in a free, competitive market. This process of data-driven transparency can help consumers, and by extension our society, function better.

Below is concrete evidence of what we could learn from such a process in the context of urban taxi services.

Lesson 1: Although the two price distributions are qualitatively similar, yellow taxis appear to be on average almost $1.40 cheaper than UberX.

Who is cheaper now?
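The headline number in Lesson 1 is an average over paired prices. A minimal sketch, assuming each sampled trip has already been reduced to a `(taxi_fare, uber_price)` pair (the function names are hypothetical):

```python
from statistics import mean


def mean_price_gap(pairs):
    """Mean of (uber_price - taxi_fare); positive means the taxi was cheaper on average."""
    return mean(uber - taxi for taxi, uber in pairs)


def share_taxi_cheaper(pairs):
    """Fraction of trips for which the yellow taxi fare was strictly lower."""
    pairs = list(pairs)
    return sum(taxi < uber for taxi, uber in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy numbers for illustration only, not results from the experiment.
    toy = [(10.0, 12.0), (20.0, 19.0)]
    print(mean_price_gap(toy), share_taxi_cheaper(toy))
```

Reporting the share of trips alongside the mean is a design choice: a small average gap can hide the fact that one service wins on most trips while losing heavily on a few.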

Lesson 2: UberX is cheaper for long-distance trips that cost more than ~$35, but can be significantly more expensive for short journeys, which cost much less by yellow taxi, as one can see in the accompanying chart.
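One way to locate a crossover point like the ~$35 one is to group `(taxi_fare, uber_price)` pairs into taxi-fare bins and average the price gap within each bin. The $5 bin width and the function names below are illustrative choices, not the author's:

```python
from collections import defaultdict
from statistics import mean


def gap_by_fare_bin(pairs, bin_width=5.0):
    """Average (uber_price - taxi_fare) within bins of the taxi fare.

    Returns {bin_start: mean_gap}, ordered by bin; a negative gap means
    UberX was cheaper on average for trips in that fare range.
    """
    bins = defaultdict(list)
    for taxi, uber in pairs:
        start = (taxi // bin_width) * bin_width
        bins[start].append(uber - taxi)
    return {start: mean(gaps) for start, gaps in sorted(bins.items())}


def first_cheaper_bin(gaps_by_bin):
    """Lowest fare bin in which UberX is cheaper on average, or None."""
    return next((start for start, gap in gaps_by_bin.items() if gap < 0), None)
```

With real data, sparse high-fare bins can be noisy, so it is worth checking how many trips fall in each bin before reading off the crossover.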

Lesson 3: By a law of human (and animal) mobility, most journeys we make are short. Statistically speaking, and as a rough approximation, we perform Lévy flights! This has been verified empirically for taxis, even though a common perception may be that we take taxis when we want to travel further. Uber exploits these patterns in human movement to make money.
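In a Lévy flight, step lengths follow a heavy-tailed power law, P(L > l) = (l / l_min)^(-α) for l ≥ l_min. The sketch below draws such steps with Python's `random.paretovariate` to show how "most journeys are short" and "a few are very long" go together. The values α = 1.5 and l_min = 0.5 (arbitrary distance units) are illustrative, not fitted to the taxi data; with them, about 65% of steps are shorter than twice the minimum, since 1 − 2^(−1.5) ≈ 0.65.

```python
import random


def levy_step_lengths(n, alpha=1.5, l_min=0.5, seed=None):
    """Draw n step lengths with P(L > l) = (l / l_min) ** -alpha for l >= l_min."""
    rng = random.Random(seed)
    # paretovariate(alpha) returns values >= 1 with P(X > x) = x ** -alpha.
    return [l_min * rng.paretovariate(alpha) for _ in range(n)]


def share_shorter_than(steps, threshold):
    """Fraction of steps strictly shorter than `threshold`."""
    return sum(s < threshold for s in steps) / len(steps)


if __name__ == "__main__":
    steps = levy_step_lengths(10_000, seed=42)
    print(share_shorter_than(steps, 1.0), max(steps))
```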

Roughly 1 in 3 yellow taxi trips in New York covers a geographic distance of ~2 km.
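Trip lengths such as the ~2 km above can be estimated as great-circle distances between pick-up and drop-off using the haversine formula. This is a sketch under stated assumptions: straight-line rather than road distance, a mean Earth radius of 6,371 km, and a histogram bin given by the hypothetical `lo_km`/`hi_km` bounds.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def share_in_range(trips, lo_km, hi_km):
    """Fraction of (lon1, lat1, lon2, lat2) trips with lo_km <= distance < hi_km."""
    dists = [haversine_km(*trip) for trip in trips]
    return sum(lo_km <= d < hi_km for d in dists) / len(dists)
```

Because Manhattan streets form a grid, actual driven distances are typically longer than these straight-line values.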

Lesson 4: Manhattan is a yellow taxi stronghold. A medallion (yellow taxi licence) in the city costs almost a million dollars, according to anecdotal evidence from a local taxi driver. Recently, however, prices appear to have dropped due to pressure from Uber. Would you let these gentlemen in Silicon Valley take over that wealth? Below, I have marked in yellow the areas where the majority of trips were cheaper by yellow taxi, and in black the areas where they were not. Uber is creeping in, but yellow taxis stay strong. Who will win? Hopefully the ones who provide the best service at the best price.
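The yellow/black map amounts to a majority vote per area. A minimal sketch, assuming areas are square grid cells of `cell_deg` degrees keyed by pick-up location (the original does not say how areas were defined) and that ties are labelled black; the function name and cell size are hypothetical:

```python
import math
from collections import defaultdict


def cheaper_by_cell(trips, cell_deg=0.01):
    """Label each grid cell 'yellow' if most trips starting there were cheaper by taxi.

    `trips` yields (pickup_lon, pickup_lat, taxi_fare, uber_price).
    Cells are squares of `cell_deg` degrees; ties are labelled 'black'.
    """
    votes = defaultdict(lambda: [0, 0])  # cell -> [taxi cheaper, taxi not cheaper]
    for lon, lat, taxi, uber in trips:
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        votes[cell][0 if taxi < uber else 1] += 1
    return {cell: "yellow" if t > o else "black" for cell, (t, o) in votes.items()}
```

Each cell key can then be mapped back to its corner coordinates (`cell * cell_deg`) to draw the coloured squares.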

Why don’t you run the experiment yourself? The Data is Out There! It’s your turn to kill the snake! Whoever the snake may be…
