Building data analysis for Taxis in NYC

Amy Cheong
Published in Geek Culture
6 min read · Aug 25, 2021

Interpreting a dataset to get an understanding of our product — the Flying Car Taxi Service

How might we build the first flying car taxi service for NYC?

Case Study: Flyber — Flying Taxi Service

You, as the data product manager, need to unveil the first flying car taxi service, Flyber, in one of the most congested cities in America: New York City. The course provides some of the datasets so that we can extract insights from explicit feedback.

📊 Analyse data

Since Flyber is a new technology, let's use existing taxi data as a comparable baseline for our initial analysis. We will identify our customers and their pain points.

🚕 Competitor Analysis — taxi & digital ride-sharing

Motivation: What are taxis used for?

  • Users want to travel from point A to point B and save travel time when they do not own a car.
  • Users avoid the hassles of driving their own car, such as finding a parking spot, finding the location, and driving long distances.

What are the existing pain points with taxis?

  • Unpredictable prices, as the driver might overcharge or add surcharges during peak hours or at special pickup locations.
  • Inconvenient payment methods and the hassle of paying a credit card surcharge.
  • Difficulty hailing a taxi during peak hours, when few taxis are available in the area.
  • Slow customer service and difficulty getting driver details when users need to complain about bad service or report a lost item.

How about digital ride-sharing services?

  • Longer trips when drivers reroute to pick up and drop off other passengers, resulting in more time spent.
  • Safety concerns and lack of privacy, as anyone can be sitting in the same car to share the ride. Users might not be insured if there's an accident.

🚕 Data exploration

Before we look at the dataset, we should come up with questions to avoid going down rabbit holes. We will form a baseline for our Flyber service from the taxi dataset.

Questions that we are interested in:

  1. Location: Where should we set up Flyber? Where are the peak pick-up and drop-off locations?
  2. Time: Should Flyber operate 24/7 or for limited hours? Should we target long or short distance trips?
  3. Price: How should we price our service? Do we charge per trip or per user?
  4. User: Which user group will be our first potential target users (e.g. gender, age, income level)?

🔑 Key Takeaways from Dataset

General

  • There are 1,048,468 records in the dataset and each record represents a journey made by a customer. Each record contains the total distance, pickup and drop-off location, duration and vendor identification.
  • The data covers the date range from 01/01/2016 to 06/30/2016.
  • The geographical data is not limited to New York and its region. We can see a single trip on the West Coast. This could be an outlier in the dataset, or it could represent an opportunity to expand the business to the West Coast. Most of the records are in New York and New Jersey.

Data distribution (Before clean-up)
We need to know typical user usage to set a baseline for what is considered normal. When there is a significant jump between the 95% mark and the 100% mark (the maximum) of a variable, this suggests potential outliers.

  • ⏱ Trip duration: People spent around 11 minutes on their rides, but with a standard deviation of 98 minutes. 95% of people spent at most 35 mins, but the longest trip was 58771 mins.
    🧹 We should clean up the data to keep only trips from 1 min to 60 mins. A trip of less than 1 min is not really a taxi ride.
  • 📐 Trip distance: The majority of people travelled 2 miles, with a standard deviation of 4 miles. 68% of users travelled 1 to 6 miles and 95% of users travelled at most 11 miles. The maximum of 1241 miles is treated as an outlier.
    🧹 We should clean up the data to keep only distances from 0.6 miles (1 km) to 15 miles.
  • 👥 Passenger count: A typical trip has 2 passengers, with a standard deviation of 4 passengers per trip. This tells us that 68% of the trips have around 1–3 passengers. 95% of the trips have at most 5 passengers, but we have up to 9 passengers as outliers.
    🧹 We should clean up the data to keep only passenger counts from 1 to 5.
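
The outlier check and the clean-up rules above can be sketched in plain Python. This is only a minimal sketch, not the workflow actually used for this analysis: the field names (`duration_min`, `distance_miles`, `passengers`), the helper functions and the sample trips are all hypothetical. Only the accepted ranges come from the rules listed above.

```python
# Accepted ranges taken from the clean-up rules in the text.
LIMITS = {
    "duration_min": (1, 60),
    "distance_miles": (0.6, 15),
    "passengers": (1, 5),
}


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, so no floating-point rounding is involved.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def tail_jump(values):
    """Ratio of the maximum to the 95th percentile; a large ratio hints at outliers."""
    return max(values) / percentile(values, 95)


def is_clean(trip):
    """True when every field of the trip falls inside its accepted range."""
    return all(low <= trip[field] <= high for field, (low, high) in LIMITS.items())


# Hypothetical sample records, for illustration only.
trips = [
    {"duration_min": 11, "distance_miles": 2.0, "passengers": 1},
    {"duration_min": 0.5, "distance_miles": 0.1, "passengers": 1},    # too short
    {"duration_min": 35, "distance_miles": 11.0, "passengers": 2},
    {"duration_min": 58771, "distance_miles": 3.0, "passengers": 1},  # duration outlier
    {"duration_min": 20, "distance_miles": 5.0, "passengers": 9},     # too many passengers
]

cleaned = [trip for trip in trips if is_clean(trip)]
print(len(cleaned))  # prints 2
```

On a real dataset, `tail_jump` would be run on each column first, and the filter applied only to the columns where the ratio is clearly far above 1.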

Data distribution (First round of data clean-up)

Most of the pickups and drop-offs are concentrated in Manhattan, with small passenger groups.

To sum up:

  • ⏱ Trip duration: Users spent around 1 to 20.65 mins on each trip.
  • 📐 Trip distance: Most of our users travelled 1 to 5.28 miles to or from the city area.
  • 👥 Passenger count: 1–2 passengers per trip. Most taxi trips carry a small group of passengers, so we see higher potential in targeting small passenger groups for Flyber.

💵 How about the price?
I decided to use a simple price formula based on the NYC taxi fare site to give us a rough estimate. It includes the initial charge, a charge per minute and a charge per 1/5 mile.

Price = $2.50 + ($0.50 * [Duration_min]) + (([Distance]/0.2) * 0.50)

The majority of users paid around $14 with a standard deviation of $13.50. This means 68% of users paid between the starting price of $2.50 and $28. In reality, we might need to evaluate each trip based on location and trip hours, as the price can be further influenced by peak hour demand surge, special pick-up location surcharges, etc.
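
As a quick sanity check, the price formula above can be written as a small Python function. This is a minimal sketch: the function and constant names are my own, and the rates are simply the ones in the formula, so it ignores surge pricing and location surcharges.

```python
BASE_FARE = 2.50        # initial charge, from the formula in the text
RATE_PER_MINUTE = 0.50  # charge per minute of trip duration
RATE_PER_UNIT = 0.50    # charge per 1/5 mile
UNIT_MILES = 0.2        # one distance unit is 1/5 mile


def estimate_price(duration_min, distance_miles):
    """Rough fare estimate in USD, ignoring surge and location surcharges."""
    time_charge = RATE_PER_MINUTE * duration_min
    distance_charge = (distance_miles / UNIT_MILES) * RATE_PER_UNIT
    return round(BASE_FARE + time_charge + distance_charge, 2)


# An 11-minute, 2-mile trip: 2.50 + 5.50 + 5.00
print(estimate_price(11, 2))  # prints 13.0
```

An 11-minute, 2-mile trip comes to $13.00, which is close to the roughly $14 that most users paid.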

⌚️ When are the popular pick-up/drop-off periods?
  • Peak pick-up month: March, but there is no significant trend from Jan to Jun.
  • Peak pick-up days: We see a steadily increasing trend towards the weekend (Fri & Sat).
  • Peak pick-up times: 7–8AM, 1–2PM and 6–7PM on weekdays, whereas 12–2PM and 6–11PM on weekends. This might be due to lunchtime and the commute period.
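
Peak times like these can be found by counting pick-ups per hour, split by weekday and weekend. The sketch below is only an illustration: the timestamp format, the function name and the sample timestamps are assumptions, and it treats Saturday and Sunday as the weekend.

```python
from collections import Counter
from datetime import datetime


def pickup_counts_by_hour(pickup_times):
    """Count pick-ups per hour of day, split into weekday and weekend buckets."""
    counts = {"weekday": Counter(), "weekend": Counter()}
    for stamp in pickup_times:
        moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        bucket = "weekend" if moment.weekday() >= 5 else "weekday"
        counts[bucket][moment.hour] += 1
    return counts


# Hypothetical timestamps, for illustration only.
sample = [
    "2016-03-01 07:15:00",  # Tuesday
    "2016-03-01 07:45:00",  # Tuesday
    "2016-03-01 18:30:00",  # Tuesday
    "2016-03-05 13:05:00",  # Saturday
    "2016-03-05 22:40:00",  # Saturday
    "2016-03-05 22:55:00",  # Saturday
]

counts = pickup_counts_by_hour(sample)
print(counts["weekday"].most_common(1))  # prints [(7, 2)]
print(counts["weekend"].most_common(1))  # prints [(22, 2)]
```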

🗣 🗒 User Research Survey

You and the user research team ran a quantitative survey on existing taxi and/or ride-share users in New York City to determine sentiment around potentially using a flying taxi service.

We will look at the survey results and cross-check them against the key takeaways from our taxi dataset.

💯 Positive Sentiment for using Flyber

  1. Gender: Females seem more inclined to try Flyber than males.
  2. Age group: Users who are in their 20s, 40s, and 60s are more inclined to try Flyber.
  3. Income group: We see that middle- to upper-income people who earn in the $40k-$80k bracket are more willing to spend on Flyber.
  4. Popular Neighbourhood: Mostly concentrated in Manhattan (Midtown, Battery Park, Hell's Kitchen and Financial District). This matches what we saw in our taxi dataset.
  5. Usual spending per mile: Users are willing to pay around $14 to $32 (USD) for Flyber, which overlaps with the common taxi price range ($2.50 to $28) that users have been paying. Women are willing to pay higher price points.

❌ Negative Sentiment for not using Flyber

  • Reason to say no: Women are concerned about safety⛑ while men are worried about money💰 (pricing).

🙏 Thank you for reading!

Feel free to leave your comment below, or DM me at @amycheong19.

Amy Cheong
Geek Culture

Current: Product Manager at Workmate • Always Software Engineer.