Motorcycle Diaries: How the Pandemic Helped Us Solve an Analytics Problem
It was the summer of 2020, and the world was still getting used to the WFH life. The data science team at Zendrive was doing the same, but we had a helpful distraction on our side. An interesting chicken-and-egg problem had come up.
We received a business requirement for a detector that could separate motorcycle trips from car trips. The geography we were working with was the beautiful country of Colombia.
To even begin tackling the problem, we needed to understand which trips were definitely made on a motorcycle. So, here we go.
What we have:
- Millions of trips in Colombia
- For each trip, an anonymized
- For each trip, a trip_trail: where and when the trip started and ended, and the path taken.
What we need:
- A substantial number of trips that were definitely made on a motorcycle
- Users who predominantly ride motorcycles
We started by jumping on Google Street View to understand how common motorcycles are in Colombia. We knew that they're rare in the US and fairly popular in India. Where does Colombia fall on that spectrum?
Since we had the trip trail, we had a sure-shot method in mind: find motorcycle-only roads and see which of our trips passed through them. Our in-house geoplatform could scale up this join of trips against the road network of Colombia.
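To make the road-matching idea concrete, here is a minimal brute-force sketch. It is not Zendrive's geoplatform. It assumes trip trails and road polylines have already been projected to planar coordinates in meters, and that each road carries a hypothetical motorcycle_only flag derived from its map tags. Requiring several nearby trail points (min_points) keeps a trip that merely crosses such a road from counting. All names, the 15 m tolerance, and the point threshold are illustrative.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]  # (x, y) in meters; assumes coordinates are already projected


@dataclass
class Road:
    road_id: str
    points: list[Point]    # polyline vertices
    motorcycle_only: bool  # hypothetical flag derived from map tags


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a == b
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def near_road(p: Point, road: Road, tolerance_m: float) -> bool:
    """True if p lies within tolerance_m of any segment of the road."""
    return any(
        point_segment_distance(p, a, b) <= tolerance_m
        for a, b in zip(road.points, road.points[1:])
    )


def trips_on_motorcycle_only_roads(
    trips: dict[str, list[Point]],
    roads: list[Road],
    tolerance_m: float = 15.0,
    min_points: int = 3,
) -> set[str]:
    """IDs of trips with at least min_points trail points near a motorcycle-only road."""
    moto_roads = [r for r in roads if r.motorcycle_only]
    matches = set()
    for trip_id, trail in trips.items():
        hits = sum(
            1 for p in trail if any(near_road(p, r, tolerance_m) for r in moto_roads)
        )
        if hits >= min_points:
            matches.add(trip_id)
    return matches
```

At the scale of millions of trips, the nested loops would need a spatial index (an R-tree, for example) instead of brute force, which is the kind of join a dedicated geoplatform takes care of.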
However, this method didn't pan out: very few roads in Colombia are labelled motorcycle-only on OpenStreetMap (OSM).
Our second approach was to look at motorcycle parking zones. Colombia has designated parqueaderos, so we could look for trips that ended at one. Our geoplatform came to the rescue here again.
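The parking-zone idea reduces to a radius check on each trip's end point. The sketch below assumes parking zones are available as hypothetical (latitude, longitude) points and treats a trip as "ending at" a zone if its last point is within an illustrative 50 m radius. Distances use the standard haversine formula with a mean Earth radius of 6,371 km.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trips_ending_at_parking(
    trip_ends: dict[str, tuple[float, float]],
    parking_zones: list[tuple[float, float]],
    radius_m: float = 50.0,  # illustrative threshold, not a tuned value
) -> set[str]:
    """IDs of trips whose end point lies within radius_m of any parking zone."""
    return {
        trip_id
        for trip_id, (lat, lon) in trip_ends.items()
        if any(haversine_m(lat, lon, plat, plon) <= radius_m for plat, plon in parking_zones)
    }
```

A real parking lot is a polygon rather than a point, so a production version would more likely test point-in-polygon. The point-and-radius form keeps the sketch short.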
However, as this blog post explains, the parqueaderos are massively underutilized: folks in Bogota mostly end up parking on the street.
Collectively, we sighed. We needed to take a heuristic approach.
Researching further, we learned that delivery execs in Bogota travel only by motorcycle. If we could figure out what their travel patterns look like, we'd have our dataset.
Here are the defining patterns we wanted:
- Multiple trips in a day
- Short gaps between consecutive trips
- Trips that weren't too long
Finding these patterns wasn’t difficult. After a couple of days of data-crunching and removing outliers, I had a large set of “candidate motorcycle drivers”. Success!
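The three patterns above translate naturally into a per-user, per-day filter. This is a minimal sketch under stated assumptions, not the exact logic used: the Trip record, every threshold (at least 6 trips a day, a median gap of at most 30 minutes, no trip longer than 45 minutes, at least 3 matching days), and the function names are hypothetical, and the outlier removal mentioned above is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Trip:
    user_id: str
    start: datetime
    end: datetime


def looks_like_delivery_day(
    trips: list[Trip], min_trips: int, max_gap: timedelta, max_duration: timedelta
) -> bool:
    """Apply the three patterns to one user's trips on one day."""
    if len(trips) < min_trips:                              # multiple trips in a day
        return False
    trips = sorted(trips, key=lambda t: t.start)
    if any(t.end - t.start > max_duration for t in trips):  # trips not too long
        return False
    gaps = [b.start - a.end for a, b in zip(trips, trips[1:])]
    return median(gaps) <= max_gap                          # short gaps between trips


def candidate_motorcycle_users(
    trips: list[Trip],
    min_trips: int = 6,
    max_gap: timedelta = timedelta(minutes=30),
    max_duration: timedelta = timedelta(minutes=45),
    min_matching_days: int = 3,
) -> set[str]:
    """Users who show the delivery pattern on at least min_matching_days days."""
    by_user_day: dict[tuple, list[Trip]] = defaultdict(list)
    for t in trips:
        by_user_day[(t.user_id, t.start.date())].append(t)
    matching_days: dict[str, int] = defaultdict(int)
    for (user_id, _day), day_trips in by_user_day.items():
        if looks_like_delivery_day(day_trips, min_trips, max_gap, max_duration):
            matching_days[user_id] += 1
    return {u for u, n in matching_days.items() if n >= min_matching_days}
```

Using the median gap instead of the maximum tolerates a single long lunch break without dropping the whole day.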
But if something seems too good to be true, it usually is. As I sat and tried to validate these trips on Street View, panic descended upon me.
There was no way to tell motorcycle trips and taxi trips apart.
What if I had such a large dataset because the taxi-driving population had crept in too?
I tried clustering on features such as average_speed, but to no avail. In an urban setting, taxis and motorcycles drive at similar speeds and go to similar places. For example, while a delivery exec is picking up food from a restaurant, taxis are dropping passengers off at the same restaurant.
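For readers unfamiliar with the technique, clustering trips on a single feature like average_speed can be sketched as two-cluster k-means (Lloyd's algorithm) in one dimension. The function name and the extreme-value initialization are illustrative choices, not the method used in the project. The sketch also shows why the approach struggles here: k-means always returns two centroids, but if taxi and motorcycle speeds overlap in city traffic, those centroids split the speed range, not the vehicle types.

```python
def two_means_1d(values: list[float], iterations: int = 100) -> tuple[float, float, list[int]]:
    """Lloyd's algorithm with k=2 on one feature; returns (low, high, labels)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    lo, hi = min(values), max(values)  # initialize centroids at the extremes
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearer centroid (0 = low, 1 = high).
        new_labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        # Update step: move each centroid to the mean of its members.
        low_members = [v for v, lab in zip(values, new_labels) if lab == 0]
        high_members = [v for v, lab in zip(values, new_labels) if lab == 1]
        new_lo = sum(low_members) / len(low_members) if low_members else lo
        new_hi = sum(high_members) / len(high_members) if high_members else hi
        if new_labels == labels and (new_lo, new_hi) == (lo, hi):
            break  # converged: neither assignments nor centroids changed
        labels, lo, hi = new_labels, new_lo, new_hi
    return lo, hi, labels
```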
Our heuristics were either too specific, yielding too few true positives, or too broad, letting in far too many false positives.
Flustered, I sat back in my chair and did what people do when they hit a roadblock. I started scrolling Facebook.
As an incoming student at Georgia Tech, I was in a Facebook group for future students. Serendipitously, a post caught my eye: “Hey, I’m C — . I’m from Colombia. Excited to join…”
On a whim, I struck up a conversation with this future peer. We soon got on a call, and I explained my problem to him. He listened intently for a while, and then smiled. “Well,” he asked, “do you have data for the past few months?”
“Sure,” I said. “I’ve been processing an entire year’s data.”
“Why would you? Take just April’s data. See which users were making multiple trips then.”
“The lockdown was imposed in April, and taxicabs were taken off the streets because of it. So whoever was making multiple trips then must have been a delivery driver.”
As the obvious sank in, I began thanking him profusely (and promised him a beer).
An entire night of coffee-fueled analytics followed. In the next few days, we had a clean dataset large enough to get our data science folks started on the detector. The real work could now begin.
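The lockdown insight boils down to a date-window filter followed by a trip count. The sketch below follows the idea described above: keep only April's trips and find the users making multiple trips a day. It assumes a hypothetical Trip record. The thresholds are also illustrative: at least 3 trips on a day, on at least 5 such days. It is not the actual query that was run.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Trip:
    user_id: str
    start: datetime


def lockdown_multi_trip_users(
    trips: list[Trip],
    window_start: date = date(2020, 4, 1),  # April, per the lockdown insight
    window_end: date = date(2020, 4, 30),
    min_trips_per_day: int = 3,             # "multiple trips": hypothetical cutoff
    min_active_days: int = 5,               # hypothetical persistence requirement
) -> set[str]:
    """Users who made many trips per day on enough days inside the window."""
    # Count trips per (user, day), restricted to the lockdown window.
    per_user_day = Counter(
        (t.user_id, t.start.date())
        for t in trips
        if window_start <= t.start.date() <= window_end
    )
    # Count, per user, the days that cleared the per-day threshold.
    busy_days = Counter(
        user for (user, _day), n in per_user_day.items() if n >= min_trips_per_day
    )
    return {u for u, n in busy_days.items() if n >= min_active_days}
```

The filter does not need any speed or route features. Removing taxis from the streets removes them from the data, so a simple count does the separating.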
At Zendrive, we solve exciting problems every day, sometimes with cutting-edge technology, and sometimes over cutting chai. Come join us!
Surya graduated with a bachelor’s degree in Computer Science in 2018. He has been with Zendrive for over 3 years, during which time he has worked with the iOS team, the data engineering team, and, more recently, the backend team. He is now looking forward to his next challenge: a master’s degree in Computer Science at Georgia Tech. Outside of work, Surya is an avid quizzer who loves discussing geopolitics and philosophy. He is often found scouring Reddit and Wikipedia during work hours and announcing his findings to the floor. Connect with Surya on LinkedIn.