Bus, Car, Bike…Oh, the Places we go!

Location data is often considered one of the most private types of data that can be collected about a person. Just from a person’s location, you can figure out where someone lives, works, and plays along with where their friends live and what they do with their weekends. This is more data than most people want others to know about them. However, this data can be incredibly useful to various organizations. Consider the following scenario: the city you live in (I’ll use Boulder for this example) wants to know how people get to work/school each day. Maybe they’re thinking about putting in some new public transportation infrastructure or building some new roads. There are ways that this question could be answered without collecting location data from citizens, but collecting this data would certainly be much easier than measuring traffic on every major street and wrangling bus ridership data.

In reality, however, it’s fairly difficult for a city government to get its hands on the location data collected by our smartphones. But other data collected about us can reveal our location too. Let’s consider an even more focused example: how do students and faculty get to campus and back? This can be a breeze for some and a logistical nightmare for others. Personally, I fall somewhere in the middle. On days when the sun is shining and the streets aren’t covered in slush and ice, I take my bike. Otherwise, I’ll take the bus. Frequently, I’m done with class after the last buses that service my area have left. If that’s the case, or the bus didn’t show up, I’ll take a Lyft. There have been weeks where I’ve had to do a tiny cost-benefit analysis every morning. Something like: it’s snowing outside, the bus will make me late to class, and a Lyft will cost me $10; therefore, driving and paying to park costs me the least today and gets me there on time.

As you might guess, this information can be very valuable to the university and the city. Whether I bike to campus or ride the bus could easily be determined by gathering location data from my phone. However, the university doesn’t have access to this data, so how could they still figure this out?

The (fake) Data

Presumably, the university has about four datasets that would make this work fairly seamlessly. The first is a database of all the parking passes they’ve issued, which parking lot each pass is for, and who it was issued to. This data could look a little something like this:

Second, anyone who’s ever parked on campus knows that you’d better be parked in a lot you’re allowed to park in, and that you’d better have a pass or have paid the parking machine. If not, you’ll come back to a ticket on your windshield. Since CU Parking Services uses cars equipped with computers and license plate cameras to verify whether someone is authorized to park in a lot, that data might look like this:

Third, the university also keeps track of all students and faculty who register their bikes on campus. Registering your bike with campus police means that, in theory, they will be able to return your bike to you much more easily than if it weren’t registered. That data might look like this:

Fourth, campus transportation wouldn’t be complete without the bus system. Since CU issues bus passes to all students and faculty, and issues a new bus pass every time someone loses their CU ID card, they presumably have a database of issued bus passes associated with CU ID numbers. To complete the circle, let’s bring in some bus usage data:

By looking at the data above, we can determine there are a couple of ways that this data could be joined together. The university gives an ID number to all students and faculty and this is likely how they record things like parking passes and bike registration. This is a likely candidate for joining data together.
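Here is a rough sketch of how that join key could be sanity-checked before merging anything. Only the frame names and the `id` / `cu_id` key columns come from the code later in this post; every other column, every value, and the `shared_ids` helper are made up for illustration.

```python
import pandas as pd

# Hypothetical rows. Only the frame names and the `id` / `cu_id` key
# columns come from the post; all other columns and values are invented.
parking_pass_df = pd.DataFrame({
    "id": [1001, 1002, 1003],
    "lot": ["Lot A", "Lot B", "Lot A"],
})
bike_reg_df = pd.DataFrame({
    "id": [1001, 1004],
    "bike_serial": ["SN-1", "SN-2"],
})
bus_usage_df = pd.DataFrame({
    "cu_id": [1002, 1002, 1004],
    "date": ["2019-02-04", "2019-02-05", "2019-02-04"],
})


def shared_ids(left, left_key, right, right_key):
    """Return the sorted ID values that appear in both tables (hypothetical helper)."""
    common = set(left[left_key]) & set(right[right_key])
    return sorted(int(value) for value in common)


# IDs that would survive an inner join between each pair of tables.
bike_and_car = shared_ids(bike_reg_df, "id", parking_pass_df, "id")
car_and_bus = shared_ids(parking_pass_df, "id", bus_usage_df, "cu_id")
print(bike_and_car, car_and_bus)
```

If the overlap between two tables is empty, that is a hint the ID columns are formatted differently (for example, stored as strings in one table and integers in the other) and need cleaning before any merge.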

Joining the data

…and tracking unsuspecting students and faculty

By merging the bike registration and parking pass datasets on the CU ID number, as in the first line of code below, we can quickly link specific cars to specific bikes.

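import pandas as pd
pd.merge(bike_reg_df, parking_pass_df, on='id', how='left')  # every registered bike, plus its owner's parking pass (if any)
pd.merge(parking_pass_df, bus_usage_df, left_on='id', right_on='cu_id', how='inner')  # pass holders who also show up in the bus data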

We can also join the parking pass dataset and the bus data (the second merge above) to determine who rode the bus to campus on a certain day but also drives a car to campus from time to time.
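To make the two merges concrete, here is a minimal sketch on a few made-up rows. The frame names and the `id` / `cu_id` keys match the code above; the other columns and all of the values are assumptions. The `indicator=True` argument is an optional extra that adds a `_merge` column showing whether each row found a match.

```python
import pandas as pd

# Made-up rows for illustration only.
bike_reg_df = pd.DataFrame({"id": [1001, 1004], "bike_serial": ["SN-1", "SN-2"]})
parking_pass_df = pd.DataFrame({"id": [1001, 1002], "lot": ["Lot A", "Lot B"]})
bus_usage_df = pd.DataFrame({
    "cu_id": [1002, 1002, 1004],
    "date": ["2019-02-04", "2019-02-05", "2019-02-04"],
})

# Left join: keep every registered bike; `lot` is NaN when the owner has no pass.
bikes_with_cars = pd.merge(
    bike_reg_df, parking_pass_df, on="id", how="left", indicator=True
)
# Rows marked "both" belong to people who have a bike AND a parking pass.
owns_both = bikes_with_cars.loc[bikes_with_cars["_merge"] == "both", "id"].tolist()

# Inner join: keep only bus rides taken by people who also hold a parking pass.
drivers_on_bus = pd.merge(
    parking_pass_df, bus_usage_df, left_on="id", right_on="cu_id", how="inner"
)
# Number of recorded bus rides per pass holder.
rides_per_driver = drivers_on_bus.groupby("id").size().to_dict()

print(owns_both)
print(rides_per_driver)
```

The left join is the right choice when you want to keep people with no match (cyclists without cars), while the inner join throws them away and leaves only the overlap.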

What happens when the temperature drops?

If we join weather data with our bus data, we can see that when the average temperature on a given day decreases, the number of bus riders goes up, which makes sense. Fewer people bike and walk when it’s cold out, and most of them ride the bus instead.
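A sketch of that weather join might look like the following. The `weather_df` frame, its `avg_temp_f` column, and all of the values are hypothetical; the idea is just to count distinct riders per day, attach that day’s temperature, and check the direction of the relationship.

```python
import pandas as pd

# Made-up bus taps: one row per ride, keyed by CU ID and date.
bus_usage_df = pd.DataFrame({
    "cu_id": [1, 2, 3, 1, 2, 1],
    "date": ["2019-02-04", "2019-02-04", "2019-02-04",
             "2019-02-05", "2019-02-05", "2019-02-06"],
})
# Hypothetical daily weather table (average temperature in Fahrenheit).
weather_df = pd.DataFrame({
    "date": ["2019-02-04", "2019-02-05", "2019-02-06"],
    "avg_temp_f": [20, 35, 55],
})

# Count distinct riders per day.
daily_riders = (
    bus_usage_df.groupby("date")["cu_id"].nunique().rename("riders").reset_index()
)

# Attach the temperature for each day.
daily = pd.merge(daily_riders, weather_df, on="date", how="inner")

# Pearson correlation: a negative value means colder days have more riders.
temp_rider_corr = daily["riders"].corr(daily["avg_temp_f"])
print(daily)
print(temp_rider_corr)
```

With only a handful of invented days this proves nothing, of course; on real data you would want many more days and would need to account for weekends, holidays, and class schedules before reading anything into the correlation.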

Conclusions

Overall, I only scratched the surface of what could be done with this data. By figuring out whether someone drove to campus or rode the bus, it’s possible to determine roughly where they live and get an idea of their commuting habits. This could be helpful for a city government when expanding its transportation infrastructure. Companies like Lyft could use this data to target ads and better adjust prices during inclement weather and surge pricing times. Even CU could use this data to adjust parking pass prices or hourly parking rates based on the predicted number of cars in the parking lots.
