How I created a (highly simplified) version of Google Maps for NYC

Evan Spiller
Oct 2, 2023


How do you get from place to place in a city without getting stuck forever in bumper-to-bumper traffic?

In my last post, I wrote about how I used NYC’s rideshare data and graph theory to map the least-trafficked route from Van Cortlandt Park in the Bronx to Coney Island in Brooklyn.

I framed this — tongue-in-cheek — as how The Warriors should have gotten back to their home in the classic 1979 movie of the same name.

But, really, it was just a simple application of graph theory to a much more banal, practical problem: how do you get from point A to point B while minimizing traffic along the way?

Here’s the application I created in Tableau to answer that question for any starting and ending neighborhood in NYC.

I’ll show you how I did it, then do a qualitative comparison to those master traffic algorithmizers: Google Maps.

Why this application works, in theory

In NYC, rideshare rides make up a large share of overall traffic. I couldn’t find data on their share of cars across the whole city, but I did find a study that said 43.9% of Midtown Manhattan traffic is rideshare traffic.

So avoiding rideshare traffic is a good method for avoiding traffic generally.

Of course, if you read my last blog post carefully, you’d see that this algorithm only helps you avoid adjacent traffic: traffic from one taxi zone to a zone it directly touches or can reach by bridge or tunnel.
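To make "adjacent traffic" concrete, here is a minimal sketch of turning trip records into edge weights. It assumes trips arrive as (pickup zone, dropoff zone) pairs, uses a hypothetical adjacency table and made-up trips, and treats edges as undirected. That last choice is an assumption; the graph in the original analysis may well be directed.

```python
from collections import Counter

# Hypothetical adjacency: which zones touch each other (or connect by bridge/tunnel).
ADJACENT = {
    1: {2, 3},
    2: {1, 4},
    3: {1, 4},
    4: {2, 3},
}

def adjacent_traffic_weights(trips, adjacent):
    """Count trips between adjacent zones, ignoring direction.

    Trips between non-adjacent zones are dropped, so only
    "adjacent traffic" ends up as an edge weight.
    """
    weights = Counter()
    for pu, do in trips:
        if do in adjacent.get(pu, set()):
            edge = tuple(sorted((pu, do)))  # undirected edge key (an assumption)
            weights[edge] += 1
    return dict(weights)

# Made-up trip records: (pickup zone, dropoff zone)
trips = [(1, 2), (2, 1), (1, 2), (1, 3), (3, 4), (1, 4), (2, 4)]
print(adjacent_traffic_weights(trips, ADJACENT))
# {(1, 2): 3, (1, 3): 1, (3, 4): 1, (2, 4): 1}
```

The (1, 4) trip disappears because zones 1 and 4 don't touch in this toy map; it still adds traffic in reality, just not to any single edge.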

In NYC, rides tend to be quite short: as this histogram shows, the most common ride length is a little over a mile. This is January 2023 data, but the data generally follows this pattern.
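"Most common ride length" here means the tallest bar in the histogram. A small sketch of finding that bar with the standard library, assuming distances in miles and a 0.5-mile bin width (the bin width is an assumption, and the sample distances are invented for illustration, not taken from the TLC data):

```python
from collections import Counter

def most_common_length_bin(distances_miles, bin_width=0.5):
    """Return the (low, high) edges of the most populated histogram bin."""
    bins = Counter(int(d // bin_width) for d in distances_miles)
    index, _count = bins.most_common(1)[0]
    return (index * bin_width, (index + 1) * bin_width)

# Made-up trip distances in miles, for illustration only
sample = [0.8, 1.1, 1.2, 1.3, 1.4, 2.6, 3.9, 1.05, 7.2, 0.4]
print(most_common_length_bin(sample))  # (1.0, 1.5)
```

With real trip records, the distances would come from the dataset's trip-distance column, whose name depends on which TLC file you load.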

So avoiding adjacent traffic means avoiding a large share of overall traffic.

Moreover, it helps you out-game the short-distance travelers who make up the plurality of trips.

Because there are many ways to reach a zone far away from you but only one direct way to reach a zone that borders yours, a longer-distance traveler can out-game shorter-distance travelers by routing around the high-traffic areas those travelers are forced to go through. That’s what this algorithm does.
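Here is a toy illustration of that out-gaming effect. The post uses networkx’s `nx.shortest_path`, which runs Dijkstra’s algorithm when given a `weight`; to keep this sketch dependency-free it uses a small hand-written Dijkstra instead. The five zones and their edge weights (trip counts) are made up: the A-B border is busy, while an outer loop through C and D is quiet.

```python
import heapq

def undirected(edges):
    """Build an adjacency dict {node: {neighbor: weight}} from (u, v, w) edges."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def dijkstra_path(graph, source, target):
    """Return the path from source to target with the lowest total weight."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if target not in dist:
        raise ValueError(f"no path from {source} to {target}")
    path = [target]
    while path[-1] != source:  # walk predecessors back to the source
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical zones; weights are made-up adjacent trip counts.
EDGES = [("A", "B", 100), ("B", "E", 80), ("A", "C", 10),
         ("C", "D", 15), ("D", "E", 20), ("C", "B", 120)]
graph = undirected(EDGES)

print(dijkstra_path(graph, "A", "E"))  # ['A', 'C', 'D', 'E']
print(dijkstra_path(graph, "A", "B"))  # ['A', 'B']
```

The long trip to E slips around the busy A-B border, while the short trip to B has no cheaper option than going straight through it.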

How I did it

First, I ran the shortest-path algorithm in a loop over every pair of taxi zones in NYC.

```python
import networkx as nx

# df2022 (2022 trips by pickup/dropoff zone) and G (the weighted
# taxi-zone graph) come from the previous post.
PUIDs = df2022['PULocationID']
paths = []
PUs = []
DOs = []
for x in range(len(PUIDs)):  # for every row in Pickup Location IDs
    PUid = df2022.iloc[x, 1]  # take the pickup id
    DOid = df2022.iloc[x, 0]  # take the dropoff id
    try:
        # run the shortest path function with the weight of 2022 total trips
        shortest = nx.shortest_path(G, source=PUid, target=DOid, weight='weight')
    except (nx.NetworkXNoPath, nx.NodeNotFound) as e:
        print(f"Error occurred for PUid={PUid} and DOid={DOid}: {e}")
        continue
    paths.append(shortest)  # append the route to a list
    PUs.append(PUid)  # append the pickup id
    DOs.append(DOid)  # append the dropoff id
```

This gave me a comprehensive list of shortest path answers from any zone to any zone.

I flattened the output so that it looked like this: if you filter for any starting point (PUIDs) or endpoint (DOIDs), you can see the Location IDs along its shortest-path route.
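"Flattening" here means going from one row per route to one row per zone on each route. A minimal sketch of that reshaping, using the `PUs`, `DOs`, and `paths` lists from the loop above and the column names mentioned in the post; the sample routes are invented:

```python
def flatten_paths(PUs, DOs, paths):
    """Return one row per (pickup, dropoff, zone on the shortest path)."""
    rows = []
    for pu, do, path in zip(PUs, DOs, paths):
        for zone_id in path:
            rows.append({"PUIDs": pu, "DOIDs": do, "shortestpathIDs": zone_id})
    return rows

# Made-up routes: zone 1 -> 4 via 3, and zone 1 -> 2 directly
rows = flatten_paths([1, 1], [4, 2], [[1, 3, 4], [1, 2]])
for row in rows:
    print(row)
```

In pandas, building a DataFrame with a list-valued `shortestpathIDs` column and calling `DataFrame.explode("shortestpathIDs")` produces the same long layout.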

From there, I moved to Tableau.

First, I uploaded the shapefile of NYC taxi zones and visualized it like this. For what it’s worth, I started my career as basically a Tableau power user before moving to Python for most of my work. And while Python is soooo much better for most analysis and modeling, Tableau still has certain advantages: it’s easier to design certain visualizations, and Tableau’s geographic visualization features are simply excellent.

Then, I joined my output with a CSV of taxi zones and neighborhood names. This was a bit silly, since I could have done it in my original Jupyter notebook, but it worked fine.
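Doing that join in the notebook instead could look something like this sketch. It assumes the lookup CSV has `LocationID` and `Zone` columns (check your file’s headers), and the zone names below are placeholders rather than real NYC neighborhoods:

```python
import csv
import io

def attach_zone_names(rows, lookup_file):
    """Add 'Pickup Zone' and 'Dropoff Zone' names to each flattened row.

    Assumes the lookup CSV has 'LocationID' and 'Zone' columns.
    """
    names = {int(r["LocationID"]): r["Zone"] for r in csv.DictReader(lookup_file)}
    return [
        {
            **row,
            "Pickup Zone": names.get(row["PUIDs"], "Unknown"),
            "Dropoff Zone": names.get(row["DOIDs"], "Unknown"),
        }
        for row in rows
    ]

# Placeholder lookup table; a real file would be opened with open(path, newline="")
lookup = io.StringIO("LocationID,Zone\n1,Zone One\n2,Zone Two\n4,Zone Four\n")
rows = [
    {"PUIDs": 1, "DOIDs": 4, "shortestpathIDs": 3},
    {"PUIDs": 1, "DOIDs": 2, "shortestpathIDs": 2},
]
named = attach_zone_names(rows, lookup)
print(named[0]["Pickup Zone"], "->", named[0]["Dropoff Zone"])  # Zone One -> Zone Four
```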

I used Tableau’s data blending feature to connect the ‘shortestpathIDs’ field in my output with the shapefile of NYC taxi zones: I first renamed ‘shortestpathIDs’ to ‘PathIDs’ and then clicked the link icon.

Finally, I dragged Pickup Zone and Dropoff Zone onto the Filters shelf and used the Show Filter option to make them usable by the viewer.

This way, if you pick your Pickup Zone and Dropoff Zone, you can see which PathIDs are in the shortest route.
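Under the hood, the two dashboard filters amount to a simple lookup on the flattened table. A rough Python equivalent, assuming rows keyed by `Pickup Zone`, `Dropoff Zone`, and `PathIDs` (the zone names are placeholders), might be:

```python
def path_ids_for(rows, pickup_zone, dropoff_zone):
    """Return the PathIDs to highlight for one Pickup/Dropoff selection."""
    return [
        row["PathIDs"]
        for row in rows
        if row["Pickup Zone"] == pickup_zone and row["Dropoff Zone"] == dropoff_zone
    ]

# Placeholder rows in the flattened, renamed layout
rows = [
    {"Pickup Zone": "Zone One", "Dropoff Zone": "Zone Four", "PathIDs": 1},
    {"Pickup Zone": "Zone One", "Dropoff Zone": "Zone Four", "PathIDs": 3},
    {"Pickup Zone": "Zone One", "Dropoff Zone": "Zone Four", "PathIDs": 4},
    {"Pickup Zone": "Zone One", "Dropoff Zone": "Zone Two", "PathIDs": 1},
    {"Pickup Zone": "Zone One", "Dropoff Zone": "Zone Two", "PathIDs": 2},
]
print(path_ids_for(rows, "Zone One", "Zone Four"))  # [1, 3, 4]
```

In Tableau, the returned IDs are the taxi-zone polygons that stay highlighted on the map.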

How well does it work?

I spent a fair amount of time playing with my final application and comparing it to what I think of as the ultimate algorithm for going from place to place while avoiding the maximum amount of traffic: Google Maps.

I was surprised and excited by how often my application came up with something very similar to Google Maps. By my rough estimate, it matched one of Google’s top routes about 40% of the time.

It seemed to do this more often at night. One thing I’d like to do is add a layer of detail by splitting the day into four equal time periods.
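One way that four-way split could work is to bucket each trip by pickup hour and keep separate adjacent-trip counts per bucket, so each time period gets its own weighted graph. A sketch under those assumptions, with made-up trips given as (pickup zone, dropoff zone, pickup hour) and assumed to be already filtered to adjacent zones:

```python
from collections import Counter

def bucket_for_hour(hour):
    """Map an hour 0-23 to one of four six-hour buckets (0 = midnight to 6am)."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    return hour // 6

def adjacent_counts_by_bucket(trips):
    """Count trips per (time bucket, undirected zone pair)."""
    counts = Counter()
    for pu, do, hour in trips:
        counts[(bucket_for_hour(hour), tuple(sorted((pu, do))))] += 1
    return counts

# Made-up trips: (pickup zone, dropoff zone, pickup hour)
trips = [(1, 2, 8), (2, 1, 9), (1, 2, 23), (3, 4, 2)]
counts = adjacent_counts_by_bucket(trips)
print(dict(counts))  # {(1, (1, 2)): 2, (3, (1, 2)): 1, (0, (3, 4)): 1}
```

Each bucket’s counts would then become the edge weights for that period’s graph, and the shortest-path loop would run once per bucket.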

At times it also appeared to find highways, mirroring Google Maps directions that take you along 495 or the Cross Island Parkway. I’m not sure why this is; my guess is that highways run through relatively low-density neighborhoods where there’s less adjacent traffic.

There was one buggy aspect that I noticed: it had an annoying tendency to take you through the Far Rockaways unnecessarily.

The more I looked at it, the more I could see the algorithm relying on a handful of standard tricks, the kind any driver who has lived in NYC long enough would know: avoid Manhattan and inner Brooklyn, but don’t go too far out into the boroughs if you have to get back to the middle of the city.

Finally, I’ll point out that this algorithm may be particularly well suited to big cities. After all, minimizing traffic may save you the most time in a city, where the distances involved are relatively short. But on a long highway route, where you’ll travel at more or less the same speed whichever way you go, minimizing distance is what matters.
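A toy way to see that trade-off is a blended route cost with a single knob: `alpha = 1` scores routes by traffic only (the city case), `alpha = 0` by distance only (the long-highway case). This is a hypothetical illustration, not how Google Maps or this application actually scores routes, and it assumes traffic and distance have already been normalized to comparable 0-1 scales; the two routes are made up.

```python
def blended_cost(route, alpha):
    """Mix normalized traffic and distance; alpha=1 is traffic only, 0 is distance only."""
    return alpha * route["traffic"] + (1 - alpha) * route["miles"]

def pick_route(routes, alpha):
    """Return the name of the route with the lowest blended cost."""
    return min(routes, key=lambda name: blended_cost(routes[name], alpha))

# Made-up routes with traffic and distance already scaled to 0-1
routes = {
    "through the city": {"traffic": 0.9, "miles": 0.4},
    "around the edge": {"traffic": 0.3, "miles": 0.8},
}
print(pick_route(routes, alpha=1.0))  # around the edge
print(pick_route(routes, alpha=0.0))  # through the city
```

The same idea would carry over to the graph itself: give each edge a blended weight and rerun the shortest-path search.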

I’m curious how Google and Waze do this. My guess is that they do use graph theory, but balance different optimizations depending on what type of route you’re taking, and have a lot of sophisticated ways of dealing with nuances.

But that’s a research project for another time.


Evan Spiller

Data scientist. Background in marketing analytics. Contact @ linkedin: https://www.linkedin.com/in/evantspiller/. Or email: evanspiller@gmail.com. #opentowork