How should The Warriors have gotten back to Coney Island?

Evan Spiller
6 min readSep 24, 2023

--

A data science answer, using graph theory, to the question that has plagued us since 1979

Let’s recount the basic plot of that 1979 ̶m̶a̶s̶c̶u̶l̶i̶n̶e̶ ̶b̶a̶l̶l̶e̶t̶ action movie The Warriors for the 1000th time in my adult life. Every gang in NYC is invited to a midnight summit in Van Cortland Park in The Bronx and Cyrus, the host of this gangfest, is assassinated after a rousing ACAB-themed speech. The Warriors are blamed. They need to make it back to Coney Island and, on their way home, fight off a litany of delightfully absurd, costumed gangs.

They appear to take a fairly direct path through Manhattan. Something like this, according to a random blog I found.

The path they took might be the shortest by walking — but all that gang-fighting probably was really tiring. I bet they would have been happier avoiding gangs all together and that means avoiding as many people as possible.

Which leads me to the most important question I’ve answered in my data science career: what would have been the algorithmically right way for The Warriors to get home if their goal was to avoid the maximum amount of traffic?

As it happens, this is a question easily answerable with graph theory and, in particular, the Dijkstra’s shortest path algorithm, which can tell you how to get from point A to point B while minimizing the amount of total edge weight in a network.

I’ll explain what all this means, and I hope this post is a useful explanation of how to create a graph theory network and apply one of the more useful graph theory algorithms to a real-world (well, action movie) problem.

First, to work on this problem, I used NYC’s rideshare data. The category “High Volume For-Hire Vehicle Trip Records” includes every Uber and Lyft ride in NYC, which the city requires the companies to release. For what it’s worth, I probably could have approached this problem by using data on criminal activity instead — but the rideshare data does have the advantage of providing a detailed view of where people go and where they’re coming from in NYC i.e., where people are.

The important thing for now to understand is that it includes the pickup time, dropoff time, and IDs for the pickup and dropoff location for every ride in NYC.

These location IDs refer to taxi zones that, when mapped, look like this.

As you can see, each taxi zone touches adjacent taxi zones — which means that the shapefile can be abstracted into a ‘network.’

Networks look like this:

Here’s a really great introduction to graph theory. But as a tldr explanation: each node connects to adjacent nodes — in this case taxi zones — via edges. Edges can have weights which, in my case, will be the total number of rides from zone to zone in 2022.

NYC provides a shapefile of all taxi zones in the city. Using GeoPandas, I pulled the adjacent zones for each zone. As an aside: Tableau has really excellent capabilities for visualizing shapefiles and I used its visualizations to check my work.

# Step 2: Create a dictionary to store adjacency relationships
adjacency_dict = {}
# Step 3: Iterate through each geometry and find adjacent areas. #In this case, it's not just areas that touch, but areas that overlap
# - presumably a nuance of this shape file
for idx, row in gdf.iterrows():
adjacent_areas_touches = gdf[gdf.geometry.touches(row['geometry'])]['LocationID'].tolist()
adjacent_areas_overlaps = gdf[gdf.geometry.overlaps(row['geometry'])]['LocationID'].tolist()
neighbors = np.union1d(adjacent_areas_touches, adjacent_areas_overlaps)
adjacency_dict[row['LocationID']] = neighbors
# Step 4: Create a table (pandas DataFrame) from the adjacency dictionary
adjacency_table = pd.DataFrame(adjacency_dict.items(), columns=['LocationID', 'Adjacent_LocationIDs'])
# Step 5: Print or manipulate the adjacency table as needed
print(adjacency_table)

At this point, I had a table that looked like this:

I flattened the table to look like this so it would be usable later on:

Then, I added special connections for bridges and tunnels between boroughs.

This would be enough to create an unweighted network. However, I needed to weight my edges by the number of rides from district to district, which I did by importing all of the High Volume For-Hire Vehicle Trip Records files for 2022 and summing the number of rides from zone to zone in that year. Then, I joined those aggregations with my adjacency table to create a table that looked like this:

Finally, I used the python library NetworkX, which is used for graph theory applications, to create a network.

G = nx.DiGraph()
for _, row in merged_df.iterrows():
G.add_edge(row['PULocationID'], row['DOLocationID'], weight=row['trips'])

Here is a visualization of my abstracted network version of NYC’s taxi zones weighted by rideshare rides:

Each blue node is a taxi zone. Each edge — the black lines — connects to the other taxi zones that the zone is physically either touching or reachable by bridge or tunnel. And, while you can’t see it, each edge is weighted by the number of Uber/Lyft rides taken from node to node in 2022.

With this network, you can use Dijkstra’s Shortest Path Algorithm to figure out the lowest-traffic route from any zone to any other zone. Here’s another excellent explanation from the same creator of how the algorithm works and how it is converted into code.

Thankfully, the algorithm is built into NetworkX — so I just needed to apply the code like this:

print(nx.shortest_path(G, source=241, target=55, weight='weight'))

G is the network. ‘source’ is the taxi zone ID/node for Van Cortland Park and ’target’ is the taxi zone ID/node for Coney Island.

It resulted in this ordered list of taxi zones: [241, 240, 31, 242, 184, 208, 15, 171, 9, 192, 93, 134, 96, 63, 124, 201, 27, 154, 150, 29, 55]

I graphed the zones in Tableau and voila!

Here we have the algorithmically best way for The Warriors to have made it home to Coney Island from Van Cortland Park traversing minimum traffic.

If they had hired me as a consultant, it would have been a 2-hour movie of nine men in costumes walking from The Bronx through Queens into Brooklyn without having to engage in any violence whatsoever.

And, in all seriousness, this looks like a very reasonable way to get to Coney Island while avoiding Manhattan traffic.

If there are any other film-based data questions you’d like to see me tackle, put it in the comments!

In the meantime, I’ll be writing more posts based on my explorations of NYC’s rideshare data.

--

--

Evan Spiller

Data scientist. Background in marketing analytics. Contact @ linkedin: https://www.linkedin.com/in/evantspiller/. Or email: evanspiller@gmail.com. #opentowork