NBA Network Analysis: Connecting the Dots with Neo4j

Finding the shortest links between two NBA athletes… while also playing around with Graph Databases.

Lucca Miorelli
7 min read · May 26, 2024
Photo by JC Gellidon on Unsplash

Introduction

Ever watched an NBA game and found yourself intrigued by the unexpected connections between players, as pointed out by the commentators? Inspired by these off-the-cuff remarks, we embarked on a project. Much like the concept of ‘Six Degrees of Kevin Bacon’ — a playful game of connecting disparate actors through their film roles — we aimed to connect NBA players in a similar fashion.

In this blog post, we’ll walk you through our exploration into Graph Databases, and how we used Neo4j to discover relationships between NBA players across different generations.

Overview of the Project

Our goal was straightforward: Identify the shortest path between two different NBA players through their relationships. To achieve this, we worked with NBA Draft data, using Neo4j to help us map out complex player connections. This wasn’t just about crunching numbers; it was about uncovering fascinating (and sometimes quite funny) narratives hidden within the data.

Here’s a concise breakdown of the steps involved in replicating this project:

  • Acquire the necessary data;
  • Format the data into nodes and edges;
  • Establish a Neo4j database instance;
  • Import data into Neo4j;
  • Utilize Cypher notation to interact with the data.

Data Acquisition and Processing

Like most data enthusiasts, we turned to Kaggle for our dataset, focusing on NBA Draft data. We then used Python to process this dataset. This involved transforming the data to fit the format required for graph databases: splitting entities into nodes and connections into edges.
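
To make the nodes-and-edges split concrete, here is a minimal sketch of that transformation in plain Python. The column names (`player_id`, `player_name`, `team`, `organization`, `season`) are assumptions made for illustration, and the real Kaggle file may use different headers. The second sample row leaves the organization blank only to exercise the missing-value branch.

```python
import csv
import io

# Hypothetical column names; the real Kaggle file may use different headers.
# The blank organization in the second row is illustrative only.
SAMPLE_CSV = """player_id,player_name,team,organization,season
1628973,Jalen Brunson,Dallas Mavericks,Villanova,2018
2544,LeBron James,Cleveland Cavaliers,,2003
"""


def build_graph(rows):
    """Split flat draft rows into node collections and typed edge tuples."""
    nodes = {"Player": {}, "Team": set(), "Organization": set(), "DraftClass": set()}
    edges = []
    for row in rows:
        player_id = int(row["player_id"])
        season = int(row["season"])
        nodes["Player"][player_id] = row["player_name"]
        nodes["Team"].add(row["team"])
        nodes["DraftClass"].add(season)
        edges.append((player_id, "DRAFTED_BY", row["team"]))
        edges.append((player_id, "IS_OF_DRAFT_SEASON", season))
        if row["organization"]:  # skip the edge when no organization is listed
            nodes["Organization"].add(row["organization"])
            edges.append((player_id, "IS_OF_ORG", row["organization"]))
    return nodes, edges


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
nodes, edges = build_graph(rows)
print(len(nodes["Player"]), "players,", len(edges), "edges")
```

Keeping entities in per-label collections and relationships as `(source, type, target)` tuples mirrors how they end up in the graph: one node per distinct entity, one edge per connection.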

From the court to the IDE.

After some basic processing, we can load the data into the Neo4j instance using the neo4j Python package.
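
As a rough idea of what that loading step can look like, the sketch below sends rows to Neo4j in batches with `UNWIND` and `MERGE`. It is not our exact script: the `name` property, the batch size, and the helper names `chunked` and `load_drafted_by` are assumptions made for illustration, while the `id` and `team_name` properties follow the queries used later in this post.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# MERGE keeps the load idempotent: re-running it does not duplicate nodes or edges.
DRAFTED_BY_QUERY = """
UNWIND $rows AS row
MERGE (p:Player {id: row.player_id})
  SET p.name = row.player_name
MERGE (t:Team {team_name: row.team})
MERGE (p)-[:DRAFTED_BY]->(t)
"""


def load_drafted_by(uri, user, password, rows, batch_size=1000):
    """Write Player and Team nodes plus DRAFTED_BY edges, one batch per query."""
    # Imported here so the helpers above work without the driver installed.
    from neo4j import GraphDatabase  # third-party: pip install neo4j

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            for batch in chunked(rows, batch_size):
                session.run(DRAFTED_BY_QUERY, rows=batch).consume()
    finally:
        driver.close()
```

Calling `load_drafted_by` needs the connection URI and credentials of your own instance, with `rows` being a list of dictionaries that carry the `player_id`, `player_name`, and `team` keys.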

To increase the complexity of our ‘Six Degrees’ game though, we decided to only consider relationships of players with the team they were initially drafted from, rather than all the teams they played for throughout their career. A good challenge always adds a bit of spice!

Why Graph Databases?

Simply put, Graph Databases store data in nodes and represent relationships between these nodes using edges. This makes them perfect for visualizing and analyzing complex interconnected data, such as NBA player connections.

Although a relational database could have handled this project, we opted for a graph database for a few reasons: it is better suited to handling complex connections between data points; traversing relationships generally stays fast as the data grows, because a query follows stored edges instead of computing joins; and its visual nature makes player connections easier to see and understand. For a project like ‘Six Degrees’, where we’re finding the shortest link between two players, Graph Databases really shine.

👉 If you want to know more about Graph Theory, check out this: A Gentle Introduction To Graph Theory

Working with Neo4j

Neo4j was the tool we chose for this project. We made use of Neo4j’s free sandbox, a small, temporary database that’s available at no cost. If you’re keen on setting up an instance and discovering what it can do, feel free to give it a try.

Nodes created are as follows:

  • Player: Represents an NBA Player. (e.g. Jalen Brunson)
  • Team: Represents an NBA franchise. (e.g. Dallas Mavericks)
  • Organization: Represents the colleges, universities, and international organizations from which players are drafted. (e.g. Villanova)
  • DraftClass: Represents the draft year. (e.g. 2018)

The connections between the nodes are established through the following edges:

  • DRAFTED_BY: connects players to franchises. (e.g. Jalen Brunson → Dallas Mavericks)
  • IS_OF_DRAFT_SEASON: connects players to draft classes. (e.g. Jalen Brunson → 2018)
  • IS_OF_ORG: connects players to organizations. (e.g. Jalen Brunson → Villanova)

This allowed us to populate the Neo4j database and create visually appealing graphs like the ones below! You can query the database by using Cypher syntax like:

// Show Jalen Brunson's connections (id: 1628973)
MATCH (n:Player {id: 1628973})-[r]-(m)
RETURN n, r, m

Jalen Brunson's related entities.

While it’s relatively easy to visualize and draw one player’s connections, things get more complex when dealing with a large number of entities. That’s where the strength of graph databases really shows!

Things can go wiiild

Results and Findings

By using a few simple queries, we can explore our dataset:

// Count all edges grouped by type
MATCH ()-[relationship]->()
RETURN TYPE(relationship) AS type, COUNT(relationship) AS amount
ORDER BY amount DESC;

// Count all nodes grouped by label
MATCH (n)
RETURN labels(n)[0] AS type, COUNT(*) AS amount
ORDER BY amount DESC;
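
For intuition, the grouping these queries perform is the same thing a `Counter` does over a list of edges. The snippet below is a toy illustration on a handful of hand-written edges, not a replacement for the Cypher; the `(source, type, target)` tuple layout is an assumption made for this example.

```python
from collections import Counter

# A few hand-written edges in (source, type, target) form, for illustration only.
edges = [
    (1628973, "DRAFTED_BY", "Dallas Mavericks"),
    (1628973, "IS_OF_DRAFT_SEASON", 2018),
    (1628973, "IS_OF_ORG", "Villanova"),
    (2544, "DRAFTED_BY", "Cleveland Cavaliers"),
    (2544, "IS_OF_DRAFT_SEASON", 2003),
]


def count_by_type(edge_list):
    """Count edges per relationship type, most frequent first."""
    counts = Counter(rel_type for _, rel_type, _ in edge_list)
    return counts.most_common()


for rel_type, amount in count_by_type(edges):
    print(rel_type, amount)
```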

Overall, we finished with:

Nodes (8,900)

  • Player: 7,884
  • Organization: 903
  • DraftClass: 74
  • Team: 39

Edges (24,320)

  • IS_OF_DRAFT_SEASON: 8,454
  • DRAFTED_BY: 8,001
  • IS_OF_ORG: 7,865

However, the most significant query — which is our primary aim in this blog post — is the query that demonstrates the shortest path between two Player entities!

Ladies and gentlemen… It’s time! 🥁

// Shows the shortest path between two players
MATCH path=shortestPath(
(p1:Player {id: "PLAYER-ID-1"})-[*]-(p2:Player {id: "PLAYER-ID-2"})
)
RETURN path

By running the query above for different player IDs, we can see some interesting relationships between NBA players. Some are simple, but others may get a little complex:

  • Josh Hart and Donte DiVincenzo are connected by Villanova, since they played together back in college.
  • Players who shared the same draft class, e.g. LeBron James and Dwyane Wade, are connected by the draft class of 2003.
  • Players drafted by the same team, despite being from very different generations, also share this connection, e.g. D’Angelo Russell and Jerry West, both drafted by the Lakers.

So far we’ve seen simple connections, but the fun part is finding surprising connections like the one between LeBron James and Kobe Bryant. Since these two greats don’t share any direct connection, there must be a longer path that connects them. What does it look like? 🥁🥁🥁🥁🥁

Ever wondered how Zydrunas Ilgauskas, the Lithuanian center drafted by the Cavaliers in ’96, played a key role in linking LeBron and Kobe?
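
To see why this chain is the shortest one, it helps to remember that on an unweighted graph a shortest path can be found with a breadth-first search. The sketch below runs a plain BFS over a tiny hand-built graph containing only the connections mentioned in this post. It illustrates the idea and is not a description of how Neo4j implements `shortestPath` internally; the node names are plain strings chosen for readability.

```python
from collections import deque

# Tiny undirected toy graph built only from connections mentioned in this post.
EDGES = [
    ("LeBron James", "Cleveland Cavaliers"),
    ("LeBron James", "2003"),
    ("Dwyane Wade", "2003"),
    ("Zydrunas Ilgauskas", "Cleveland Cavaliers"),
    ("Zydrunas Ilgauskas", "1996"),
    ("Kobe Bryant", "1996"),
]


def build_adjacency(edges):
    """Map every node to the set of its neighbours, ignoring edge direction."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Return one shortest path from start to goal as a list, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = current
            if neighbor == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


adjacency = build_adjacency(EDGES)
print(" -> ".join(shortest_path(adjacency, "LeBron James", "Kobe Bryant")))
```

Because BFS explores the graph level by level, the first time it reaches the goal it has used the fewest possible hops, which is what the `[*]` pattern inside `shortestPath` asks for as well.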

But if you’re a Cavaliers fan, or simply a curious individual who researched the 1996 NBA Draft, you’ll find that the Cavaliers had more than one pick; they had three. So, how can we show all possible shortest paths between Kobe and LeBron? We need to modify our query slightly:

// LeBron James (2544) and Kobe Bryant (977) IDs
MATCH path=allShortestPaths(
(p1:Player {id: 2544})-[*]-(p2:Player {id: 977})
)
RETURN path

Which returns:

EXTRA: Some Additional Exploratory Data Analysis

Furthermore, we could also explore the best draft class of all time (according to Bleacher Report), which is 1984’s, introducing names like Michael Jordan, John Stockton, Charles Barkley, Hakeem Olajuwon, and more.

Across all these years, which team has drafted the most players? The following query answers this question:

// Count of Players drafted by each Team
MATCH (t:Team)<-[:DRAFTED_BY]-(p:Player)
RETURN t.team_name AS Team, count(p) AS Drafts
ORDER BY Drafts DESC

The result shows that the Sacramento Kings hold the top position (508 picks), followed by the Atlanta Hawks (489) and the New York Knicks (473). Let's delve into the distribution of Kings' draft picks over the years…

// Get all Players drafted by the Kings and their DraftClass
MATCH path = (t:Team {team_name: 'Kings'})
<-[:DRAFTED_BY]-
(p:Player)
-[:IS_OF_DRAFT_SEASON]->
(d:DraftClass)
RETURN path

This returns an amazing graph that shows all players drafted by the Kings, along with their respective draft classes.

On the right side, you can see that there are some draft classes in which the Kings did not make any pick. One may wonder how that is possible, considering that the franchise is one of the oldest in the NBA. The explanation is that, prior to the early 70s, the Kings franchise was known as the Royals. All draft picks of this franchise before 1970 are assigned to the Royals’ Team node.
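
The same gap can be spotted without a picture by comparing the set of all draft years with the years in which a given Team node has picks. The sketch below uses made-up `(team, year)` pairs purely to show the set difference; the rows are not taken from the dataset, and the helper name `years_without_picks` is hypothetical.

```python
def years_without_picks(picks, team, all_years):
    """Return the sorted draft years in which `team` has no recorded pick."""
    years_with_picks = {year for picked_by, year in picks if picked_by == team}
    return sorted(set(all_years) - years_with_picks)


# Made-up toy rows: (Team node name, draft year). Not real dataset values.
picks = [("Royals", 1968), ("Royals", 1969), ("Kings", 1970), ("Kings", 1971)]
all_years = range(1968, 1972)

print(years_without_picks(picks, "Kings", all_years))
```

In this toy data the years missing for the Kings are exactly the ones covered by the Royals node, which is the pattern visible in the graph.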

Conclusion

We successfully employed Neo4j to map NBA player relationships, bringing a data-driven approach to our ‘Six Degrees’ game. This project was not just about analyzing data but also about exploring the interconnected world of basketball!

Navigating through such a multitude of relationships can be both overwhelming and captivating. I conclude with a final image, demonstrating just how awesome these graphs can get — resembling fireworks in their complexity and beauty.

MATCH path = (p:Player)
-[:DRAFTED_BY|IS_OF_DRAFT_SEASON*1..2]->
(t)
WHERE (t:Team) OR (t:DraftClass) OR ((:Player)-[:IS_OF_ORG]->(t))
RETURN path
LIMIT 500

Acknowledgements

A special shout-out to João Pedro Boufleur, the MVP of this project. His contributions were invaluable to the success of this venture.
