Visualizing Hollywood Networks

Nick Aldershof
Published in Analytics Vidhya
Oct 31, 2020

Most of the time, when people think of graphs, they think of something like this:

In everyday data science vernacular this is usually what we mean too: when someone asks for “a graph of the data”, this is what they have in mind.

But there’s another meaning, and when visualized it looks more like this:

A graph of a graph.

In this sense of the word, a “graph” isn’t the visualization at all, but rather the set of relationships that the visualization above depicts.

Here a graph is defined as a collection of “nodes” and “edges”. Each node (shown in blue above) represents an object of some predetermined kind, and each edge (shown in black) represents a relationship between two of those objects.
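As a minimal sketch of that definition (plain Python, standard library only, with placeholder node names rather than real data), a graph can be stored as an adjacency map from each node to the set of nodes it shares an edge with:

```python
# Placeholder nodes and edges; each edge is an unordered pair of node names.
nodes = {"A", "B", "C", "D"}
edges = [("A", "B"), ("A", "C"), ("C", "D")]

# Adjacency map: every node starts with no neighbours, so isolated nodes still appear.
adjacency = {node: set() for node in nodes}
for u, v in edges:
    # An undirected edge is recorded in both directions.
    adjacency[u].add(v)
    adjacency[v].add(u)

for node in sorted(adjacency):
    print(node, "->", sorted(adjacency[node]))
```

Storing neighbours in sets makes “are these two nodes connected?” a constant-time lookup, which is the operation most graph algorithms lean on.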

Some common examples of graphs that you encounter in the course of navigating the internet include:

  • Facebook’s “Social Graph”: each node is a user, and an edge connects two users who are friends. We call this an undirected graph, because friendship is mutual: both users have to be friends with each other for the relationship to exist.
  • Twitter’s Follow Graph: again, each node is a user, but an edge represents one user following another. We call this a directed graph, since a user can follow someone without being followed back. For example, BillGates has 52.4M incoming edges (followers) but only 240 outgoing edges (accounts he follows).
  • Google’s Graph of the Internet: each node is a website, and each edge is a link from one website to another. In fact, this formulation is what allowed Google to develop PageRank, the core of its search engine and a very interesting topic in network analysis.
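The undirected/directed distinction shows up directly in how degree is counted. In this small sketch (hypothetical users, standard-library Python), an undirected friendship adds one to both users’ degree, while a directed follow adds one to the follower’s out-degree and one to the followed user’s in-degree. The 52.4M vs 240 figures above are exactly an in-degree and an out-degree.

```python
from collections import Counter

# Hypothetical undirected friendships: each pair is stored once but counts for both users.
friendships = [("ann", "bob"), ("bob", "cat")]
# Hypothetical directed follows: (follower, followed); no reverse edge is implied.
follows = [("ann", "cat"), ("bob", "cat"), ("cat", "ann")]

friend_degree = Counter()
for u, v in friendships:
    friend_degree[u] += 1
    friend_degree[v] += 1

# In-degree counts followers; out-degree counts accounts followed.
in_degree = Counter(followed for _, followed in follows)
out_degree = Counter(follower for follower, _ in follows)

print("friend degree:", dict(friend_degree))
print("in-degree:", dict(in_degree))
print("out-degree:", dict(out_degree))
```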

Now that the quick introduction is out of the way, let’s work through an example of what we can do with graphs! Since I don’t have access to any giant datasets like those, I’m going to create a graph of Hollywood: a network where each node is a movie, and an edge connects two movies that have at least one actor in common. This lets us look at the connections between different actors’ filmographies, and find connections between movies that you wouldn’t expect.
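One way to build such a movie graph, sketched here in standard-library Python (not necessarily how the post’s notebooks do it), is to invert a movie-to-cast table into actor filmographies and then link every pair of movies within each filmography. The cast lists below are hypothetical and deliberately incomplete: they contain only actor/movie pairings mentioned elsewhere in this post. Keying edges by neighbour means two movies sharing several actors still get a single linkage, with the shared actors collected as its label.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, deliberately incomplete cast lists (only pairings named in this post).
casts = {
    "Pulp Fiction": {"Phil LaMarr", "Harvey Keitel"},
    "The Lion King": {"Phil LaMarr", "James Earl Jones"},
    "Star Wars": {"James Earl Jones"},
    "Little Fockers": {"Harvey Keitel", "Jessica Alba"},
}

def build_movie_graph(casts):
    """Return {movie: {neighbour_movie: set_of_shared_actors}}."""
    # Invert the table: actor -> set of movies they appear in.
    filmography = defaultdict(set)
    for movie, actors in casts.items():
        for actor in actors:
            filmography[actor].add(movie)
    graph = {movie: {} for movie in casts}
    # Each pair of movies sharing an actor gets one edge; further shared
    # actors are added to that edge's label instead of creating a new edge.
    for actor, movies in filmography.items():
        for a, b in combinations(sorted(movies), 2):
            graph[a].setdefault(b, set()).add(actor)
            graph[b].setdefault(a, set()).add(actor)
    return graph

graph = build_movie_graph(casts)
n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
print(len(graph), "movies,", n_edges, "linkages")
```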

First, a look at what our graph actually looks like.

As you can see, we’ve pulled about 24k movies, with 1.6M linkages between them. Note: if two movies share multiple actors, we record only one linkage.

Degree here refers to the number of linkages a movie has: in our case, the number of other movies that share at least one actor with it. The higher the degree, the more likely the film is to have either a lot of star power or an extremely large cast.

The movie with the highest degree ends up featuring both! With credits including John Travolta, Bruce Willis, Uma Thurman, Ving Rhames, and Harvey Keitel, it’s no wonder Pulp Fiction gets first billing.
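Given an adjacency map, degree is just the size of each movie’s neighbour set, and ranking by it is a sort. A short standard-library sketch with a hypothetical four-movie graph (placeholder titles, not real data):

```python
# Hypothetical adjacency map: movie -> set of movies sharing at least one actor.
movie_graph = {
    "Movie A": {"Movie B", "Movie C", "Movie D"},
    "Movie B": {"Movie A"},
    "Movie C": {"Movie A", "Movie D"},
    "Movie D": {"Movie A", "Movie C"},
}

# Degree = number of distinct neighbouring movies (one linkage per pair).
degree = {movie: len(neighbours) for movie, neighbours in movie_graph.items()}
# Highest degree first; ties keep their original order because sorted() is stable.
ranking = sorted(degree.items(), key=lambda item: item[1], reverse=True)
for movie, d in ranking:
    print(f"{movie}: {d}")
```

On the full dataset, the top entry of this kind of ranking is where Pulp Fiction appears.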

Movie Linkages

The first thing we’ll do is a modified version of the classic “Six Degrees of Kevin Bacon”. However, instead of finding the connections between actors, we’ll find the connections between movies. Flipping the graph around, with actors as nodes and shared movies as edges, would recover the original game just as easily.
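The flipped formulation can be built from the same cast table: every pair of actors in a movie gets an edge labelled with the movies they share. A minimal standard-library sketch, using a hypothetical and incomplete cast table drawn only from pairings mentioned in this post:

```python
from itertools import combinations

# Hypothetical, incomplete cast lists (only pairings named in this post).
casts = {
    "Pulp Fiction": {"Phil LaMarr", "Harvey Keitel"},
    "The Lion King": {"Phil LaMarr", "James Earl Jones"},
    "Star Wars": {"James Earl Jones"},
}

# Actor graph: actors are nodes; an edge means two actors appeared together,
# labelled with the set of movies they share.
actor_graph = {}
for movie, actors in casts.items():
    for actor in actors:
        actor_graph.setdefault(actor, {})
    for a, b in combinations(sorted(actors), 2):
        actor_graph[a].setdefault(b, set()).add(movie)
        actor_graph[b].setdefault(a, set()).add(movie)

for actor in sorted(actor_graph):
    print(actor, "->", sorted(actor_graph[actor]))
```

A single-actor cast (Star Wars in this toy table) adds a node but no edges, since there is no pair to connect.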

By finding the shortest path between two nodes, we can list the chain of connections. Let’s start with Pulp Fiction and another blockbuster, Star Wars!

Finding link from Pulp Fiction to Star Wars

Link from Pulp Fiction to The Lion King is Phil LaMarr
Link from The Lion King to Star Wars is James Earl Jones
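In an unweighted graph, a shortest path can be found with breadth-first search (BFS). The sketch below (standard-library Python, not necessarily the notebooks’ implementation) runs BFS over a hypothetical edge-labelled graph containing only the links printed above, then prints the chain in the same format. It assumes one shared actor per edge for simplicity; the real graph may have several.

```python
from collections import deque

# Hypothetical edge-labelled graph: movie -> {neighbour: shared actor}.
# Only the links printed above are included, so it is far from complete.
movie_graph = {
    "Pulp Fiction": {"The Lion King": "Phil LaMarr"},
    "The Lion King": {"Pulp Fiction": "Phil LaMarr", "Star Wars": "James Earl Jones"},
    "Star Wars": {"The Lion King": "James Earl Jones"},
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of movies from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        movie = queue.popleft()
        if movie == goal:
            # Walk the parent pointers back to the start, then reverse.
            path = []
            while movie is not None:
                path.append(movie)
                movie = parent[movie]
            return path[::-1]
        for neighbour in graph[movie]:
            if neighbour not in parent:
                parent[neighbour] = movie
                queue.append(neighbour)
    return None

def describe_link(graph, start, goal):
    """Format the path as 'Link from X to Y is <actor>' lines."""
    lines = [f"Finding link from {start} to {goal}"]
    path = shortest_path(graph, start, goal)
    if path is None:
        return lines + ["No link found"]
    for a, b in zip(path, path[1:]):
        lines.append(f"Link from {a} to {b} is {graph[a][b]}")
    return lines

print("\n".join(describe_link(movie_graph, "Pulp Fiction", "Star Wars")))
```

Because BFS visits movies in order of hop count, the first time it reaches the goal is guaranteed to be via a shortest chain.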

Sampling a few more random pairs, we see:

Finding link from The Veil to The Grand Budapest Hotel
Link from The Veil to Little Fockers is Jessica Alba
Link from Little Fockers to The Grand Budapest Hotel is Harvey Keitel
Finding link from Hamilton to Bernie the Dolphin
Link from Hamilton to Velvet Buzzsaw is Daveed Diggs
Link from Velvet Buzzsaw to Arkansas is John Malkovich
Link from Arkansas to Bernie the Dolphin is Patrick Muldoon
Finding link from Sabotage to RockNRolla
Link from Sabotage to Wrath of the Titans is Sam Worthington
Link from Wrath of the Titans to RockNRolla is Toby Kebbell
Finding link from Saw VI to Dallas Buyers Club
Link from Saw VI to Highway is Mark Rolston
Link from Highway to Dallas Buyers Club is Jared Leto
Finding link from Mulholland Drive to Joker
Link from Mulholland Drive to To Die For is Dan Hedaya
Link from To Die For to Joker is Joaquin Phoenix
Finding link from Rogue Warfare to The Time Machine
Link from Rogue Warfare to Warrior is Fernando Chien
Link from Warrior to Abduction is Denzel Whitaker
Link from Abduction to The Time Machine is Richard Cetrone

We can also search the graph for the “longest shortest path”: the pair of connected movies whose optimal linkage is the longest. It turns out to be the link from Kishmish, a Bengali romantic drama, to Paskal, a Malaysian action movie, at 13 links. The median longest shortest path, however, is only 8 links, which illustrates how closely connected most films are.

Finding link from Kishmish to Paskal

Link from Kishmish to Tonic is Dev
Link from Tonic to Maya Kumari is Rajatabha Dutta
Link from Maya Kumari to Bela Shuru is Rituparna Sengupta
Link from Bela Shuru to is Soumitra Chatterjee
Link from to Mississippi Masala is Sharmila Tagore
Link from Mississippi Masala to Mimic is Charles S. Dutton
Link from Mimic to Stuber is Mira Sorvino
Link from Stuber to Headshot is Iko Uwais
Link from Headshot to Antoo Fighter is Bront Palarae
Link from Antoo Fighter to Balada Pencinta is Bell Ngasri
Link from Balada Pencinta to Ngorat is Iedil Putra
Link from Ngorat to Operasi X is Aaron Aziz
Link from Operasi X to Paskal is Hairul Azreen
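The longest shortest path can be found by running BFS from every movie and keeping the farthest node reached. The sketch below assumes that “median longest shortest path” means the median, over movies, of each movie’s own longest shortest path (its eccentricity); that reading is an assumption, and the graph is a hypothetical six-node example. Only reachable nodes are considered, matching the restriction to connected movies. One BFS per node costs roughly O(V·(V+E)), so on a graph of this size it is slow but feasible.

```python
from collections import deque
from statistics import median

def bfs_distances(graph, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def longest_shortest_path(graph):
    """Return (eccentricity per node, (u, v, length) of the longest shortest path)."""
    eccentricity = {}
    best = (None, None, -1)
    for source in graph:
        dist = bfs_distances(graph, source)
        far_node = max(dist, key=dist.get)
        eccentricity[source] = dist[far_node]
        if dist[far_node] > best[2]:
            best = (source, far_node, dist[far_node])
    return eccentricity, best

# Hypothetical graph: a chain A-B-C-D-E plus a node F joined to both B and D.
graph = {
    "A": {"B"}, "B": {"A", "C", "F"}, "C": {"B", "D"},
    "D": {"C", "E", "F"}, "E": {"D"}, "F": {"B", "D"},
}
ecc, (u, v, length) = longest_shortest_path(graph)
print(f"Longest shortest path: {u} -> {v}, {length} links")
print("Median eccentricity:", median(ecc.values()))
```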

Generating Actor Networks

Now that we have all the data we need, we can also create some fun visualizations, such as the connections between all the movies of a particular actor. Take the network of an actor as eccentric as Nicolas Cage: note the lack of well-defined clusters, characteristic of an actor famous for taking diverse (and “eccentric”) roles.

Compare that to the network of Chris Evans, an actor who has earned the bulk of his fame starring in the Marvel movies, but who still has some outliers, like “Not Another Teen Movie”, that need more isolated links to connect to the rest of his filmography.

Or a network somewhere in the middle, like that of Mike Myers, an actor famous for several very different kinds of movies: his adult-oriented comedies as well as the family-friendly Shrek series.
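If every shared actor counted as a link, all of one actor’s movies would trivially connect through the actor themself, forming a single clique with no clusters. This sketch therefore assumes that two of the actor’s movies are linked only when they share some *other* cast member. That rule is an assumption, not necessarily what the post’s visualizations use. The cast table holds hypothetical placeholder actors and films.

```python
from itertools import combinations

# Hypothetical cast lists: placeholder actors and films, not real filmographies.
casts = {
    "Film 1": {"Actor X", "Actor Y"},
    "Film 2": {"Actor X", "Actor Y"},
    "Film 3": {"Actor X", "Actor Z"},
    "Film 4": {"Actor Z"},
    "Film 5": {"Actor X"},
}

def actor_network(casts, actor):
    """Movies featuring `actor`, plus edges between pairs of them that share
    at least one co-star other than `actor` (edge label = those co-stars)."""
    movies = sorted(m for m, cast in casts.items() if actor in cast)
    edges = {}
    for a, b in combinations(movies, 2):
        shared = (casts[a] & casts[b]) - {actor}
        if shared:
            edges[(a, b)] = shared
    return movies, edges

movies, edges = actor_network(casts, "Actor X")
print(movies)
for (a, b), shared in edges.items():
    print(a, "--", b, "via", sorted(shared))
```

Under this rule, Film 3 and Film 5 end up as isolated outliers, while Film 1 and Film 2 form a small cluster around their shared co-star.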

It even illustrates odd movie trivia, like the absolutely insane lack of connections within the Highlander franchise.
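One way to put a number on a “lack of connections” within a franchise is to count the connected components of the subgraph restricted to the franchise’s movies. A tightly linked series forms one component, and a disjointed one splits into several. This is a suggested measure, not necessarily one the post computes. The titles below are hypothetical placeholders, not the actual Highlander data, and only direct links between franchise movies are considered.

```python
from collections import deque

def connected_components(nodes, adjacency):
    """Group `nodes` into connected components, following only edges that
    stay inside `nodes` (BFS from each not-yet-seen node)."""
    nodes = set(nodes)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency.get(node, ()):
                if neighbour in nodes and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components

# Hypothetical franchise: Parts 1 and 2 share cast; Parts 3 and 4 link to nothing.
adjacency = {"Part 1": {"Part 2"}, "Part 2": {"Part 1"}, "Part 3": set(), "Part 4": set()}
components = connected_components(adjacency.keys(), adjacency)
print(len(components), "components:", components)
```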

The notebooks used in this post are available on GitHub, and you can find more of my thoughts on data by following me on Twitter.
