Published in

Analytics Vidhya

# Do All Hollywood Actors Know Each Other? Breadth-First Search in “Action”

The task is straightforward: given two actor names, find the connection between them through their movie casts. Such as Emma Watson knows Jennifer Lawrence through Daniel Radcliffe (Harry Potter), who knows James McAvoy (Victor Frankenstein), who finally knows Jennifer Lawrence (Dark Phoenix). GitHub.

`\$ python degrees.py largeLoading data...Data loaded.Name: Emma WatsonName: Jennifer Lawrence3 degrees of separation.1: Emma Watson and Daniel Radcliffe starred in Harry Potter and the Chamber of Secrets2: Daniel Radcliffe and James McAvoy starred in Victor Frankenstein3: James McAvoy and Jennifer Lawrence starred in Dark Phoenix`

The task is about solving a search problem, such as a maze, in which we are given an initial state (Emma Watson) and a goal state (Jennifer Lawrence). We need to find the shortest path between the initial state and the goal state, which means that we ought to take advantage of Breadth-First Search that is based on using a queue-frontier. By using a queue-frontier, we can gradually check all the available neighboring nodes from the initial state, meaning that we will always arrive at the shortest path because we will exhaust all states until we reach the shortest solution.

First, we need to see what data is available to us. The data is provided by IMDb and includes three tables: movies.csv with the list of titles, years, and ids; stars.csv with the list of person_ids and their movie_ids; and people.csv with the list of ids, names, and birth years.

Second, we need to establish what the node will store. In our case, the node class is an object with the `state`, `parent`, and `action` attributes.

`class Node():    def __init__(self, state, parent, action):        self.state = state    # the current actors id        self.parent = parent  # the previous actors id        self.action = action  # the current movie id`

Storing `parent` would prove really handy once we reach the final state and need to trace the path that we did. See it as a Singly Linked List where we have the pointer pointing at the next node (though, in our case, it points to the previous node).

So when it comes to Breadth-First Search, we should use a queue to store all nodes that we need to visit. Using a queue is essential to BFS as we need to traverse all nodes on the same level sequentially to find the shortest path. There is a nice medium article on BFS.

Having said that, now we can begin our `shortest_path `function that takes source_id and target_id for the attributes. We first initialize the `frontier`, then add the first node that stores `source_id` for the state and `None` for the parent and action. It is then important to keep the `explored` set in order to avoid checking the same people and their movies.

`frontier = QueueFrontier()frontier.add(Node(state=source_id, parent=None, action=None))explored = set()`

We then traverse until we either find the solution or exhaust all possible nodes from the frontier. If the frontier is empty, we can say there is no connection. If it is not empty, we extract the node from the queue, specifically the one we put there first. If the node’s state is equal to the `target_id`, it means that we have reached the solution, and we need to determine the path we did. This is where the node's parent comes in handy (yay!). We traverse back using the node and save all the nodes that we find in the `path`. Once we put all the nodes in the list, we reverse it, and return it.

`while True:    if frontier.empty():  # No more available nodes to check against target_id        raise Exception("No solution")    node = frontier.remove()  # Take out the first element from the queue    if node.state == target_id:  # We found the final state!        path = []        while node.parent is not None:            path.append((node.action, node.state))            node = node.parent        # Don't forget to reverse it because we were moving from the final state to the initial        path.reverse()        return path`

It is not over yet. Bear with me. If the node’s state is not equal to the `target_id`, then we add it to the explored set and add all of its neighbors to the frontier.

`explored.add((node.action, node.state))  # Record that we've already seen this actor and moviefor action, state in neighbors_for_person(node.state):    if not frontier.contains_state(state) and (action, state) not in explored:        # New node for the next actor and movie        child = Node(state=state, parent=node, action=action)        frontier.add(child)  # By adding to the frontier, we put it in the end of the queue`

You might be wondering what is the `neighbors_for_person `function doing? Well, it simply returns all the neighboring actors that starred in the same movie. For example, given Emma Watson as the initial state, we extract all the ids of her movies (all Harry Potter movies probably) and then go over them and the people who took part in those movies as well. We finally return the set of all those people's ids and the movie ids.

`movie_ids = people[person_id]["movies"]neighbors = set()for movie_id in movie_ids:    for person_id in movies[movie_id]["stars"]:        neighbors.add((movie_id, person_id))return neighbors`

The algorithm, in general, is certainly quite slow. The `shortest_path` function has a runtime complexity of around O(n⁴) which is terribly bad, but the point of the project was rather about applying BFS in real-world data. I hope you enjoyed reading this post. Please note that this project is a part of my coursework towards Harvard’s CS50 Artificial Intelligence; you can find more about it here. Please see my GitHub repository as well.

Another example:

`python degrees.py largeLoading data...Data loaded.Name: Eddie MurphyName: Bill Clinton3 degrees of separation.1: Eddie Murphy and Halle Berry starred in Boomerang2: Halle Berry and Robert Downey Jr. starred in Gothika3: Robert Downey Jr. and Bill Clinton starred in The Last Party`

--

--

## More from Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

## Damir Temir

25 Followers

Data Science Aspirant w/ interest for writing. Student at the University of Illinois. Born and raised in Kazakhstan. temir.dev