Understanding the Depth-First Search and the Topological Sort with Python

Yasufumi TANIGUCHI
Feb 18 · 8 min read

There are two fundamental ways to search a graph: the breadth-first search (BFS) and the depth-first search (DFS). In this post, I’ll explain the depth-first search, focusing on its relation to the topological sort. A topological sort is deeply related to dynamic programming, which you should know when you tackle competitive programming. I used Python for the implementation. If you’d like to know about the breadth-first search, check my other post: Understanding the Breadth-First Search with Python.

1. The algorithm of the depth-first search

1. Apply step 2 to the starting vertex
2. For each neighbor of the given vertex that is not visited yet, mark it visited and re-apply step 2 to it

Step 2 is the most important step in the depth-first search. Basically, it repeatedly visits a neighbor of the given vertex. Note that it only visits vertices that have not been visited yet. Otherwise, the program would never end on a graph with a cycle, because it would keep re-visiting the same vertices.

Let’s check how the algorithm works. In the figure below, we start the depth-first search from vertex A. The initial state is as follows.

First, we visit vertex B, a neighbor of vertex A, and mark B visited. Note that when the current vertex has multiple neighbors, we visit them in alphabetical order. So we don’t visit vertex G yet.

Then we visit the neighbor of vertex B, vertex C, and mark it visited.

We try to visit vertex C’s neighbor A, but it’s already visited. So we visit vertex D and mark it visited.

We visit the neighbor of vertex D, vertex E and mark vertex E visited.

Vertex E doesn’t have neighbors, so we have reached a dead-end. When the depth-first search reaches a dead-end, it goes back to the vertex visited just before it and visits that vertex’s remaining unvisited neighbors. In this case, we visited vertex D just before the dead-end, so we visit F, the unvisited neighbor of vertex D.

In the depth-first search, we first visit all the vertices reachable along one path, then move on to another path. For example, A -> B and A -> G start two different paths from A. We first visit every vertex reachable through B, in the order A, B, C, D, E, F. Then we take the second path, A -> G. We’ve already visited A, so we just add G to the visited hash table.

Now we have visited all the vertices, so the depth-first search is done.
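The walkthrough can be replayed with a minimal sketch. The adjacency lists below are an assumption: they are reconstructed from the steps above and from the edge examples later in this post, since the original figure is not reproduced here. The names GRAPH and dfs_order are hypothetical.

```python
# Adjacency lists reconstructed from the walkthrough (an assumption).
# Neighbors are listed in alphabetical order, as in the text.
GRAPH = {
    "A": ["B", "G"],
    "B": ["C"],
    "C": ["A", "D", "F"],
    "D": ["E", "F"],
    "E": [],
    "F": [],
    "G": ["C"],
}


def dfs_order(graph, start):
    """Return the vertices in the order the search first visits them."""
    visited = {start}   # vertices already marked visited
    order = [start]     # visiting order, kept only for inspection

    def visit(vertex):
        for neighbor in graph[vertex]:
            if neighbor not in visited:   # skip visited vertices, e.g. C -> A
                visited.add(neighbor)
                order.append(neighbor)
                visit(neighbor)

    visit(start)
    return order


print(dfs_order(GRAPH, "A"))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

The printed order matches the walkthrough: the search exhausts everything reachable through B before it comes back to A and visits G.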

2. Edge classification in the depth-first search

The search above produces a characteristic tree structure. Based on this tree, we classify the edges of the graph into the four kinds below.

  • tree edge
  • forward edge
  • back edge
  • cross edge

I’ll describe these edges in detail.

Tree edges are the edges included in the path of the depth-first search. So all the edges in the earlier figure are tree edges.

The remaining three kinds are edges that are contained in the graph but not included in the path of the depth-first search. First, the forward edge here is the edge from vertex C to vertex F. In the tree structure of the depth-first search, this edge goes from a node to one of its descendants, skipping over the tree path between them, so we call it a forward edge.

Next, I’ll explain back edges. The back edge here is the edge from vertex C to vertex A. In the tree structure of the depth-first search, this edge goes from a node to one of its ancestors (A is the parent of B, and B is the parent of C), so we call it a back edge.

Finally, I’ll explain cross edges. The cross edge here is the edge from vertex G to vertex C. In the tree structure of the depth-first search, this edge goes from a node in one sub-tree to a node in another sub-tree, where neither node is an ancestor of the other, so we call it a cross edge.

That’s all for edge classification.
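The four kinds can also be told apart in code. The following is a minimal sketch, not part of the original implementation: it records when each vertex is first visited and when the search finishes with it, then labels each edge from those two timestamps. The graph is the one reconstructed from the walkthrough (an assumption), and the names classify_edges, discovered, and finished are hypothetical.

```python
# Graph reconstructed from the walkthrough and the edge examples (an assumption).
GRAPH = {
    "A": ["B", "G"],
    "B": ["C"],
    "C": ["A", "D", "F"],
    "D": ["E", "F"],
    "E": [],
    "F": [],
    "G": ["C"],
}


def classify_edges(graph, start):
    """Label every edge reachable from start as tree, back, forward, or cross."""
    discovered = {}   # vertex -> time it was first visited
    finished = {}     # vertex -> time all of its neighbors were handled
    kinds = {}        # (u, v) -> edge kind
    clock = 0

    def visit(u):
        nonlocal clock
        clock += 1
        discovered[u] = clock
        for v in graph[u]:
            if v not in discovered:
                kinds[(u, v)] = "tree"      # v is first reached through u
                visit(v)
            elif v not in finished:
                kinds[(u, v)] = "back"      # v is still on the current path
            elif discovered[u] < discovered[v]:
                kinds[(u, v)] = "forward"   # v is a finished descendant of u
            else:
                kinds[(u, v)] = "cross"     # v was finished in another sub-tree
        clock += 1
        finished[u] = clock

    visit(start)
    return kinds


kinds = classify_edges(GRAPH, "A")
for edge in [("C", "F"), ("C", "A"), ("G", "C")]:
    print(edge, kinds[edge])
```

For the three example edges this prints forward, back, and cross, in that order. The other six edges are labeled as tree edges.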

3. Relation to a topological sort

In a topological sort, we line up the vertices of a directed graph so that all edges go from left to right. Look at the graph below.

We get the figure below when we apply a topological sort to this graph. I’ll describe how to apply it later. The figure also shows the edges of the original graph, and you can see that they all go from left to right.

We call the node order above a topological order. Let’s think about what it tells us. Assume each vertex is a cooking step for curry. Broadly speaking, the steps consist of cutting ingredients, seasoning meat, stir-frying them, and stewing. Here, cutting ingredients and seasoning meat correspond to vertices towards the left, and stir-frying and stewing correspond to vertices towards the right. This is because we cannot stew before cutting the ingredients. So a topological sort lets us sort vertices in dependency order. In the figure above, every vertex appears to the right of all the vertices it depends on.
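To make "all edges go from left to right" concrete, here is a small sketch that checks whether a given order is a topological order. The dependency graph for the curry steps is hypothetical (the post does not spell out the exact edges), and is_topological_order is an illustrative name. The check assumes that order lists every vertex of the graph exactly once.

```python
# Hypothetical dependencies between the curry steps:
# an edge u -> v means "u must happen before v".
CURRY = {
    "cut ingredients": ["stir-fry"],
    "season meat": ["stir-fry"],
    "stir-fry": ["stew"],
    "stew": [],
}


def is_topological_order(graph, order):
    """Return True if every edge of graph goes from left to right in order."""
    position = {vertex: i for i, vertex in enumerate(order)}
    return all(position[u] < position[v] for u in graph for v in graph[u])


good = ["season meat", "cut ingredients", "stir-fry", "stew"]
bad = ["stew", "cut ingredients", "season meat", "stir-fry"]
print(is_topological_order(CURRY, good))  # True
print(is_topological_order(CURRY, bad))   # False
```

Note that a topological order is usually not unique: swapping the first two steps of the valid order gives another valid order.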

On the other hand, we cannot apply a topological sort to the following graph, or to any graph that contains this structure as a part. This is because every vertex in it depends on another vertex in it, so none of them can come first.

We call a path that goes around the vertices and returns to where it started, like this one, a cycle. So we cannot apply a topological sort to a graph that has a cycle. We call a directed graph without any cycle a directed acyclic graph, also known as a DAG. The important thing is that if a graph can be topologically sorted, it is a DAG, and every DAG can be topologically sorted.
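Whether a directed graph is a DAG can itself be checked with the depth-first search, because a directed graph has a cycle exactly when the search finds a back edge. The following is a minimal sketch under that observation. The function name has_cycle and the two small example graphs are hypothetical.

```python
def has_cycle(graph):
    """Return True if the directed graph contains a cycle."""
    in_progress = set()   # vertices on the current depth-first path
    done = set()          # vertices whose neighbors are all handled

    def visit(u):
        in_progress.add(u)
        for v in graph[u]:
            if v in in_progress:             # back edge: v is an ancestor of u
                return True
            if v not in done and visit(v):
                return True
        in_progress.remove(u)
        done.add(u)
        return False

    # Start from every vertex so that disconnected parts are also checked.
    return any(visit(u) for u in graph if u not in done)


cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
dag = {"A": ["B", "C"], "B": ["C"], "C": []}
print(has_cycle(cyclic))  # True
print(has_cycle(dag))     # False
```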

We can get a topological order by applying the depth-first search to a DAG. Look at the following DAG. This graph is the same as the one in the topological sort example above.

We can show the path of the depth-first search on this graph as the following tree structure.

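We put each node of the tree structure from left to right in the order in which the depth-first search finishes with it, that is, the moment all of its neighbors have been visited and the search goes back from it: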

Then we reverse the order and add the edges of the original graph. We get the following figure. You can see that this is a topological order.

Now we know we can get a topological order by applying the depth-first search to a DAG.
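The procedure above (record each vertex when the search finishes with it, then reverse) can be sketched as follows. The small DAG is a hypothetical example, since the graph in the figure is not reproduced here, and topological_sort is an illustrative name. The sketch assumes the input really is a DAG and does not check for cycles.

```python
def topological_sort(graph):
    """Return the vertices of a DAG in a topological order."""
    visited = set()
    finished = []   # vertices in the order the search finishes with them

    def visit(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                visit(v)
        finished.append(u)   # everything reachable from u is already listed

    for u in graph:          # start from every vertex so none is missed
        if u not in visited:
            visit(u)
    return finished[::-1]    # reverse the finishing order


# Hypothetical DAG used only for illustration.
DAG = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}
order = topological_sort(DAG)
print(order)  # ['A', 'C', 'B', 'D', 'E']
```

Every edge of the example goes from left to right in the printed order. For instance, both B and C come before D.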

4. Implementation

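def dfs(graph, vertex):
    # The starting vertex has no parent in the search tree.
    parents = {vertex: None}
    dfs_visit(graph, vertex, parents)

def dfs_visit(graph, vertex, parents):
    for n in graph[vertex]:
        # A vertex that is already a key of parents has been visited.
        if n not in parents:
            parents[n] = vertex
            dfs_visit(graph, n, parents)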

This line initializes the depth-first search.

    parents = {vertex: None}

We use the keys of parents to track which vertices have been visited. The value for each key is the vertex from which we reached the key vertex. We call this variable “parents” because it stores the parent node in the tree structure of the depth-first search. In other words, parents[key] returns the parent node (a vertex in the graph) of the key node (another vertex in the graph).

dfs_visit is the main process of the depth-first search.

def dfs_visit(graph, vertex, parents):
    for n in graph[vertex]:
        if n not in parents:
            parents[n] = vertex
            dfs_visit(graph, n, parents)

In this operation, we go through the neighbors of the given vertex and apply dfs_visit to each neighbor that is not marked visited. Note that graph[vertex] returns the neighbors of the vertex. When a recursive call to dfs_visit is done, execution returns to the for-loop and applies the same operation to the remaining neighbors. After the outermost for-loop is finished, we can reconstruct the path of the depth-first search from parents.
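As a usage example, the sketch below runs the search and rebuilds a path from parents. Two things here are assumptions and not part of the code above: dfs is changed to return parents so that the caller can inspect it, and path_to is a hypothetical helper. The graph is the one reconstructed from the walkthrough in section 1.

```python
def dfs(graph, vertex):
    parents = {vertex: None}
    dfs_visit(graph, vertex, parents)
    return parents   # assumption: returned so the caller can inspect it


def dfs_visit(graph, vertex, parents):
    for n in graph[vertex]:
        if n not in parents:
            parents[n] = vertex
            dfs_visit(graph, n, parents)


def path_to(parents, target):
    """Follow the parent links from target back to the starting vertex."""
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]   # reverse so the path starts at the starting vertex


# Graph reconstructed from the walkthrough (an assumption).
GRAPH = {
    "A": ["B", "G"],
    "B": ["C"],
    "C": ["A", "D", "F"],
    "D": ["E", "F"],
    "E": [],
    "F": [],
    "G": ["C"],
}
parents = dfs(GRAPH, "A")
print(path_to(parents, "F"))  # ['A', 'B', 'C', 'D', 'F']
print(path_to(parents, "G"))  # ['A', 'G']
```

The path to F goes through D and not directly from C, because the search reached F through the tree edge D -> F before it examined the edge C -> F.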

Let’s think about the time complexity of the depth-first search. It is determined by the number of calls to dfs_visit and the number of iterations of the for-loop inside it, because the other operations take constant time. dfs_visit is called at most once for each vertex, because a vertex is added to parents just before the call and is never visited again. So in the worst case there are |V| calls. Inside each call, the for-loop runs once for each neighbor of the vertex, that is, as many times as the degree of the vertex. So the total work is the number of vertices plus the sum of the degrees of all the vertices. By the handshaking lemma, the sum of the degrees is 2|E| in an undirected graph (in a directed graph, the out-degrees sum to |E|). Therefore, the depth-first search runs in O(|V|+|E|). Note that V is the set of vertices and E is the set of edges. I explain the handshaking lemma in my other post: Understanding the Breadth-First Search with Python. So if you are not familiar with it, please check it out.
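As a rough check of this count, the sketch below instruments the search to count the calls to dfs_visit and the neighbor checks inside the for-loop. The graph is the directed graph reconstructed from the walkthrough in section 1 (an assumption), and count_work is a hypothetical name. Because the graph is directed, each edge is examined once, so the number of neighbor checks equals |E|.

```python
def count_work(graph, start):
    """Count dfs_visit calls and neighbor checks for a search from start."""
    parents = {start: None}
    counts = {"calls": 0, "edge_checks": 0}

    def dfs_visit(vertex):
        counts["calls"] += 1
        for n in graph[vertex]:
            counts["edge_checks"] += 1
            if n not in parents:
                parents[n] = vertex
                dfs_visit(n)

    dfs_visit(start)
    return counts


# Graph reconstructed from the walkthrough: 7 vertices, 9 directed edges.
GRAPH = {
    "A": ["B", "G"],
    "B": ["C"],
    "C": ["A", "D", "F"],
    "D": ["E", "F"],
    "E": [],
    "F": [],
    "G": ["C"],
}
print(count_work(GRAPH, "A"))  # {'calls': 7, 'edge_checks': 9}
```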

That’s all for the explanation of the relationship between the depth-first search and the topological sort. Topological order is deeply related to dynamic programming, so I recommend keeping it in mind if you’re going to tackle competitive programming. Thank you for reading.


Thanks to Xu LIANG.
