Tree Diameter — Why Does Two BFS Solution Work?

Tarek Badr
6 min readApr 8, 2020

In this article, I am going to analyze -in detail- how the tree diameter problem can be solved by two Breadth-first search (BFS) traversals. One reason why this solution is interesting is that even though it is intuitive, it is still uneasy to explain why exactly it works! Another reason is that understanding it deepens our grasp of proof by contradiction.

First, before proceeding to the solution, let me quickly review what is the tree diameter problem and the required concepts that need to be cleared.

A graph is a collection of vertices (also named as nodes) and edges that link the vertices.

Tree is a specific type of graph. The main characteristic of this type is that it has no cycles (which means that it is impossible to visit a node more than once in one path, this also means that there is at most one path between any two nodes).

Tree diameter is about finding the longest path in a tree. There are two popular linear time solutions for this problem; the first one is based on dynamic programming (more details), and the second one is based on breadth-first search algorithm (BFS). BFS is a graph traversal algorithm that works by traversing the graph in a level by level manner. Observe the order at which every node is visited here:

By Alexander Drichel — Own work, CC

Tree diameter BFS algorithm

This algorithm utilizes BFS as follows:

  • Pick any random node (let’s call it X).
  • Make a BFS traversal from this node.
  • Pick the farthest node from X (let’s call this node A).
  • Run BFS again but from node A.
  • Pick the farthest node from A (let’s call this node B).
  • Return A and B as the endpoints of the tree diameter.
Two BFS solution steps

I bet you feel that the algorithm sounds intuitive, but unfortunately, it is not obvious why it works. Before explaining why, let’s state what we will base our understanding on.

  1. Given that A and B are the endpoints of the diameter, then we also know that there is only one path between A and B, this is due to the characteristics of the tree graph we discussed earlier.
  2. For simplification, we are going to assume that there is only one tree diameter in the graph, this allows us to concentrate on the main problem, rather than shifting our focus on corner cases (extending it to include multiple same-size diameters isn’t challenging though).
  3. If we have one endpoint (A or B), then it should be fairly easy to reach the other endpoint. We just need to find the longest path from this endpoint (run BFS) which should deliver us to the other endpoint. Why this is true? because otherwise, we will have a longer path (i.e. longer than A to B, the tree diameter) which is a contradiction.
If we start from A, we shouldn’t find a longer path than “A to B“ (the diameter).

From the last point, we can then reduce our problem to:

Find any of the endpoints of the diameter.

To solve this reduced problem, we need to prove that using BFS from any node is enough to find any of the diameter endpoints. To examine this approach we need to look at two scenarios:

Scenario 1: the starting node X is inside the diameter path (between A and B).

In this scenario, if we run BFS from this node X, what do you think the farthest node will be?

The farthest node has to be one of the diameter endpoints because we can look at this situation as if we split the tree diameter into two paths, A to X and X to B (as shown in the example below).

and if the farthest node wasn’t a diameter endpoint (say A*),

then we were able to extend the path “ X to A ” to “ X to A* ”, which means that we are able to find a longer diameter “ B to A* ” (which is a contradiction).

Scenario 2: the starting node X is outside the diameter path (between A and B).

This scenario has two interesting cases:

Case 1: The longest path from X touches the tree diameter.

Well, the moment the longest path touches the tree diameter (at node T), we can repeat scenario 1 by assuming that our starting point will then be T. Afterwards, the longest node from X should satisfy this equation:

Based on our discussion in scenario 1, the farthest node from T is going to be an endpoint of the diameter.

Case 2: The longest path from X doesn’t touch the tree diameter.

Here X is the starting node, and Y is the farthest node from X.

This means we have two long non-intersecting paths in the tree, one is the tree diameter “A to B” and another is “X to Y”.

Here we introduce two new nodes i and j, these are the nodes that are going to eventually connect both paths (there have to be such nodes because the tree has to be connected).

In this graph, let’s assume that the distance from X to j is ≤ the distance from j to Y. We can safely assume that because otherwise we can swap the namings of X and Y to make it work. Also, since it doesn’t matter which diameter endpoint is farthest from node i, let’s assume it will be node A.

So far, by looking at the above graph, we can reach some conclusions:

  • The farthest node from node i is now A (based on scenario 1).
  • The farthest node from node j is Y. Otherwise, X won’t pick Y as its farthest node.

Since “j to Y” > “j to A”, then node i could take more steps to reach Y than A, because it can move to j and we know already that the farthest node from j is Y. This leads us to a contradiction because the farthest node from i should be A not Y.

I hope the following images illustrate this contradiction:

I hope this analysis was sufficient to provide good enough informal proof that convince you why the two-BFS solution work. Please let me know in the comments your thoughts and if there are any unclear parts.

--

--