Lowest Common Ancestor of N-ary Tree

Sahil Awasthi
8 min readJun 8, 2020

--

In this blog, I will try to explain the algorithm to find the Lowest Common Ancestor of a N-ary Tree.

The first question comes to mind is “What is Lowest Common Ancestor of a Tree”.

Let’s split the term and try to understand each word separately.

Tree is a data structure which is collection of nodes and each node is connected to other nodes. It is basically a graph which doesn’t have a cycle and only one connected component. Each node can have several children but only one parent and there is one special node which have no parent that node is referred to a root of that Tree.

The above tree is rooted at node 1.

Ancestor of a node are the nodes which comes in the path during the traversal from root to that node i.e. its parent, grandparent, great grand parent, … every parent till the root node.

In the above figure all ancestors of node 6 are {4,2,1} and for node 5 are {3,1}.

Common Ancestors of two nodes are those nodes which are present in the traversal of both nodes from themselves to the root node i.e. those ancestors which are common :)

Common Ancestors of nodes 6,7 are {4,2,1} and for nodes 3,7 is {1}.

Lowest Common Ancestor (LCA) that common ancestor which is closest to both the nodes or farthest from the root node.

LCA of nodes 6,7 is 4 and for nodes 4,5 is 1.

  • If one node is already ancestor of another node then there LCA will be that ancestor node only i.e. LCA for nodes 4,7 is 4 and for nodes 3,5 is 3
  • LCA of same node is same i.e. LCA of 6,6 will be 6.

Now the main question is “Given two nodes in the n-ary tree find the lowest common ancestor of those nodes?”.

Let first node be X and second node be Y, find LCA(X,Y)

First approach :-

We will make both the nodes to the same level,

level means distance from root node,

level[source] = level[parent]+1

for this we will travel upwards from the lower level node till both the nodes are at same level.

We have to find LCA(4,8) and Level(4) = 2, Level(8) = 3,

so we will travel distance = difference between levels of both nodes

i.e. we will travel 1 distance upwards from lower (8) node and reached to node 5,

Now both are at same level, now we will travel layer by layer upwards till the parents of both the nodes are not equal,

now parents of both nodes are equal and that will be the LCA, we will return the parent of any node, as both are same.

int LCA(int a,int b)
{
if(level[a] > level[b])
swap(a,b);
// if both are not at same level then move lower node upwards int d = level[b] - level[a]; // parent[i] stores the parent of node i
while(d > 0)
{
b = parent[b];
d--;
}
// base case if one was the ancestor of other node if(a == b)
return a;

while(parent[a] != parent[b])
{
a = parent[a];
b = parent[b];
}

return parent[a];
}

The time complexity of above algorithm is O(N) as we are traversing from both nodes towards the root node and in the worst case both nodes can be the leaf nodes and LCA of both can be root node itself.

Can we do better than this?

By doing some pre-computation?

Well yes!!

In the above approach, while making a jump towards root node, in each jump we are covering only 1 unit distance, can we cover larger distance in each jump?

Second Approach :-

BINARY LIFTING

The idea behind this approach is that, we store the information for every node that what is the parent which is at a distance 2^i upwards from the node

This 2^i means we are storing all the ancestors which are at a distance of [1,2,4,8,16….]

In the above figure for node 7

at 2⁰ distance -> node 4

at 2¹ distance -> node 2

at 2² distance -> null or (-1)

But why are are considering only distance in powers of 2? we can reach to any distance from current node as any distance can be represented in binary and we can traverse any distance if we have the above information for every node.

For e.g we need to find the ancestor of node 7 which is at a distance 3 from node 7 (the answer is node 1, but how?)

We will represent the distance in binary and we will make a jump from current node to its ancestor if and only if the i-th bit is set in the distance, and we will traverse from most significant bit to least significant bit.

3 is represented as 0011

the first most significant set bit is 1 and now we will make a jump from current node to the ancestor which is at a distance 2¹ away.

this process goes on till the every set bit is traversed.

For node 7 and to get an ancestor 3 distance away, we make a jump of distance 2¹ i.e 2 and we reach to node 2 and now we make a jump of distance 2⁰ i.e 1 and we reach to node 1 and as all the set bits are traversed we moved 3 distance from the original node.

Similarly for node 21 we need to find the ancestor which is 7 distance towards root node (which is node 1)

Binary representation of distance 7 :- 00111

i.e. 2² + 2¹ + 2⁰

we will always traverse from higher powers towards lower powers (from left to right)

from node 21 :- 2² distance = 4, we make a jump and reach to node 6

now from node 6 :- 2¹ distance = 2, we make a jump and reach to node 2

now from node 2 :- 2⁰ distance = 1, we make a jump and reach to node 1

node 1 is the 7 distance upwards from the original node 21.

For distance 11 its binary representation is 001011

so we make jumps at 2³ then at 2¹ and then at 2⁰.

This logic is very important, read the above lines until it is not clear.

We will store the parent of each node in the 2D array format, size of 2D array will be N*log(N) where N is the number of nodes in the TREE.

distance[B][0] = A

the above line means the node which is at a distance of 2⁰ from node B is A, simply its stores the immediate parent of the node B.

distance[B][1] will be ancestor which is at a distance of 2¹ from node B, i.e. parent of its parent.

distance[6][0] = 4

distance[6][1] = 2

distance[6][2] = -1 or null

distance[5][0] = 3

distance[5][1] = 1

calculating distance[any_node][0] is simple as it is the direct parent and we can find it easily by DFS.

now how to calculate remaining values??

Let node B is ‘x’ distance above node C and node A is ‘x’ distance above node B, where ‘x’ is a power of 2

  • distance(B,C) = x
  • distance(A,B) = x

=> distance(A,C) = 2*distance(B,C) = 2*x

// as we in the ith value we will store the node which is at a 2^i distance, so

let p = log2(x)

  • distance[C][p] = B
  • distance[B][p] = A

=> distance[C][p+1] = distance[ distance[C][p] ][p]

distance[6][0] = 4

distance[4][0] = 2

distance[6][1] =

distance[ distance[6][0] ][0] = distance[ 4 ][0] = 2

Try to make the 2D array for the below tree

so the 2D distance array of the above node will look like :-

Till now we know if we want to reach to an ancestor which is at a distance ‘d’ we can easily do that by jumping in powers of 2.

Will it be helpful in finding the LCA of two nodes?

Now it is hit and trial method to get the LCA

First step : we will check the depths of both the nodes, and if they are different we will make them in the same layer of depth, by traversing upwards only one node.

Let the nodes whose LCA we have to find be {4,8},

now from the root node(1)

depth(4) = 2

depth(8) = 3

now we have to make both the nodes to same depth, we will move the from the lower node towards root node at a distance of :-

depth(lower_node)-depth(upper_node)

as the difference is ‘1’ we will move 1 distance up from node 8 and reached to node 5

now the problem is LCA(4,5) ??

The Hit and Trial method !!!

We will check for the maximum distance (in powers of 2) where the LCA is different and if it is then we move both the nodes to that distance respectively and at last we will reach to the nodes which are one level down then the LCA so we simply return the parent of any of the nodes (as both will be same).

(Think why we are jumping where LCA is different instead of where LCA is same!!)

I know above paragraph is little crazy but try it yourself by creating some trees and try to go through each process of the below code and you will get the intuition.

int distance[N][log2(N)],level[N],distance[N];
// I have considered value of N be 10^5 which is most of the time
// so the log2(N) is taken as 20 for safer side :)
// dfs to store the nodes which are at ith distance
void dfs(vector<int> *graph,int source,int parent,int lvl)
{
distance[source][0] = parent;
level[source] = lvl;

for(int i=0;i<=18;i++)
distance[source][i+1] = distance[distance[source][i]][i];
for(int x : graph[source])
if(x != parent)
dfs(graph,x,source,lvl+1);
}// LCA of node u,vint LCA(int u,int v)
{
if(level[u] > level[v])
swap(u,v);

int difference = level[v] - level[u];
// moving difference units upwards from lower node i.e. v

for(int i=19;i>=0;i--)
if((difference>>i)&1) //checking the ith bit is set
v = distance[v][i];
// base case when one node was ancestor of other
if(u == v)
return u;
for(int i=19;i>=0;i--)
if(distance[u][i] != distance[v][i])
u = distance[u][i],
v = distance[v][i];

return distance[u][0];
}

The time complexity of above algorithm is O(log(N)) per query and O(N log N) pre-computation, which is far better than O(N) for every query.

Bonus

Find distance between two nodes in a tree :-

Distance(X,Y) = Level(X)+Level(Y)-2*Level( LCA(X,Y) )

Distance(6,7) = Level(6)+Level(7)-2*Level(LCA(6,7))

--

--