LCA implementation techniques revisited

Arghya Bhattacharya
5 min readJan 6, 2024

--

Lowest Common Ancestor (LCA) algorithms for a tree and/or graph is extremely important and popular in several applications, e.g., understanding a computer network topology, genealogy graphs, phylogenetic tree in evolutionary biology, social network analysis, and game theory.

From Wikipedia: In graph theory and computer science, the lowest common ancestor (LCA) of two nodes v and w in a tree or directed acyclic graph (DAG) T is the lowest node that has both v and w as descendants, where we define each node to be a descendant of itself (so if v has a direct connection from w, w is the lowest common ancestor).

LCA(9, 11) = 2; Image courtesy: Amarjit’s blog

Let’s start from the easiest example. Given a Binary Search Tree (BST), find the LCA node of two given nodes in the BST (LeetCode #235 level: medium). First, define the structure of each node. Let’s begin with the case where the parent pointer is not available for each node. Hence, there is no way to walk upward in the tree.

struct TreeNode {
int val;
TreeNode *left, *right;
TreeNode(int x) : val(x), left(NULL), right(NULL) {}
};

Because BST holds a property that each node is greater than all the nodes rooted under the left subtree and greater than all the nodes rooted under the right subtree, we recur until we find a node that is within the range of the given two nodes. If both the input nodes are less than the current node, we recur in the left subtree, otherwise, we recur in the right subtree.

TreeNode* lowestCommonAncestor
(TreeNode* root, TreeNode* p, TreeNode* q) {
if (p->val > q->val)
return lowestCommonAncestor (root, q, p);
if (root == NULL || root == p || root == q)
return root;
if (p->val < root->val && q->val > root->val)
return root;
if (p->val < root->val && q->val < root->val)
return lowestCommonAncestor (root->left, q, p);
if (p->val > root->val && q->val > root->val)
return lowestCommonAncestor (root->right, q, p);
return NULL;
}

Now, let’s consider answering the same query but for a given a binary tree (LeetCode #236 level: medium). Note that the only guarantee is that each node has up to two children. The node structure remains the same. First, we write an utility function is_descendant to check whether a given node is a descendant of another node or not (each node is a descendant of its own).

bool is_descendant (TreeNode* root, TreeNode* p) {
if (root == NULL)
return false;
else if (root == p)
return true;
return is_descendant(root->left, p)
|| is_descendant(root->right, p);
}

At each node, if one of the input nodes is a descendant of the left child and the other is a descendant of the right child, we return the node as the lowest common ancestor query response. On the other hand, if both the input nodes are descendants of either the left or the right child, we recur on that specific side of the tree (i.e., from the left or right child).

TreeNode* lowestCommonAncestor 
(TreeNode* root, TreeNode* p, TreeNode* q) {
if (root == NULL)
return NULL;
if (root == p || root == q)
return root;

if (root->left != NULL
&& is_descendant(root->left, p)
&& is_descendant(root->left, q))
return lowestCommonAncestor(root->left, p, q);
else if (root->right != NULL
&& is_descendant(root->right, p)
&& is_descendant(root->right, q))
return lowestCommonAncestor(root->right, p, q);
else if (is_descendant(root->right, p)
&& is_descendant(root->left, q))
return root;
else if (is_descendant(root->left, p)
&& is_descendant(root->right, q))
return root;
return NULL;
}

Now, the final tweak. In the previous algorithm, each query of is_descendant takes an O(n) time. The question is “can we do better?” Observe that, the whole requirement to know the head of tree will go away if we keep a pointer to the parent from each node. Let’s redefine the structure of each node.

class TreeNode {
public:
int val;
TreeNode *left, *right, *parent;
};

The way we determine LCA is in 3 steps:

Step 1: Calculate depths of the two given nodes. We do that by going up the tree from a given node until we reach the head. Note that the property of the parent node is that the parent of the parent node is NULL.

int get_depth(TreeNode* p) {
int count = 0;
while(p->parent != NULL) {
p = p->parent; count++;
}
return count;
}

Step 2: Adjust the depth of the two nodes by going up the tree from the node with higher depth so that they are brought to the same level.

if (depth_p > depth_q)
for (int i = 0; i < depth_p - depth_q; i++)
p = p->parent;
else if (depth_p < depth_q)
for (int i = 0; i < depth_q - depth_p; i++)
q = q->parent;

Step 3: Until both nodes are the same, we keep going up the tree level by level from both the nodes.

if (p->val == q->val)
return p;
while (p->val != q->val) {
p = p->parent;
q = q->parent;
}
return p;

Before wrapping it up for this blog, let me excite you a little bit by telling you why learning LCA is a good investment of time.

LCA was first formally proposed and defined by Aho, Hopcroft, and Ullman in 1973. Harel and Tarjan proposed an efficient data container for answering LCA queries optimally in 1984. Their algorithm has linear preprocessing time and constant query time. Schieber and Vishkin first simplified Harel and Tarjan’s algorithm in 1988, then went on to propose a completely novel way of determining LCA in 1993 without compromising on the asymptotic performance bounds. They could reduce LCA to RMQ (Range Minimum Query). RMQ has several applications in exact and approximate string matching.

Later, Michael Bender (my Ph.D. advisor) and Martin Farach-Colton simplified Schieber and Vishkin’s approach in 2000 (The LCA Problem Revisited). They also showed a simplified reduction of LCA to RMQ and derived the LCA solutions from RMQ solutions.

Bender et al. also showed LCA for DAG is a crucial problem for several complex systems in distributed computing (Lowest common ancestors in trees and directed acyclic graphs).

--

--