A Straightforward Method for Identifying Special Nodes on a Tree Path

6 min readOct 5, 2023

In this blog post, we’ll explore an intriguing method for identifying special nodes along the path between two nodes in a tree structure, especially when dealing with multiple queries.

Prerequisites

In order to grasp the solution, it is necessary to possess a foundational understanding of concepts related to tree data structures, as well as the notions of ancestors and least common ancestor.

Real world scenario

Imagine you have a computer network organized as a tree-like data structure, where connections are bidirectional and there are no cycles. Your goal is to transmit data from one computer to another within this network. However, there’s a catch: some computers may experience intermittent downtime before coming back online. Therefore, before initiating the data transfer, you want to determine whether it’s guaranteed that the data will reach its destination. Essentially, you want to ascertain whether there is any computer along the path from the sender to the receiver that is currently offline.

Abstract format of the problem

You’re dealing with a tree consisting of N nodes and you have to answer Q queries, each falling into one of two types:

Toggle Node State: You can change the state of a node, meaning a special node can become non-special, and vice versa.
Check Path Specialness: You want to determine whether the path between two given nodes, u and v, includes at least one special node.

Before exploring the solution of the problem we need to discuss one important technique first : The Euler tour

What’s an Euler Tour

An Euler Tour of a tree is a technique used to represent trees in an array format by assigning timestamps for entering and exiting nodes during a straightforward depth-first search (DFS) traversal.

Here, we use two arrays to record these timestamps:

tin[u]: Represents the time at which we enter the subtree rooted at node u.
tout[u]: Represents the time at which we exit the subtree rooted at node u.

The following illustration demonstrates an example tree’s Euler Tour:

Characteristics of the Euler Tour:

A node u is considered an ancestor of node v if and only if:

tin[u] is less than or equal to tin[v], indicating that we visit node u for the first time before we visit node v.
tout[u]is greater than or equal to tout[v], indicating that we exit node u after we exit node v.

This relationship ensures that node u is positioned higher in the tree hierarchy, and we can observe examples of this relationship in the picture with nodes 2 and 4.

2. When you have two nodes u and v, where u is an ancestor of v, you can determine if a node exists on the path from u to v by checking if the node appears only once in the subarray [tout[v],tout[u]] of the DFS order.

For instance, consider nodes u=1and v=5. If you examine the subarray [tout[5],tout[1]], you’ll find that nodes 6, 7, and 8 appear twice in this subarray, while nodes 5, 4, 2, and 1 appear only once. These nodes that appear once in the subarray are indeed the nodes that exist on the path from node 1 to node 5.

3. When you have two nodes, u and v, such that neither is an ancestor of the other, and assuming tin[u] < tin[v] (without loss of generality), then a node exists on the path from u to v if and only if it is either the lowest common ancestor (LCA) of u and v or it appears only once in the subarray [tout[u],tin[v]] of the DFS order.

For example, consider nodes u=4 and v=7. In this case, nodes 6 appear twice in the subarray [tout[4],tin[7]]. Nodes 4, 2, and 7 appear only once in this subarray. Additionally, the lowest common ancestor (LCA) of u and v is 1. Therefore, the nodes that exist on the path from 4 to 7 are {1, 4, 2, 7}, which aligns with the property.

Thinking process:

Now that we have a grasp of the Euler Tour concept, let’s explore its practical application in solving our specific problem.

Our first requirement is an algorithm that can efficiently determine whether a special node appears just once within a given path.
For each query we have, we are already aware of the specific interval along the Euler Tour that we need to examine.
Linking tin[u] and tout[u]: To check if eithertin[u] or tout[u] fall within the specified interval, we establish two separate data structures:

MinQuery Data Structure (Segment Tree): In this structure, we store tin[u] at the position corresponding to tout[u], while all other positions are initialized with a default value of positive infinity. When we perform a query on a range [l, r] and receive an answer that's less than l, it signifies the presence of a node with tout[u] within the range [l, r] and tin[u] less than l. This indicates that the node appears only once within the specified interval.
MaxQuery Data Structure (Segment Tree): In this structure, we save tout[u] at the position corresponding to tin[u], with all other positions initialized to a default value of negative infinity. When querying a range [l, r] and obtaining an answer greater than r, it implies the existence of a node with tin[u] within the range [l, r] and tout[u] greater than r. This, too, signifies that the node appears only once within the given interval.

By employing these specialized data structures and linking tin[u] and tout[u] within the context of the Euler Tour, we can efficiently determine the presence of special nodes along a path, providing a practical and effective solution to our problem.

Solution:

To begin, our initial step involves constructing an LCA (Lowest Common Ancestor) calculator utilizing any desired algorithm, such as Binary Lifting, Sparse Table, or a method of our choice.
Our next task is to establish the DFS (Depth-First Search) order and compute the tin and tout tables.

vector<int> adj[N];
int tin[N], tout[N], ti = 0;
void dfs(int u, int p = -1) {
    tin[u] = ti++;
    for (auto &v : adj[u]) {
        if (v == p) continue;
        dfs(v, u);
    }
    tout[u] = ti++;
}

Now, we proceed with the initialization of the MaxQuery and MinQuery data structures.

// initialize MaxQuery with 2*n  positions and -inf value
maxQueryDS = MaxQueryDS(2 * n, -inf);
// initialize MinQuery with 2*n  positions and +inf value
minQueryDS = MinQueryDS(2 * n, +inf);

Next, we create the function responsible for toggling the state of a node.

int state[N];  //  to save the state of each node
void toggleState(int u) {
    state[u] ^= 1;
    if (state[u]) {
        // marked the node as special
        maxQueryDS.upd(tin[u], tout[u]);
        minQueryDS.upd(tout[u], tin[u]);
    } else {
        // marked the node as non special
        maxQueryDS.upd(tin[u], -inf);
        minQueryDS.upd(tout[u], +inf);
    }
}

Following that, we implement the query function.

bool is_ancestor(int u, int v) {
    // check if u is ancestor of v
    return tin[u] <= tin[v] && tout[u] >= tout[v];
}

bool query(int u, int v) {
    int lca = getlca(u, v);
    // The LCA(u,v) is a special node  we return true directly
    if (state[lca]) return 1;
    // make u always have the lower tin
    if (tin[u] > tin[v]) swap(u, v);
    int l, r;
    if (is_ancestor(u, v)) {
        l = tout[v];
        r = tout[u];
    } else {
        l = tout[u];
        r = tin[v];
    }
    return (maxQueryDS.query(l, r) > r) || (minQueryDS.query(l, r) < l);
}

Complexity

Time Complexity:

The time complexity is given by O(PreprocessLCA + initDS + N + Q * (DSQuery + LCAQuery)), where:

PreprocessLCA: Complexity of preprocessing the LCA algorithm, which is N*log(N) for binary lifting.
initDS: Complexity for initializing the MaxQuery and MinQuery data structures.
N: Complexity of the DFS traversal.
DSQuery: Complexity of querying the MaxQuery and MinQuery, which is log(N) for the segment tree.
LCAQuery: Complexity for calculating the LCA, which is log(N) for Binary Lifting.

So, when using binary lifting with a segment tree, the overall complexity becomes O((N+Q)*log(N)).

Space Complexity:

The space complexity is given by O(LCAAlgorithm + DS + N), where:

LCAAlgorithm: Space required for the LCA algorithm, which is N*log(N) for binary lifting.
DS: Space for MinQuery and MaxQuery, which is 2*(4*N) if we use two segment trees.
N: Space for the tin[N], tout[N], and state[N] arrays.

Conclusion :

Alternative methods, such as Heavy-Light Decomposition (HLD), exist for addressing this problem. However, the approach discussed in this blog offers a solution that doesn’t demand advanced knowledge or intricate implementations.

Thank you for taking the time to read, and I trust that you’ve gained new insights from this content.