Sparse Table
Hello everyone,
I would like to tell you about a data structure that, I believe, deserves to be widely known in the CP community. You won’t find these data structures in your standard library, so we will need to implement these by ourselves. We are going to solve two very useful problems in CP with sparse table: The lowest common ancestor (LCA) problem and the range minimum query (RMQ) problem.
Sparse Table is a data structure, that allows answering static range queries. It can answer most range queries in 0(log N), but it efficiently answers range minimum queries (or equivalent range maximum queries) in 0(1) time. It does preprocessing beforehand so that the queries can be answered efficiently.
Let, you have an array “arr” and you want to perform some queries. Each query should compute function F over subarray [L, R]: F(arr[L], arr[L + 1], …, arr[R]). With sparse table, you can do each query in O(log(N)) (N is the size of arr), with initial O(N * log(N)) preprocessing. It often serves as a substitute for segment tree in case of immutable data.
Restrictions
Sparse table can be used if and only if:
- Array is immutable (i.e. queries do not change it)
- Function F is associative :- F(a, b, c) = F(F(a, b), c) = F(a, F(b, c)).
Understanding Sparse Table
Imagine you are watching a movie and your friend assures you, nothing is interesting between 00:10 and 01:01 (hh: mm). Also, he tells you the story from that period. Now you don’t need to watch it entirely and you’ve just saved your precious time. Now imagine yourself in a situation where you want badly, to watch the movie Spider-Man: No Way Home. But due to a busy schedule, you can’t make it to the movie theater. And let suppose you have many friends who all have watched some parts of the movie, and when combined, they’ve seen the whole movie. Now you don’t have to go to the theater, provided that your friends are eloquent enough. You just simply ask them about the movie, and they narrate it to you, piece by piece.
So similar to this Sparse Tables helps you to save a lot of time, it usually stores data in chunks and answers efficiently when needed.
Ideology
We know any number can be represented in binary as a sum of distinct powers of two.
E.g. 19=(10011)2=16+2+1.
With same idea any interval can be uniquely represented as a union of intervals with lengths as distinct powers of two.
E.g. [2,14]= [2,9] ∪ [10,13] ∪ [14,14],
Here, the complete interval has length 13, and the individual intervals have the lengths 8, 4 and 1 respectively. And also here the union consists of at most log2(R-L+1) many intervals.
To compute the value of some interval of length 2^k, it is simply two of the intervals of size 2^(k−1) joined together, so every interval can be computed in constant time.
The main idea behind Sparse Tables is to precompute all answers for range queries with power of length two. Afterwards, a different range query can be answered by splitting the range into ranges with power of two lengths, looking up the precomputed answers, and combining them to receive a complete answer.
Building a Sparse Table
A sparse table can be built in a bottom-up manner using dynamic programming. The key idea is to precompute answers of all sub-arrays of size 2^j (where j varies from 0 to Log n). We will use a 2-D array for storing the answers of the precomputed queries, table[i][j] will store the answer for the range [i,i+(1<<j)−1] of length 2^j. The size of the 2-D array will be N×(K+1), where N is the array length and K = ceil(log2(N)).
Lets take an example of Range Sum query to understand the building of Sparse Table.
Applications
Range Minimum Queries (RMQ)
These are the queries where the Sparse Table shines. Range minimum query solves the problem of finding the minimum value in a range of an array .
Algorithm
When computing the minimum of a range, it doesn’t matter if we process a value in the range once or twice. Therefore instead of splitting a range into multiple ranges, we can also split the range into only two overlapping ranges with power of two length. This way we can efficiently answer RMQ in 0(1) time complexity.
So, we can compute the minimum of the range [L,R] with:
Here ,in given figure we are supposed to find minimum element in range of [0,8] . Here, the length of range is log2(8–0+1)~3, therefore new range will be divided as min([0,8])=min( min([0,7]),min([5,8]) ) which comes out to be from precomputed 2-D sparse table is min([0,8])=min(2,1)=1. Hence minimum of full range [0,8] is 1 which can be verified manually also.
C++ Implementation
Time Complexity
for RMQ: 0(1)
Lowest Common Ancestor
Let G be a tree. For every query of the form LCA(u, v)
we are supposed to find the lowest common ancestor of the nodes u
and v
, i.e. we want to find a node w
that lies on the path from u
to the root node, that lies on the path from v
to the root node, and if there are multiple nodes we pick the one that is farthest away from the root node
Algorithm
For each node, we will precompute its ancestor above him, its ancestor two nodes above, its ancestor four above, its eight ancestor above it etc, and store it.
We will use 2D-array let’s say up, up[i][j] stores the 2^j-th ancestor of node where i=1...N
, j=0...ceil(log(N))
.
We can compute this array using tree traversal techniques, preferably DFS.
For each node, we will also remember the time when this node is encountered very first time(“in time”), and the time when we left it (i.e. after we visited all its node in sub-tree and exit the DFS function)(“out time”). We can use this information to determine if a node is an ancestor of another node in constant time.
- A node u will be ancestor of node v if and only if its ‘in’ time is less than or equal to ‘in’ time of node v and ‘out’ time of node u is greater than or equal to the ‘out’ time of node v.
We firstly check whether a node from two is an ancestor of other or not and if one node is ancestor of other then it is the LCA of these two nodes otherwise we find a node which is not the common ancestor of both u and v and is highest(i.e. a node x such that x is not the common ancestor of u and v but up[x][0] is) in the tree. After finding such a node (let it be x), we print the first ancestor of x i.e. up[x][0] which will be the required LCA.
* While building the sparse table we will use
up[i][j]= up[up[i][j-1]][j-1];
Here, in the given figure, we can see that we will check for node 14 & node 15. From node 15 we will firstly take a jump of 4 units and check if node 3 is an ancestor of node 14 or not, yes it is. We will check in decreasing order of power of 2’s. Then we will take a small jump of 2 units and check if node 7 is an ancestor of node 14 or not, no it is not. Then from node 7, we will take a jump of several distinct power of 2’s. But every time it will be landing on a node that is an ancestor of node 14. Then we will conclude that node 7 is the highest node that is not the ancestor of both node 14 and node 15. And then we will conclude that 2⁰(=1)th ancestor of node 7 is the lowest common ancestor of node 14 and node 15. Therefore node 4 is the LCA of node 14 and node 15 in the above figure.
Time Complexity
0(NlogN) for pre-processing0(logN) for each query.
C++ Implementation
Practice Problems
Further Resources
Thanks for Reading
This is my first blog, do give your views in the comment below.
About me
I am Aditya Biswakarma, 2nd-year B.Tech IT student at IIIT Allahabad, Member at Competitive Coding Wing, Geekhaven, IIIT Allahabad. Connect on Linkedin.