Understanding Range Queries and Updates: Segment Tree, Lazy Propagation and MO’s Algorithm

Published in

Nybles

14 min readSep 30, 2020

In this article I am going to discuss one of the most frequently asked topics in competitive programming, Range queries and Updates. Often, we encounter such a problem that we need to answer some queries over segments or intervals.

Example:

Given an array of integers and two integers l and r. In that range, we have to find,

The Smallest element
Number of distinct elements
Max length of increasing subarray
Sum of elements etc.

Sometimes we are given some sort of updates either updating a single element or updating an interval. Here, I have tried to discuss a few general approaches (Segment tree, Lazy propagation and MO’s algorithm) to those problems. First, I will introduce that approach and then implement it for a particular problem.

So, without any further discussion, let’s get started.

SEGMENT TREE:

The very first approach in this journey of answering queries is segment tree. It is one of the most powerful tree data structure which enables us answering queries over an array successfully and also allows to modify the array at a single point or over a given segment. It is basically a binary tree which stores some segments.

Let’s consider an array A of length N and a segment tree T built over it

Root node of T will represent the whole array A [1, 2, …., N].
Each leaf node of T will represent a single array element A[i], 1<= i <=N.
Each internal node in T is responsible for a segment like A [l, l+1, l+2, …., r] 1 <= l, r <= N.

Structure of a general Segment Tree

Root node represents the whole array A [1: N]. We will break the array into two segments first A[1: (N+1)/2] and other A[ (N+1)/2+1 : N]. As it is a binary so each node has two children. First child of root will represent the segment A[1: (N+1)/2] and the second will represent the segment A[(N+1)/2+1:N]. So, in each step we will split the segment into two halves and its two children will represent those halves. We will keep splitting the segments till one node represents one array element.

Structure of a Segment tree for an array of length 6.

Height of the Tree

Max height of the segment tree is ceil(log2(N)).

Size of Segment Tree (Space Complexity)

Total number of leaf nodes is at least N.

Total number of internal nodes = N-1.

So, total number of nodes = 2*N-1. But generally, to build a segment tree over an array of size N, we use a tree of 4*N nodes. (Why?)

Array representation of above segment tree will be

Tree[]={1,2,3,4,5,6,7,8,9,dummy,dummy,10,11,dummy,dummy};

Since we use the array representation of the segment tree, so a bit space is wasted as you can see in above example. These waste space can be removed but in that case its implementation would be more complex. Considering these waste spaces, for smooth working of data structure, a tree of size 4*N is used, which is still of linear space complexity.

4*N size is enough for an array of size N. Why?

A segment tree has only three operations

Building Tree : Initializing the segment tree variable and build its structure.
Updating Tree : Updating the value at a point or over an interval in the array and change tree accordingly.
Querying Tree : Answering accordingly.

For understanding this let’s consider a standard question: Range minimum query

Given an array of integers of length N, there are q queries of two types.
1. 0 l r - Find the smallest element in the sub-array A[l : r]. 
2. 1 a b - Update the element as A[a] = b.1≤N,q≤ 100000;  1≤l,r,a≤N;  1≤A[i],b≤ 1000000000.

Naive approach: For 1st type of queries, iterate from l to r and find the smallest element. For 2nd type of queries, update array element as A[a]=b. Here, query_1 is of complexity O(N) while query_2 is of complexity O(1). Clearly, this approach will not work if there are too many queries of type 1.

Segment tree approach: Firstly, as the segment tree is nothing but a binary tree, so we will follow its array implementation. For building the tree we must think what are we going to store in the nodes, sticking to this problem our concern is to find the minimum over a given segment, so at each node, representing the segment A [l, l+1, l+2,…, r], we are going to store the smallest element in that range.

Question is how to do that?

Build the Tree:

We will simply follow the bottom up approach (recursion backtracking) to build the tree. Each leaf node will be assigned one of the array elements and each internal node will store the smallest element between its two children.

In the above function, node represents the current node of the tree being processed and variables st and end represent the range (segment) of that current node. Recall that segment tree is a binary tree, left child, given by 2*node, contains the range A[st : mid] and right child, which is given by 2*node+1, contains the range A [mid+1: end] where mid=(st+end)/2.

Since build function is being called once for each node (total maximum 4*N node), so the time complexity of the build() is O(N).

Example, let’s consider A[]={3,2,1,6,8,10}. Segment tree will look like…

Initial segment tree values and structure for the array A[] = { 3, 2, 1, 6 ,10}

Update the tree:

Now, we want to modify the array element as A[a] = b and hence we have to modify the segment tree. Looking at the tree, one can clearly observe that index to be updated lies only in one segment of each level so we need to update only log(N) nodes. Starting from the root node we will recursively call the update function for the node which contains required index till we reach the leaf node representing that index. Then we will update that node, and in backtracking all the ancestors till root will be updated as minimum of its children.

Code for the same is:

TIme complexity of update function is O(log(N)).

P.S. Often we don’t need to write one special build() function for building the segment tree. It can be done using the update function as well. For that we need to initialize the tree nodes with such a value that will never affect our answer.

As in this case, we should initialize the tree nodes to infinity (INT_MAX) because min(INT_MAX, A[i]) will always be A[i]. After initialization, pass the index i and A[i] to the update function for each 1<=i<=N to build the tree.

Answering queries of type 1:

We are given two integers l and r, we need to find the smallest element in this range (l and r inclusive).

To accomplish the task, we will traverse the segment tree and use the pre-computed minima of the intervals. Let’s consider that currently we are at a vertex which covers the interval A[st, st+1, st+2, . . ., end]. Then there are three possibilities:

Range represented by the node lies completely inside the given query range: In this case, we will just return the value stored at that node in the segment tree.
Range represented by the node lies completely outside the given query range: In this case, query function should return such a value that will not affect our answer. As in this case we want to find the minimum element so our function should return infinity (INT_MAX).
Range represented by the node lies partially inside and partially outside the given query range: As in this case, our answer depends on both the children so we will make two recursive calls one for each child. Let’s say partial answer from left child is ans1 and that from right child is ans2, so our answer would be min(ans1, ans2).

Let’s see the implementation:

Time Complexity: Since, at any level at most 4 nodes will be visited and the total number of levels in the tree is log(N). The upper bound of the all the visited nodes would be 4*log(N). Hence, the time complexity of each query would be O(log(N).

Hence, using this approach we can solve this problem efficiently. Update and query both are done in O(log(N)). The total time complexity of the solution will be O(N+Q*log(N)).

Let’s see how can we use this approach to solve the similar problems.

Range sum query:

Given an array of integers A. There are q queries of two types. In 1st type of queries, we want to find the sum of the elements in a given range, say l to r. In 2nd type of queries, we have to update an array element as A[a]=b.Query-1:  0 l r.   1<=l,r<=N.Query-2:  1 a b. 1<=a<=N.
Constraints are same as previous range minimum query problem.

The most important thing is what are we going to store at the nodes? As we need to find the sum in the range l to r. So, at each node we would store the sum of elements represented by that node. Only one thing we need to change in update() function is that each internal node will store the sum of its children. For answering sums, we will follow the same strategy used in previous problem.

Solution for this problem can be found here.

Practice a similar problem here and solution can be found here.

Finding LCM or GCD

In this problem we want to find the LCM (lowest common multiple) or GCD( greatest common divisor) of numbers in a given range of array with the point updates.

This is an interesting variation of Range minimum query or Range sum problem. Here, at each node we should store the GCD or LCM of the array elements contained in subtree of that node. It can be done by storing the GCD (similarly LCM) of the children at an internal node.

Solution for this problem can be found here.

Some practice problems:

Sorted subarray | Practice Problems

You are given an array $$Arr$$ of $$N$$ integers. Two types of queries will be asked to you. Type 1- $$1$$ $$idx$$…

www.hackerearth.com

Contest Page | CodeChef

CodeChef - A Platform for Aspiring Programmers CodeChef was created as a platform to help programmers make it big in…

www.codechef.com

LAZY PROPAGATION:

So far, we have seen point updates, but what if the updates are also given in the range like add or subtract a to all the elements in the range l to r. One of its solution would be to update all the elements of the range one by one but its complexity would be O(N*log(N)). So, to handle such situations here comes Lazy Propagation.

Lazy propagation is range update optimization implemented on Segment tree that performs range updates in O(log(N)) time.

Consider the range sum problem. Given an array of integers and q queries of two types. In update query, we need to add x to all the elements in the range l to r. In second type of query, we need to find the sum of the elements in a given range say l to r.

As I already mentioned one of its solution would be to update all the elements in the range l to r but each update query to segment tree will cost O(N*log(N)) time, so it will not work effectively.

Being Lazy is GOOD (sometimes 😜)

The idea behind the algorithm is don’t not update a node until needed which will avoid the repeated sharing.

Updating the interval represented by a node is similar to update the subtree of that node in segment tree. Suppose we have to update subtree of node x, instead of updating all the values at that instant we will only update that node and mark its children that their subtree has to be updated and for maintaining this record we will use Lazy [] array. Size of Lazy array would be same as the size of segment tree. We will initialize the Lazy[] with 0 which indicates there is no pending updates. While processing a node, let say u, if Lazy[u] is non-zero, it implies there is a pending update so before doing any further operation we will first update node u and if its children exist make them lazy.

While updating a given range following cases may arise:

If the segment represented by a node is completely outside the given query range then terminate the function call.
If the segment represented by a node lies completely inside the query range then update the current node and if its children exist mark them lazy.
If the segment represented by a node lies partially inside and partially outside the given query range, call the update function for both the children and change the value stored at that node by sum of value stored at its children.

Let’s see the code:

Since we have changed the update function and the tree has become lazy, so we need to change the query function as well. The only change we have to do is check if there is any pending update on the current node. If found, then update the value stored at that node and if its children exist then make them lazy. Rest is same as previous query function.

Practice problem:

Problem - C - Codeforces

In mathematical terms, the sequence of Fibonacci numbers is defined by the recurrence relation DZY loves Fibonacci…

codeforces.com

CONCLUSION:

Segment tree and Lazy propagation can be used solve a variety of problems. It provides update and query both in O(log(N)) time, which makes it faster as compared to other approaches.

I found a nice collection of problems on Segment Tree and Lazy Propagation here and here. Practice as much as you can.

MO’s ALGORITHM (Query square root decomposition technique):

This is a powerful technique which can be used to solve a variety of range query problems like Range sum query, Range minimum query, number of distinct elements in the range etc.

The question is why to use this if we can solve Range sum, RMQ (Range minimum query) using a segment tree. But look at the third example, finding the number of distinct elements if you know the answers of segments [l: mid] and [mid+1: r] it is a bit tough to merge those answers to get answer for [l: r]. In those cases, MO’s algorithm will help us.

Let’s see the problem statement clearly

There is an array of N integers and Q queries. Each query is a pair of integers {l, r}. For each query you have to return the number of distinct elements in the range l to r.
1<=l<=r<=N
1<=N, Q<= 100000.
1<= A[i] <=1000000.

Naive Approach: In each query iterate from l to r and insert each element in a set. Size of set would be the answer to each query. Each query will cost O(N) time and hence solution will take O(N*Q) time, which is not going to get AC.

Efficient Approach:

Before applying MO’s algorithm, Let’s learn how it works…

We can observe the following things:

There is overlapping of the segments.
Answers for the ranges [l+1, r], [l, r+1], [l-1, r] and [l, r-1] can be computed from the answer of range [l, r].

Since all the queries are known beforehand, so what can we do is compute answers for all those queries in an optimized way so that we can use the answers of previously processed queries.

Which order are we going to follow? Let’s see…

First, we divide the given array into blocks of size (int)sqrt(N) and number of such blocks will be k=ceil(N/sqrt(N)). Each query has two integers L and R. Based on the value of L each query falls into one of k blocks. Queries will be sorted in ascending order of their block number. Since more than one query could lie in the same block, in that case they will be sorted in ascending order of R.

That was all we needed to do with blocks. Now we will start answering sorted queries from block 1 then block 2 and so on till last block.

Let’s have two pointers namely currL and currR which denote that we know the answer of the segment [currL: currR]. Initialise currL=0 and currR = -1, which indicates initially our segment is empty.

Suppose the query we want to answer is [L: R], then we will do the following operations:

While currR is less than R, add currR+1 to our existing segment, increase currR by 1 and update the answer.
While currR is greater than R, remove currR from existing segment, decrease currR by 1 and update the answer.
While currL is less than L, remove currL from our existing segment, increase currL by 1 and update the answer.
While currL is greater than L, add currL-1 to existing segment, decrease currL by 1 and update the answer.

After above operations, current segment [currL: currR] will be same as the query segment [L: R] and we will store the answer of this segment. Once we compute the answers for all the queries, we will print the stored answers in given query order.

Structure of the MO’s will be same for most of the problems. The Only thing we have to change is how are we going to modify our answer when we move from [currL: currR] to [L: R].

IMPLEMENTATION FOR THIS PROBLEM…

Try to get AC, practice this problem here.

Given problem is equivalent to find the number of elements in [L: R] with a frequency greater than equal to 1. We will have a frequency array, say freq[], of size largest element in the array (at max 1000000). freq[i] will denote the frequency of i in range [currL: currR]. At any instance, count of non-zero elements in the freq[] will represent the number of distinct elements in range [currL: currR].

Let’s see the code

TIME COMPLEXITY:

I am damn sure this must be going on in your mind… Is this algo really going to work? What is its complexity?

Complexity of the algorithm is O((N+Q)*sqrt(N)*F), where N is the size of array, Q is the total number of queries and F is complexity of add() and remove() function (see add() and remove() function in the code).

Movement of currR-

Suppose we are processing jth block, currR will move at max from 0 to N-1. So, querying one block will cost O(N*F) time by the movement of currR. There are total sqrt(N) blocks and processing them will take O(N*Sqrt(N)*F) time by movement of currR.

Movement of currL-

Suppose we are processing ith query which falls in jth block, currL can move at max from the left bound of a block to the right bound of a block, i.e size of block (sqrt(N)). So, each query will cost O(sqrt(N)*F) time by the movement of currL. For total Q queries, its movement will take O(sqrt(N)*Q*F) time.

Hence, the time complexity of algorithm will be O((N+Q)*sqrt(N)*F).

Conclusion:

It is an offline algorithm and can’t be used when either there is any update or order of answering queries matters.
It is applicable when we know the answer of range [L: R] and answers for the ranges [L: R+1], [L+1: R], [L-1: R] and [L: R-1] can be computed easily (either in constant or O(log(N)) time).
We can solve problems which are hard to solve otherwise like segment tree, Fenwick tree.

PS: I would suggest to use segment tree over MO’s if the problem can be solved both the way because the segment tree is faster and easier to implement.

Practice problems:

Problem - 86D - Codeforces

You should calculate the power of t given subarrays. First line contains two integers n and t () - the array length and…

codeforces.com.

Do practice this question, one of the finest problems I found.

Prerequisite: Tree flattening

Problem - 375D - Codeforces

You have a rooted tree consisting of n vertices. Each vertex of the tree has some color. We will assume that the tree…

codeforces.com

THE END…

Finally (sigh)!! A long one!! I hope it will help a lot. It took me a lot of time to complete it. This is my first blog, do give your views. Write comments if you find any error or like to add something. If you liked it, 👏👏 and share among your friends, it really gives me satisfaction.

Thanks!!

Happy Coding!!🙂

About me

I am Prince Kumar, 2nd year B.Tech IT student at IIIT Allahabad, member at Competitive Coding Wing, Geekhaven IIIT Allahabad.