SEGMENT TREE — PART 1 (INTRODUCTION)
Formal Definition : Segment-Tree is a data structure which can store information about any segment (sub-array) [i..j] of an array Arr[start…end], and lets us extract this information and update it in logarithmic time.
Intimidated by the formal definition ?? :) (If the answer is yes -> don’t worry. That only means that your are just like me.)
Let’s start our journey in the world of Segment Tree with a story:
There are two friends : Alice and Bob.
Alice and Bob are in the same class. It’s free-time and they decide to play a game. Alice comes up with a unique game. The conditions of the game are like this:
- Alice will shout a stream of numbers -> [n1,n2,n3,n4,n5,n6,n7,n8,….] where ni is a number between 1 to 100
- Bob has to note those numbers and once Alice is done, game starts.
- Alice will ask queries to Bob and he has to answer the queries as quickly as possible.
The queries will be in this format :
q(1,5) -> which means Bob has to find out n1+n2+n3+n4+n5 and tell the answer to Alice.
q(7,13) means n7+n8+n9+n10+n11+n12+n13
q(5,8) means n5+n6+n7+n8
4. Alice will ask 3 queries to Bob, she will note down the total time that Bob takes to answer all the 3 queries.
5. After 3 queries, Bob will ask 3 queries to Alice and will note-down the total time that Alice takes to answer all the queries.
6. Whoever takes lesser time, wins the game.
NOTE : Since Alice and Bob are in the same class, both takes one unit time to perform any addition or subtraction operation.
FIRST ROUND
Alice speaks a stream of 9 numbers : 3,5,4,7,2,8,1,9,6 and numbers are like : n0 = 3, n1 = 5, n2 = 4 and so on.
Bob notes it down, and here comes the first query:
Query 1 : q(3,5)
Bob performs addition as follows :
n3 +n4 +n5 = 7 + 2 + 8 = 17
Bob answers and total time taken is 2 units (1 for 7 + 2 = 9 and 1 for 9 + 8 = 17)
Total time taken by Bob so far = 2 units
Query 2 : q(4,7)
Bob performs addition as follows :
n4 +n5 + n6 + n7 = 2 + 8 + 1 + 9 = 20
Bob answers and total time taken is 3 units (1 for 2 + 8 = 10, 1 for 10 + 1= 11 and 1 for 11+9 = 20)
Total time taken by Bob so far = 2 +3 = 5 units
Query 3: q(1,8)
Bob performs addition as follows :
n1 +n2 + n3 + n4 + n5 +n6 + n7 + n8 = 5+4+7+2+8+1+9+6 = 42
Bob answers and total time taken is 7 units (as he has to perform 7 additions)
Total time taken by Bob so far = 2 +3 + 7 = 12 units
Hence round 1 is over and total Time taken by Bob is 12 units.
SECOND ROUND
Bob is really desperate to win and he wants to make it as tough as possible for Alice to answer his queries.
He speaks a stream of 15 numbers : 1,5,9,6,7,4,3,6,2,1,8,6,5,4,7
Alice knows that Bob is trying to simply win by asking queries over a very large range. Again she is smart and comes up with a solution.
Even before Bob could ask his first query, she draws a strange looking picture like this:
This is really smart of Alice to draw a picture like this.
At first, this picture looks really strange, but if we look closely with focus we can find out what Alice has done.
We can decode Alice’s picture as follows :
Now, Bob starts asking questions,
Query 1 : q(3,12)
Alice looks at her picture and picks up the following nodes :
When we look closely at the picture, we can find out what a,b,c and d represents:
a = n3
b = n4 +n5 + n5 +n6+n7
c= n8 + n9 + n10 + n11
d= n12
Hence , a + b + c +d = n3 + n4 +n5 + n5 +n6+n7 + n8 + n9 + n10 + n11 + n12
Using this approach, Alice performs addition as follows :
a + b + c + d ( Remember how Bob performed addition -> n1 + n2 +.. )
a + b+ c+ d = 6 + 20+ 17 +5 = 48 which is essentially equal to sum(n3,n12)
Hence , Alice answers first query in 3 units of time.
Total time taken by Alice = 3 units
Query 2: q(0,14)
Alice looks at her picture and picks up the following nodes :
We can see in the above picture that the above node represents the sum of all the nodes n0..n15
Hence Alice answers the query in 1 unit time.
Total time taken by Alice = 3 + 1 = 4 units
Query 3: q(4,11)
Alice looks at her picture and picks up the following nodes :
We know that :
a = n4 +n5+ n6 +n7
b = n8 + n9 + n10+ n11
Hence, a+b will be the answer and Alice answers correctly = a+ b = 20 + 17 = 37 in just 1units of time.
Total time taken by Alice = 4 + 1= 5 units
RESULTS
Total time taken by Bob = 12 Units
Total time taken by Alice = 5 Units
Let’s analyze the total span of queries for Both Bob and Alice:
Bob’s Case :
q1(3,5) -> 3 elements
q2(4,7) -> 4 elements
q3(1,8) -> 8 elements
Total Span = 3 + 4 + 8 = 15 elements
Total Time taken to answer = 12 units
Alice’s Case :
q1(3,12) -> 10elements
q2(0,14) -> 15 elements
q3(4,11) -> 8 elements
Total Span = 10 + 15 + 8 = 33 elements
Total time taken to answer = 5 units
Alice’s span is more than double than Bob’s span but time taken by Alice is almost one-third as compared to Bob.
This is terrific performance improvement.
Just imagine what could be the performance difference if we are allowed to ask queries in range [0,1000000000] :)
CONCLUSION
The tree represented in Fig 2 is actually a Segment Tree. This is a very important data structure when it comes to range-based queries.
You have already seen it’s importance in improving the performance in case of a range-based query system.
This is just a introductory article on Segment Trees and we will continue in our journey of learning the architecture and implementation of Segment Tree in the next parts of this Segment Tree series.