Segment Trees
In computer science, a segment tree, also known as a statistic tree, is a tree data structure used for storing information about intervals, or segments. It allows querying which of the stored segments contain a given point. It is, in principle, a static structure and cannot be modified once built. A similar data structure is the interval tree.
A segment tree for a set I of n intervals uses O(n log n) storage and can be built in O(n log n) time. Segment trees support searching for all the intervals that contain a query point in time O(log n + k), k being the number of retrieved intervals or segments.
Segment trees are powerful data structures used for efficient querying of interval-based problems. They enable quick computation and updates of values associated with intervals or segments of a larger data set.
A segment tree is a binary tree where each node represents an interval, which is usually a range of indices in an array. The root node represents the entire array, while each leaf node represents a single element of the array. The tree is constructed in a bottom-up manner, dividing the array into smaller intervals until the base case is reached.
Operations on Segment Trees
The operations that the segment tree can perform must be binary and associative. Overall the values must belong to the set of the semi-group. The neutral element must be obvious according to the type of operation and semi-group we are looking for. For example, if we want to find the sum over the range of values in an array where the elements belong to R then the neutral element, in this case, will be 0. Some of the examples of operations are:
- Addition/Subtraction
- Maximum/Minimum
- GCD/LCM
- AND/OR/XOR
Structure of the Tree
A segment tree works on the principle of divide and conquer. At each level, we divide the the array segments into two parts. If the given array had [0,…,N-1) elements, then the two parts of the array will be [0,…N/2–1] and [N/2,…N-1]. We will then proceed recursively until the lower and upper bounds of the range become equal. The structure of the segment tree looks like a binary tree.
Constructing the Segment Tree
There are two important points to note when constructing a segment tree:
- Choosing what values to be stored in the nodes according the problem definition.
- What should the merge operation do.
If our problem definition (as we shall assume here) states the we need to calculate the sum over ranges, then the value at nodes should store the sum of values over the ranges. The child node values are merged back into the parent node to hold the value for that particular range (that is, the range covered by all the nodes of its sub-tree). In the end, leaf nodes store information about a single element. The leaf nodes store the array based on which the segment tree is built.
The following are the steps for constructing a segment tree:
- Start from the leaves of the tree.
- Recursively build the parents through the merge operation.
The merge operation will take constant time if the operator takes constant time. Hence building the whole tree takes O(N) time.
A={0,1,3,5,-2,3} 0-5
sum=10
/ \
0-2 3-5
sum=4 sum=6
/ \ \ \
0-1 2-2 3-4 5-5
sum=1 sum=3 sum=3 sum=3
/ \ / \
0-0 1-1 3-3 4-4
sum=0 sum=1 sum=5 sum=-2
Range Query
Consider the following example problem: given two integers L and R, return the sum of the segment [L,R].
The first step is constructing the segment tree with the addition operator and 0 as the neutral element. If the range is one of the node’s range values then simply return the answer. Otherwise, we need to traverse the left and right children of the nodes and recursively continue the process until we find a node that covers a range that totally covers a part or the whole of [L,R]. While returning from each call, we need to merge the answers received from each of its children.
As the height of the segment tree is logN, the query time will be O(logN) per query.
A={0,1,3,5,-2,3} [0-5]
sum=10
/ \
[0-2] [3-5]
sum=4 sum=6
/ \ / \
0-1 [2-2]* [3-4]* 5-5
sum=1 sum=3 sum=3 sum=3
/ \ / \
0-0 1-1 3-3 4-4
sum=0 sum=1 sum=5 sum=2
For sum from range 2 to 4, nodes that are in brackets are
traversed. The answer is taken from the nodes that are
both bracketed and starred
Point Updates
Given an index i, update a[i] with some value x.
The element’s contribution is only in the path from its leaf to its parent. Thus only logN elements will get affected due to the update. For updating, traverse until the leaf that stores the value of index i and update the value. Then while tracing back in the path, modify the ranges accordingly.
The time complexity will be O(logN).
A={0,1,3,5,-2,3} [0-5]
sum=110
/ \
[0-2] 3-5
sum=104 sum=6
/ \ \ \
[0-1] 2-2 3-4 5-5
sum=101 sum=3 sum=3 sum=3
/ \ \ \
0-0 [1-1] 3-3 4-4
sum=0 sum=101 sum=5 sum=2
If we add 100 to index 1, then the value becomes
101 and all the nodes affected are in brackets
The code for the Python implementation of segment tree range addition and point updating in recursive form can be found at my Gitthub.
A memory efficient iterative version can be found here.
Applications of Segment Trees
Applications of Segment trees:
Segment tree data structure can be used to solve various problems like:
- Range Min, Max & Sum Queries, and Range Update Queries
- segment tree with the max build, query, and update.
- segment tree with min build, query, and update.
- Computational geometry: Computational geometry is a mathematical field that includes the design, and analysis of efficient algorithms for solving geometric I/O problems. It is also used for pattern recognition and describes solid modeling algorithms.
- Geographic information systems: A Geographic Information System is a system that uses data that is attached to a unique location, and analyzes and generates geographically referenced information.
- Storing segments in an arbitrary manner.
- Used in competitive programming.
- Segment trees can be used to count the frequency of elements in a given range.
- Segment trees can be used for image processing tasks.
Conclusion
Apart from the standard segment tree, which is used for range queries and updates, there are other variants with specific functionalities. Here are some notable ones:
- Lazy Propagation Segment Tree: This variant optimizes segment tree updates by deferring them until necessary. It uses lazy propagation to avoid redundant computations. For example, when updating a range, instead of updating all the affected nodes, the changes are lazily propagated as needed.
- Persistent Segment Tree: A persistent segment tree allows for efficient versioning, meaning it retains multiple versions of the tree to support historical queries. This is advantageous when we need to track changes over time or undo modifications made to the tree.
- Interval Segment Tree: Unlike the traditional segment tree, where each node represents a continuous interval, the interval segment tree represents disjoint intervals. It is useful for problems that involve non-overlapping ranges, such as scheduling tasks or managing separate time intervals.
Overall, segment trees are valuable tools for solving range-based problems efficiently. With their ability to handle range queries and updates effectively, they have become an essential component of various algorithms in computer science. By understanding the different types and variations of segment trees, developers can apply them to different problems and enhance the performance of their solutions.
If you have questions, criticism or additional information, please leave a comment.