Rope Data Structure

Umang Agarwal
Underrated Data Structures and Algorithms
8 min readApr 17, 2020

Bad programmers worry about the code.

Good ones worry about data structure and their relationships.

-Linus Torvalds

One of the most common operations on strings is appending or concatenation. Appending to the end of a string when the string is stored in the traditional manner (i.e. an array of characters) would take a minimum of O(n) time (where n is the length of the original string).

We can reduce time taken by appending using Ropes Data Structure.

What are Ropes afterall?

A Rope data structure is a tree data structure which is used to store or manipulate large strings in a more efficient manner. It allows for operations like insertion, deletion, search and random access to be executed faster and much more efficiently in comparison to a traditional String.

This data structure is widely used by softwares such as text editors like Sublime, email systems like Gmail and text buffers to handle large strings efficiently.

Description:

A rope is a binary tree (that is, each node can have maximum of 2 children) where each leaf (end node) holds a string and a length (also known as a “weight”), and each node further up the tree holds the sum of the lengths of all the leaves in its left subtree. A node with two children thus divides the whole string into two parts: the left subtree stores the first part of the string, the right subtree stores the second part of the string, and node’s weight is the sum of the weights of the leaf nodes of it’s left child.

Since this is a derivation of the binary tree, it’s starting index is marked as Root and if we consider the height of the rope to be h, where root is at h=0, the maximum possible modes in a rope can be 2^(h + 1) — 1.

For rope operations, the strings stored in nodes are assumed to be constant immutable objects in the typical nondestructive case, allowing for some copy-on-write behavior. Leaf nodes are usually implemented as basic fixed-length strings with a reference count attached for de-allocation when no longer needed, although other garbage collection methods can be used as well.

Comparison:

Before diving into the operations, we should first know it’s merits and limitations over monolithic string arrays so that we know when to use it and when not to.

Merits

  • Ropes enable much faster insertion and deletion of text in comparison to string arrays, on which the operations have a time complexity of O(n).
  • Ropes do not require O(n) extra memory when being operated upon, unlike arrays which require it for copying operations.
  • Ropes do not need large contiguous memory areas like arrays.
  • If the operations performed are non-destructive(i.e the altered content is preserved) it behaves as a persistent data structure. This helps the text editors support multiple undo levels.
  • Stable performance regardless of size.

Limitations

  • Greater overall space use when not being operated on, mainly to store parent nodes. There is a trade-off between how much of the total memory is such overhead and how long pieces of data are being processed as strings. The strings in example figures above are unrealistically short for modern architectures. The overhead is always O(n), but the constant can be made arbitrarily small.
  • Increase in time to manage the extra storage, resulting in slow handling of smaller strings
  • Increased complexity of source code, implying greater risk of bugs

This table compares the algorithmic traits of string and rope implementations, not their raw speed. Array-based strings have smaller overhead, so (for example) concatenation and split operations are faster on small datasets. However, when array-based strings are used for longer strings, time complexity and memory use for inserting and deleting characters becomes unacceptably large. In contrast, a rope data structure has stable performance regardless of data size. Further, the space complexity for ropes and arrays are both O(n). From this, we can derive that ropes are more preferable when the size of our data is large and it is to be modified on a regular basis. String arrays perform better on smaller data & are better when there are less operations.

Operations:

Now that we know the merits and limitations of ropes, and we can perceive when to use them, let’s dive into the operations.

1. Searching(indexing):

In order to find the character at ith position, we search recursively beginning at the root node. The easiest way to put this is, we follow one golden rule — if the weight of the current node is lower than the value of i, we subtract the weight from i & move right. If the weight is less than the value of i we simply move left. We continue till the point we reach a leaf node. At the leaf node, we simply return the character at the ith(updated) position

Time Complexity: O(log n)

Example: Let’s look for the 10th character in the rope shown above.

This is how we go about the process:-

Pseudocode:

function index(node, i)

if node.weight<i and exists(node.right) then

return index(node.right, i)

end if

if exists(node.left) then

return index(node.left, i)

end if

return node.value[i]

end

2. Concatenation(joining S1 and S2):

A concatenation operation between 2 strings(S1 & S2) is performed by creation of a new root node which has a weight equal to the sum of weights of leaf nodes in S1. This new node contains S1 as it’s left child and S2 as it’s right child, or vice versa as per the demand of the situation. This takes time if the tree is already balanced. Since, most of the rope operations need a balanced tree, it might require rebalancing after the operation.

Time Complexity: O(1) (or O(log n) time to compute weight of root node if the tree is already balanced)

Example: Let’s try concatenating 2 ropes.

This is how we go about the process:-

Pseudocode:

function concatenation(node1, node2)

new_node.left = node1

new_node.right = node2

new_node.value = weight(node1)

end

//function to find weight of root:-

function weight(node)

if node.left == null and node.right == null then

return node.weight

end if

x=0

if node.left != null then

x=x+weight(node.left)

end if

if node.right != null then

x=x+weight(node.right)

end if

return x

end

3. Splitting:

In order to split a string at any given point i, there are 2 major cases we might encounter:

  1. Split point being the last character of a leaf node.
  2. Split point being a middle character of a leaf node.

With the second case, we can reduce it to the much simpler first one by splitting the string at the given point into 2 leaf nodes & creating a new parent for these component strings.

Now, for the first case, we first locate the node after which we have to split the rope. After we have found it, we break the link of that node to the rope and all links towards it’s right. We then concatenate the separated nodes and re-balance both the new ropes.

Time Complexity: O(log n)

Example: Let’s split the following rope into 2 ropes- “hello_i_am_a” and “_rope_data_structure”

This is how we go about the process:-

4. Insertion:

In order to insert a string S’ between our string, at position i, we simply need to split it at the index from which to insert, and then first concatenate the first half of the string we split and the new string followed by concatenating the result and the other split off part of the original string. The cost of this operation is the sum of the three operations performed.

Time Complexity: O(log n)

Algorithm:

  1. Split original string into s1 and s2 at i
  2. Concatenate s1 and string to be inserted(ns), call it s3
  3. Concatenate s3 and s2

5. Deletion:

In order to delete a part of string from the middle (say from ith to (i+j)th character), split the string at i. Then, split the second string obtained at j. Then, concatenate the first string obtained on splitting for the first time and the second string obtained on splitting the second time.

Time Complexity: O(log n)

Algorithm:

  1. Split original string into s1 and s2 at i
  2. Split s2 into s3 and s4 at j
  3. Concatenate s1 and s4

6. Reporting:

In order to report the string between given points, find the node that contains the starting character and the first node whose weight is greater than the ending index j. Then, traverse the tree between these nodes and output all the characters by performing an in-order traversal operation.

Time Complexity: O(j+log n)

Algorithm:

  1. Find the node containing the starting character
  2. Find the first node with weight greater than ending index
  3. Traverse the tree between these nodes in an in-order fashion, keep storing the values of the leaf nodes in a string s
  4. Return the string s

Okay, this is cool, but when and where can I use this?

As mentioned earlier, this data structure works best for the modules or functions involving large strings which are manipulated often. This is because, as we have seen up till now, this structure reduces the time taken for string manipulations drastically.

Now let’s look at an example to see why Ropes are a good substitute to monolithic string arrays:

Given 2 strings a and b, concatenate them into a third string c.

Method 1 (Native method):

We create a string c to store concatenated string. We first traverse a and copy all characters of a to c. Then we copy all characters of b to c.

Time complexity : O(n)

Now let’s try to solve the same problem using Ropes.

Method 2 (Rope structure method):

This rope structure can be utilized to concatenate two strings in constant time.

  1. Create a new root node (that stores the root of the new concatenated string).
  2. Mark the left child of this node, the root of the string that appears first.
  3. Mark the right child of this node, the root of the string that appears second.

And that’s it. Since this method only requires to make a new node, it’s time complexity is O(1).

Are there some applications that are already using this?

The major areas where the Rope data structure is used are:

  • Text Editors which handle large amounts of strings.
  • E-mail messages which might require a lot of editing.
  • Edit buffers for handling large text.
  • Cedar programming environment has used ropes since its inception.

Thank you for reading this. I hope this makes life and coding easier for you.
Stay tuned for more.

Contact me at: LinkedIn
E-mail ID: agarwalumang012@gmail.com

--

--