Rope Data Structure

Umang Agarwal
Apr 17, 2020 · 8 min read

Bad programmers worry about the code.

Good ones worry about data structure and their relationships.

-Linus Torvalds

One of the most common operations on strings is appending or concatenation. Appending to the end of a string when the string is stored in the traditional manner (i.e. an array of characters) would take a minimum of O(n) time (where n is the length of the original string).

We can reduce time taken by appending using Ropes Data Structure.

What are Ropes afterall?

A Rope data structure is a tree data structure which is used to store or manipulate large strings in a more efficient manner. It allows for operations like insertion, deletion, search and random access to be executed faster and much more efficiently in comparison to a traditional String.

This data structure is widely used by softwares such as text editors like Sublime, email systems like Gmail and text buffers to handle large strings efficiently.

Description:

A rope is a binary tree (that is, each node can have maximum of 2 children) where each leaf (end node) holds a string and a length (also known as a “weight”), and each node further up the tree holds the sum of the lengths of all the leaves in its left subtree. A node with two children thus divides the whole string into two parts: the left subtree stores the first part of the string, the right subtree stores the second part of the string, and node’s weight is the sum of the weights of the leaf nodes of it’s left child.

Since this is a derivation of the binary tree, it’s starting index is marked as Root and if we consider the height of the rope to be h, where root is at h=0, the maximum possible modes in a rope can be 2^(h + 1) — 1.

For rope operations, the strings stored in nodes are assumed to be constant immutable objects in the typical nondestructive case, allowing for some copy-on-write behavior. Leaf nodes are usually implemented as basic fixed-length strings with a reference count attached for de-allocation when no longer needed, although other garbage collection methods can be used as well.

Image for post
Image for post

Comparison:

Before diving into the operations, we should first know it’s merits and limitations over monolithic string arrays so that we know when to use it and when not to.

Merits

  • Ropes enable much faster insertion and deletion of text in comparison to string arrays, on which the operations have a time complexity of O(n).

Limitations

  • Greater overall space use when not being operated on, mainly to store parent nodes. There is a trade-off between how much of the total memory is such overhead and how long pieces of data are being processed as strings. The strings in example figures above are unrealistically short for modern architectures. The overhead is always O(n), but the constant can be made arbitrarily small.
Image for post
Image for post

This table compares the algorithmic traits of string and rope implementations, not their raw speed. Array-based strings have smaller overhead, so (for example) concatenation and split operations are faster on small datasets. However, when array-based strings are used for longer strings, time complexity and memory use for inserting and deleting characters becomes unacceptably large. In contrast, a rope data structure has stable performance regardless of data size. Further, the space complexity for ropes and arrays are both O(n). From this, we can derive that ropes are more preferable when the size of our data is large and it is to be modified on a regular basis. String arrays perform better on smaller data & are better when there are less operations.

Operations:

Now that we know the merits and limitations of ropes, and we can perceive when to use them, let’s dive into the operations.

1. Searching(indexing):

In order to find the character at ith position, we search recursively beginning at the root node. The easiest way to put this is, we follow one golden rule — if the weight of the current node is lower than the value of i, we subtract the weight from i & move right. If the weight is less than the value of i we simply move left. We continue till the point we reach a leaf node. At the leaf node, we simply return the character at the ith(updated) position

Time Complexity: O(log n)

Example: Let’s look for the 10th character in the rope shown above.

This is how we go about the process:-

Image for post
Image for post

Pseudocode:

function index(node, i)

if node.weight<i and exists(node.right) then

return index(node.right, i)

end if

if exists(node.left) then

return index(node.left, i)

end if

return node.value[i]

end

2. Concatenation(joining S1 and S2):

A concatenation operation between 2 strings(S1 & S2) is performed by creation of a new root node which has a weight equal to the sum of weights of leaf nodes in S1. This new node contains S1 as it’s left child and S2 as it’s right child, or vice versa as per the demand of the situation. This takes time if the tree is already balanced. Since, most of the rope operations need a balanced tree, it might require rebalancing after the operation.

Time Complexity: O(1) (or O(log n) time to compute weight of root node if the tree is already balanced)

Example: Let’s try concatenating 2 ropes.

This is how we go about the process:-

Image for post
Image for post

Pseudocode:

function concatenation(node1, node2)

new_node.left = node1

new_node.right = node2

new_node.value = weight(node1)

end

//function to find weight of root:-

function weight(node)

if node.left == null and node.right == null then

return node.weight

end if

x=0

if node.left != null then

x=x+weight(node.left)

end if

if node.right != null then

x=x+weight(node.right)

end if

return x

end

3. Splitting:

In order to split a string at any given point i, there are 2 major cases we might encounter:

  1. Split point being the last character of a leaf node.

With the second case, we can reduce it to the much simpler first one by splitting the string at the given point into 2 leaf nodes & creating a new parent for these component strings.

Now, for the first case, we first locate the node after which we have to split the rope. After we have found it, we break the link of that node to the rope and all links towards it’s right. We then concatenate the separated nodes and re-balance both the new ropes.

Time Complexity: O(log n)

Example: Let’s split the following rope into 2 ropes- “hello_i_am_a” and “_rope_data_structure”

This is how we go about the process:-

Image for post
Image for post
Image for post
Image for post
Image for post

4. Insertion:

In order to insert a string S’ between our string, at position i, we simply need to split it at the index from which to insert, and then first concatenate the first half of the string we split and the new string followed by concatenating the result and the other split off part of the original string. The cost of this operation is the sum of the three operations performed.

Time Complexity: O(log n)

Algorithm:

  1. Split original string into s1 and s2 at i

5. Deletion:

In order to delete a part of string from the middle (say from ith to (i+j)th character), split the string at i. Then, split the second string obtained at j. Then, concatenate the first string obtained on splitting for the first time and the second string obtained on splitting the second time.

Time Complexity: O(log n)

Algorithm:

  1. Split original string into s1 and s2 at i

6. Reporting:

In order to report the string between given points, find the node that contains the starting character and the first node whose weight is greater than the ending index j. Then, traverse the tree between these nodes and output all the characters by performing an in-order traversal operation.

Time Complexity: O(j+log n)

Algorithm:

  1. Find the node containing the starting character

Okay, this is cool, but when and where can I use this?

As mentioned earlier, this data structure works best for the modules or functions involving large strings which are manipulated often. This is because, as we have seen up till now, this structure reduces the time taken for string manipulations drastically.

Now let’s look at an example to see why Ropes are a good substitute to monolithic string arrays:

Given 2 strings a and b, concatenate them into a third string c.

Method 1 (Native method):

We create a string c to store concatenated string. We first traverse a and copy all characters of a to c. Then we copy all characters of b to c.

Time complexity : O(n)

Now let’s try to solve the same problem using Ropes.

Method 2 (Rope structure method):

This rope structure can be utilized to concatenate two strings in constant time.

  1. Create a new root node (that stores the root of the new concatenated string).

And that’s it. Since this method only requires to make a new node, it’s time complexity is O(1).

Are there some applications that are already using this?

The major areas where the Rope data structure is used are:

  • Text Editors which handle large amounts of strings.

Thank you for reading this. I hope this makes life and coding easier for you.
Stay tuned for more.

Contact me at: LinkedIn
E-mail ID: agarwalumang012@gmail.com

Underrated Data Structures and Algorithms

Home of useful but underrated Data Structures and Algorithms

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store