Rope (data structure)

josephg · on June 8, 2013

I made a skip-list-based rope implementation in C and JavaScript. In C, the crossover point at which ropes become faster than plain strings is around 200 bytes. They're an awesome little data structure.

C: https://github.com/josephg/librope

JS: https://github.com/josephg/jumprope

kyzyl · on June 8, 2013

In the usage example for the C lib, you have

'uint8_t *str = rope_createcstr(rope, NULL);'

Did you mean 'r' instead of 'rope'?

josephg · on June 9, 2013

Yep - fixed! (Thanks @rymo4 for the pull request)

kyzyl · on June 9, 2013

Heh, no problem. Not sure why I decided to post here instead of just doing it there.

Nice lib, though. I'll keep it in mind.

rjzzleep · on June 9, 2013

I started writing a rope implementation for Dart about a year ago:

https://github.com/fishman/dart-mutablestring

Some things work and some are missing, but it's been a while, so I don't really remember the details.

https://github.com/fishman/dart-ropebench

pcwalton · on June 8, 2013

In SpiderMonkey, JavaScript strings turn into ropes based on some heuristics (if you append to them a lot, I think). This helps a few benchmarks.

So while you might think a JavaScript string is just a pointer + length, the implementation is actually significantly more complex: the engine will pick different implementations depending on how it thinks you're using it.

mccr8 · on June 8, 2013

The whole hierarchy of SpiderMonkey strings is laid out in this comment in the code:

http://mxr.mozilla.org/mozilla-central/source/js/src/vm/Stri...

Chris_Newton · on June 8, 2013

If you like the idea of ropes, you might also be interested in a somewhat related paper by Charles Crowley called Data Structures for Text Sequences:

http://www.cs.unm.edu/~crowley/papers/sds.pdf

The material about the “piece table” method is particularly interesting IMHO.

rav · on June 8, 2013

A while ago, I implemented ropes in JavaScript, based on the 1995 paper by Boehm et al. The interface provides the methods charAt, charCodeAt, concat and substring, so it functions as a drop-in replacement for JavaScript strings (provided you limit your use to those supported methods).

https://github.com/Mortal/ropejs/blob/master/rope.js

Roboprog · on June 8, 2013

Interesting, but I'm going to have to have a really huge job before I want to implement something like this rather than something like an array-of-arrays/list-of-lists or something like a DOM tree. That is, find a way to naively partition the text into intuitive chunks (newlines, spaces, tags, whatever as chunk boundaries).

Yes, I'm deliberately being contrarian, as the data structure takes a bit of work to comprehend, and even more work to understand the merge and split operations when you make updates. I'm not saying that I would never use it, just that the need had better justify the effort. (I'm not against going outside the out-of-the-box libraries to do things like tries or radix sort, for example; I just want a justification to do the work.)

twotwotwo · on June 9, 2013

Not all that contrarian. Some editors use a simpler structure, a gap buffer, that doesn't have the same speed in all situations but still handles inserts and deletes well when they tend to fall in a small area:

http://scienceblogs.com/goodmath/2009/02/18/gap-buffers-or-w...
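
To make the gap buffer idea concrete, here is a minimal sketch in Python. The class and method names are illustrative assumptions and are not taken from any particular editor. The text lives in one array with an unused "gap" at the edit position, so repeated edits in the same area only touch the gap, while moving to a distant position costs a copy proportional to the distance.

```python
class GapBuffer:
    """Text stored in one list, with a movable gap at the edit position."""

    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)      # first unused slot
        self.gap_end = len(self.buf)    # one past the last unused slot

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def move_gap(self, pos):
        """Slide the gap so it begins at logical position pos."""
        pos = max(0, min(pos, len(self)))
        while self.gap_start > pos:     # move characters from before the gap to after it
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:     # move characters from after the gap to before it
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self):
        """Reopen an exhausted gap by inserting fresh empty slots."""
        extra = max(16, len(self.buf))
        self.buf[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def insert(self, pos, text):
        self.move_gap(pos)
        for ch in text:
            if self.gap_start == self.gap_end:
                self._grow()
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos, count):
        """Delete count characters starting at pos by widening the gap."""
        self.move_gap(pos)
        self.gap_end = min(len(self.buf), self.gap_end + count)

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Usage would look like `g = GapBuffer("hello world"); g.insert(5, ",")`, after which `g.text()` gives the edited string.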

bzbarsky · on June 9, 2013

A DOM tree is rather a pain to implement in practice. You need fast foo.childNodes[i] and also fast foo.nextChild. And fast inserts and removes. And APIs that use (parent, offset) pairs as their basic concept (like ranges) and other APIs that use (parent, prevSibling) pairs (like node insertion). Oh, and memory use needs to be minimal. And the code needs to have low constants, not just low algorithmic complexity.

Roboprog · on June 9, 2013

Good point! Sloth and inertia usually allow me to use the work of others in this case (XML DOM, when appropriate), but that doesn't make it internally simple (or efficient).

snprbob86 · on June 8, 2013

I'd like to see a rope-variant of the 2-3 finger tree.

2-3 finger trees are immutable/persistent and support access to both ends in amortized constant time and logarithmic concatenation and splitting.

The complicating factor is that for flyweight values like characters, the overhead of the tree's interior nodes would be prohibitive with one leaf per character. Surely there must be a variant of 2-3 finger trees that addresses this.

jeffffff · on June 8, 2013

You can implement a rope as a 2-3 finger tree of string literals. I wrote a Java implementation of 2-3 finger trees a few months ago and implemented both ropes of bytes and ropes of chars (rope backed by a finger tree of Java Strings here: https://github.com/jeffplaisance/fingertree/blob/master/src/...).

danieldk · on June 9, 2013

This might be of interest:

http://hackage.haskell.org/package/rope

philsnow · on June 8, 2013

I guess if you have an array of characters and you want to insert into the middle, you could promote the array into a rope as follows:

  1. break the array into two extents (start address / length pairs)
  2. make a rope out of those two extents
  3. do the insert on the resulting rope

this assumes that you're okay with starting with an array of characters and ending up with a rope.
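
The three steps above can be sketched in Python as follows. This is only an illustration under assumptions: the names `Extent`, `Concat`, `promote_and_insert` and `flatten` are made up for the example, and a Python `str` stands in for the character array. The point is that the original array is never copied; the two extents just refer to it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Extent:
    """A (start, length) view into a backing array that is shared, not copied."""
    buf: str
    start: int
    length: int


@dataclass(frozen=True)
class Concat:
    """An internal rope node joining two smaller ropes."""
    left: "Rope"
    right: "Rope"


Rope = Union[Extent, Concat]


def promote_and_insert(array: str, pos: int, new_text: str) -> Rope:
    # 1. Break the array into two extents around the insertion point.
    head = Extent(array, 0, pos)
    tail = Extent(array, pos, len(array) - pos)
    # 2. Make a rope out of those extents, and
    # 3. do the insert by placing the new text between them.
    inserted = Extent(new_text, 0, len(new_text))
    return Concat(Concat(head, inserted), tail)


def flatten(rope: Rope) -> str:
    """Materialize the rope back into one contiguous string."""
    if isinstance(rope, Extent):
        return rope.buf[rope.start:rope.start + rope.length]
    return flatten(rope.left) + flatten(rope.right)
```

For example, `flatten(promote_and_insert("helloworld", 5, ", "))` rebuilds the string with the new text in the middle, while both outer extents still point at the original `"helloworld"` array.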

ISTR this pattern was common-ish in Erlang last time I looked at it, but there they call ropes (of bytes) "iolists", and support for iolists goes all the way down into the standard library.

Cthulhu_ · on June 9, 2013

> this assumes that you're okay with starting with an array of characters and ending up with a rope.

Well, in any OO language, I guess both a string and a rope would still be considered a String object; whether a character array or a tree-like structure is backing it wouldn't really matter to the public interface.

shawn-butler · on June 8, 2013

I became aware of them as an SGI-proposed extension to the C++ standard library.

A good evaluation of them remains:

http://www.sgi.com/tech/stl/Rope.html

nly · on June 8, 2013

The GNU C++ standard library still includes them as a non-standard extension.

http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a0006...

Tycho · on June 8, 2013

Can someone explain intuitively what this is?

inportb · on June 8, 2013

A rope stores a long string as a collection (binary tree) of shorter strings. Ordinary strings are stored contiguously in memory, so mutation (insertion, deletion, concatenation, etc.) often involves copying a lot of data, which can be a problem with long strings. Ropes avoid this problem, at the cost of some additional code complexity and resource overhead associated with data fragmentation.

samatman · on June 8, 2013

It is a tree where the leaves are substrings, and each internal node stores the sum of the lengths of all the leaves under it.

Thus, the root node has a value equal to the length of the string; it has two child nodes, with values equal to the lengths of the first and second 'half' of the string, and so on.
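
A minimal Python sketch of that layout, assuming hypothetical `Leaf` and `Node` classes and a `char_at` helper invented for this example: each internal node caches the total length beneath it, and that is enough to find the i-th character by walking down from the root.

```python
class Leaf:
    """A leaf holds an actual substring."""

    def __init__(self, text):
        self.text = text
        self.length = len(text)


class Node:
    """An internal node stores the total length of all leaves under it."""

    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.length = left.length + right.length


def char_at(rope, i):
    """Return the i-th character by descending from the root."""
    if not 0 <= i < rope.length:
        raise IndexError(i)
    while isinstance(rope, Node):
        if i < rope.left.length:
            rope = rope.left            # target is in the left part
        else:
            i -= rope.left.length       # skip everything stored on the left
            rope = rope.right
    return rope.text[i]
```

With `rope = Node(Node(Leaf("Hello, "), Leaf("ro")), Leaf("pe!"))`, the root's `length` equals the length of the whole string, and each lookup visits one node per tree level.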

zippie · on June 8, 2013

C implementation: https://github.com/kshulgin/crope

joe_the_user · on June 8, 2013

As far as I can tell, all the properties described follow from this being a balanced binary tree of strings "with rank".

I'd be curious whether anything else makes this different.

(It's fairly easy to adjust any balanced binary tree to support the "report" function, i.e. generating an ordered sublist of size m in O(log(n) + m) time.)
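
A rough Python sketch of such a "report" function, under the assumption that every node caches the size of its subtree (the names `Node`, `size`, `build`, `report` and `sublist` are all invented for this example). Subtrees that lie entirely outside the requested rank range are skipped in constant time, which is where the O(log(n) + m) bound on a balanced tree comes from.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        self.size = 1 + size(left) + size(right)   # cached subtree size


def size(node):
    return node.size if node else 0


def build(items):
    """Build a balanced tree whose in-order traversal yields items."""
    if not items:
        return None
    mid = len(items) // 2
    return Node(items[mid], build(items[:mid]), build(items[mid + 1:]))


def report(node, lo, hi, out):
    """Append to out every item whose in-order rank r satisfies lo <= r < hi."""
    if node is None or lo >= hi or lo >= node.size or hi <= 0:
        return                                      # subtree is outside the range
    rank = size(node.left)                          # rank of this node in its subtree
    report(node.left, lo, min(hi, rank), out)
    if lo <= rank < hi:
        out.append(node.value)
    # Ranks in the right subtree are shifted down by rank + 1.
    report(node.right, max(lo - rank - 1, 0), hi - rank - 1, out)


def sublist(root, start, m):
    """Return up to m consecutive items beginning at rank start."""
    out = []
    report(root, start, start + m, out)
    return out
```

For a rope the same traversal applies with leaf substrings in place of single items, with the cached value being a character count.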

manish_gill · on June 8, 2013

Hmm. Is this the data structure that's used in the "rope" libraries for Python symbol lookup etc?

pjscott · on June 8, 2013

Unless I mistake what you're talking about, that's something very different:

http://rope.sourceforge.net

moondowner · on June 8, 2013

One more resource on ropes, more Java-oriented: http://www.ibm.com/developerworks/library/j-ropes/

EdiX · on June 9, 2013

PSA: a Gap Buffer is a far simpler data structure than a rope that works just as well in most (but not all) circumstances.

itcmcgrath · on June 8, 2013

Our database uses Ropes internally as it deals with large strings and frequently inserts and removes sections.