I made a skip list based rope implementation in C and Javascript. In C, the crossover point when ropes become faster is around 200 bytes. They're an awesome little data structure.
In SpiderMonkey, JavaScript strings turn into ropes based on some heuristics (if you append to them a lot, I think). This helps a few benchmarks.
So while you might think a JavaScript string is just a pointer + length, the implementation is actually significantly more complex: the engine will pick different implementations depending on how it thinks you're using it.
A while ago, I implemented ropes based on the 1995 paper by Boehm et al in JavaScript. The interface provides methods charAt, charCodeAt, concat and substring, so it functions as a drop-in replacement for JavaScript strings (provided you limit your use to the given supported methods).
Interesting, but I'm going to have to have a really huge job before I want to implement something like this rather than something like an array-of-arrays/list-of-lists or something like a DOM tree. That is, find a way to naively partition the text into intuitive chunks (newlines, spaces, tags, whatever as chunk boundaries).
Yes, I'm deliberately being contrarian, as the data structure takes a bit of work to comprehend, and even more work to understand the merge and split operations when you make updates. Not saying that I would never use it, just that the need better justify the effort. (not against going outside the out-of-the-box libraries to do things like tries or radix sort, for example, just want a justification to do the work)
Not all that contrarian. Some editors use a simpler structure, a gap buffer, that doesn't have the same speed in all situations but still handles inserts and deletes well when they tend to fall in a small area:
A DOM tree is rather a pain to implement in practice. You need fast foo.childNodes[i] and also fast foo.nextChild. And fast inserts and removes. And APIs that use (parent, offset) pairs as their basic concept (like ranges) and other APIs that use (parent, prevSibling) pairs (like node insertion). Oh, and memory use needs to be minimal. And the code needs to have low constants, not just low algorithmic complexity.
Good point! Sloth and inertia usually allows me to use the work of others in this case (XML DOM, when appropriate), though, but that doesn't make it internally simply (or efficient).
I'd like to see a rope-variant of the 2-3 finger tree.
2-3 finger trees are immutable/persistent and support access to both ends in amortized constant time and logarithmic concatenation and splitting.
The complicating factor being that for flyweight values, like characters, the interior nodes of the tree would be prohibitive for one leaf per character. Surely there must be a variant of 2-3 finger trees that addresses this.
you can implement a rope as a 2-3 finger tree of string literals. i wrote a java implementation of 2-3 finger trees a few months ago and implemented both ropes of bytes and ropes of chars (rope backed by finger tree of java Strings here: https://github.com/jeffplaisance/fingertree/blob/master/src/...).
I guess if you have an array of characters and you want to insert into the middle, you could promote the array into a rope as follows:
1. break the array into two extents (start address / length pairs)
2. make a rope out of those two extents
3. do the insert on the resulting rope
this assumes that you're okay with starting with an array of characters and ending up with a rope.
ISTR this pattern was common-ish in erlang last time I looked at erlang, but they call ropes (of bytes) "iolists", and support for iolists goes all the way down into the standard library.
> this assumes that you're okay with starting with an array of characters and ending up with a rope.
Well, in any OO language, I guess both a string and a rope would still be considered a String object, whether it's a character array or a tree-like structure backing it wouldn't really matter to the public interface.
A rope stores a long string as a collection (binary tree) of shorter strings. Strings are stored contiguously in memory, so mutation (insertion, deletion, catenation, etc) often involves much copying of data, which could be problematic with long strings. Ropes avoid this problem, but have some additional code complexity and resource overhead associated with data fragmentation.
It is a tree, where the leaves are substrings, and the nodes are the sum of the lengths of all the leaves under the node.
Thus, the root node has a value equal to the length of the string; it has two child nodes, with values equal to the lengths of the first and second 'half' of the string, and so on.
As far as I can tell, all the properties described come because this is a balanced binary tree of strings "with rank".
I'd be curious if there is anything else that makes this different?
(it's fairly easy to adjust any balanced binary tree to allow the "report" function, ie generating a ordered sublist of size me in O(log(n) + m) time )
C: https://github.com/josephg/librope
JS: https://github.com/josephg/jumprope