Hacker News new | past | comments | ask | show | jobs | submit login

There's a no-free-lunch theorem here that's being ignored.

If you have a space of 64 bits to represent elements of some ordered field, and you insert items one at a time, there is some ordering of the items that allows you to make only 64 inserts.

Claiming that numbering the items with integers only allows you 16 insertions, but "It would take virtually forever to run out of space through repeated list reordering" using rational numbers just can't be correct. The author cherry-picked a bad implementation for the integers and a favourable insertion order for his pet approach.

Where exactly did he go wrong? By picking the first number in the sequence of integers to be 65536, one of the smallest of the unsigned 64-bit integers, and then deciding all the inserts would happen in descending order. If he had picked a number like 2^63 instead, he would have got at least 63 inserts no matter what the insertion order. Using this integer strategy also lends itself to easy reindexing if you do need to: the row of rank N gets reindexed to 2^64 / N. Or if you want to increase the size of the index field to 128 bits, bitshift everything left.

Note, just because we have this NFL theorem doesn't mean one approach can't outperform another in typical use patterns, but I don't see evidence for that here. The most logical integer-index strategy isn't considered and the insertion orders that exhaust the rational strategy are not particularly pathological.




I don’t understand what the statement of this theorem is. I can’t come up with a simple argument in my head for why there can never be enough information, but I’m not really sure how to think about it and I didn’t try very hard.


Think about bisection of ordering state space for every insert.

If 64 bits represent an ordering, and you split it in two for each insert - necessarily, to permit future insertion on either side of the new insert - then at least one side has no more than 63 bits left to store its ordering information for all future inserts on its side. You can't avoid this even with tricks that represent fractional bits.

The problem is that in order to avoid renumbering (which redistributes ordering state), the bisection needs to pre-allocate state space for future ordering information. The best it can do (without having more information about future insert order) is to split the state space perfectly evenly.

In the article, there's a big blue Update box which mentions a sequence of L/R path flips which make the fraction into a Fibonacci sequence, where you can go to 46 in sequence before exceeding a 32-bit integer. You'll observe than 46 is less than 64.


Yes, exactly this. Though it might be cleaner if instead of "no more than 63 bits" we say "no more than 2^63 spaces" on one side of the insert.

The author thinks it's very elegant that the rational numbers can be represented as a binary tree, and proposes some tricks to make the binary tree almost balanced and to make traversing it easy. You know what set is by default represented as a perfectly balanced binary tree where each bit in the representation tells you which way to go in the tree? The 64-bit integers.


I wouldn’t have picked 65,536, too, but there’s a trade-off between the increment and the number of items you can append without running into problems.

If he had picked 2^63, it wouldn’t be possible to create an initial list of even 2 items.

Depending on the use case, I probably would have picked 2^54 or so (allows for appending about a thousand items)

If your lists are small it doesn’t matter much what you do, but for large lists, I don’t think there’s a clean solution for this, if one keeps the key size constant.


If you have a known large number N of items to insert up front, and future inserts will be random, you can do the reindexing to 2^64 / N in advance. (Perhaps there's a further optimization in rounding N up to the next power of 2).

N = 2^48 is what gives you his strategy of numbering the initial entries in increments of 65536.


2^64/(N+1), you mean ;-)

But yes, the more you know, the better your invitation guess can be. That’s why I wrote “depending on the use case”.


Pretty much. Also, how long do you expect the list to be? 10 items? 100 items? If the user is ordering them manually, then if it's 100 items there are other issues to consider rather than just the order

Keeping a space between the items and reordering/"defraging" periodically might be easier.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: