Hacker News new | past | comments | ask | show | jobs | submit login

Ha, I had this exact problem asked as an interview question.

IIRC their real life solution was to leave gaps between elements (eg index 0, 100, 200… instead of 0, 1, 2) and re index when needed. Probably works well enough, what I came up with is (as you say) the idea of fractional indexing, but dealing with decimals is a pain so you can instead represent them as vectors, which you can just represent as a string of numbers that you sort lexicographically.

So an element inserted between elements 1 and 2 gets an index 11 (anything between 11-19 is fine). Between 1 and 11 is 101. Between 11 and 2 is 12 (or anything between 12-19). Note that these indexes are not numbers, they’re string that are compared lexicographically.

I’m sure there’s downsides, eg it takes a lot more memory to sort these indexes (strings are much larger than numbers). It feels a bit too smart to not have unexpected downsides.






> IIRC their real life solution was to leave gaps between elements (eg index 0, 100, 200… instead of 0, 1, 2) and re index when needed. Probably works well enough, what I came up with is (as you say) the idea of fractional indexing

Those are the same thing. Leaving gaps is fractional indexing. It's just fixed-point rather than floating point.

> an element inserted between elements 1 and 2 gets an index 11 (anything between 11-19 is fine). Between 1 and 11 is 101. Between 11 and 2 is 12 (or anything between 12-19). Note that these indexes are not numbers, they’re string that are compared lexicographically.

This reminds me of the most interesting method of generating random integers in an arbitrary range from random bits: interpret the bitstream as a string representing a real number (in binary) between 0 and 1. If, for example, you want to use bits to generate a number between 0 and 5, you divide the unit interval into sixths, and examine bits until you've determined conclusively that your number lies within one of those intervals:

     +---- 0 = 0.000000000... ---+
     |         0.000 -> 0        |
     |         0.00100 -> 0      |
     +-- 1/6 = 0.001010101...  --+
     |         0.0011 -> 1       |
     |         0.0100 -> 1       |
     +-- 2/6 = 0.010101010...  --+
     |         0.011 -> 2        |
     +-- 3/6 = 0.100000000...  --+
     |         0.100 -> 3        |
     +-- 4/6 = 0.101010101...  --+
     |         0.1011 -> 4       |
     |         0.1100 -> 4       |
     +-- 5/6 = 0.110101010...  --+
     |         0.11011 -> 5      |
     |         0.111 -> 5        |
     +---------------------------+

> solution was to leave gaps between elements (eg index 0, 100, 200… instead of 0, 1, 2)

Reminds me of my days writing BASIC programs.


Very similar to my first intuition on how to approach this problem. An interesting question that comes to mind is how should you pick index sizes and how often should you rebalance the whole thing. Make the index too large and you're wasting a lot of space you'll never need, too small and you're forced to rebalance too often. I'm guessing an ideal index size is such that you can rebalance at most once a night with a cron job and then avoid rebalances the whole rest of the day.

To be fair, this sounds like one of those classic problems that someone for sure already figured out in the 50s or 60s, just under a different name and/or context. Hash chaining is a similar idea, but not quite the same.


The "real life" solution of leaving gaps & reindexing is actually my earliest programming memory (as a teenager)/lesson in cleverness, of feeling like I should have been able to come up with a more clever/optimal/~~scalable~~ solution but settling for this awkward 100 200 300 thing. Nowadays I've used that approach like...dozens of times over the decades of real world projects since, and I'm still maintaining that web app I made in 2003, so I'm very glad I never came up with something unduly clever.

That is how I implemented it at work around 9 years ago. Use strings for ordering and if you use the full byte range they end up fairly compact (rather than just 1-9 as in your example). There are some edge cases that can cause it to balloon in size so there is a separate reordering step but it almost never needs to be invoked.



Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: