
> You're ignoring real-world aspects of the problem.

No, I'm not. The most important real world aspect of this problem is that I can pull the complex part, the binary search, off the standard library shelf. All I need to implement is a random access iterator over the lines of an open file and I'm set.
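To make that concrete, here's roughly what I have in mind. This is only a sketch, not anything I've run against Reddit's data: it assumes the log is sorted and every line starts with a numeric Unix timestamp, and it hand-rolls the bisection over byte offsets rather than dressing a line iterator up for std::lower_bound.

    #include <cstdint>
    #include <fstream>
    #include <string>

    // Parse the sort key from a log line.  Assumption: each line begins with a
    // numeric Unix timestamp ("<epoch-seconds> <rest of line>"); swap this out
    // for whatever the real lines are actually sorted by.
    static long long key_of(const std::string& line) {
        return std::stoll(line);
    }

    // Return the byte offset of the first line whose key is >= target, or
    // file_size if there is none.  The file must be sorted by that key and
    // opened in binary mode.
    std::uint64_t lower_bound_offset(std::ifstream& f, std::uint64_t file_size, long long target) {
        std::uint64_t lo = 0, hi = file_size;
        while (lo < hi) {
            std::uint64_t mid = lo + (hi - lo) / 2;
            f.clear();
            f.seekg(static_cast<std::streamoff>(mid));
            std::string skip, line;
            std::getline(f, skip);                 // discard the (possibly partial) line we landed in
            if (std::getline(f, line) && key_of(line) < target)
                lo = mid + 1;                      // everything up to and including this line is too early
            else
                hi = mid;                          // the answer starts no later than the first full line after mid
        }
        // lo is now at most one line before the answer; finish with a short scan.
        f.clear();
        f.seekg(static_cast<std::streamoff>(lo));
        std::uint64_t pos = lo;
        std::string line;
        if (lo != 0 && std::getline(f, line))      // realign to the next line boundary
            pos = lo + line.size() + 1;            // +1 for the newline getline consumed
        while (pos < file_size && std::getline(f, line) && key_of(line) < target)
            pos += line.size() + 1;
        return pos < file_size ? pos : file_size;
    }

Seek to the returned offset and stream lines forward until you pass the end of the time window you care about. Each loop iteration is one seek plus one short read, so even on 70GB you're done in a few dozen probes.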

> For example, you're assuming that the data points are evenly distributed across time

No, I'm not. I just don't care how they're distributed across times. 70GB is between 2^36 and 2^37 bytes. It's probably around 2^28 or 2^29 lines. Binary search over the characters would take at most 37 probes; over the lines it would take at most 29 probes. Since each probe is so cheap (one disk seek and one disk read, essentially) there's absolutely no reason to optimize any further, especially when I can use an off-the-shelf component like C++'s std::lower_bound to do my binary search for me.
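If you want to sanity-check those figures, here's the back-of-the-envelope version (the ~256 bytes per line is just my guess at an average):

    #include <cmath>
    #include <cstdio>

    int main() {
        const double bytes = 70e9;                 // ~70 GB of log
        const double lines = bytes / 256.0;        // guessing ~256 bytes per line
        std::printf("probes over bytes: %.0f\n", std::ceil(std::log2(bytes)));  // 37
        std::printf("probes over lines: %.0f\n", std::ceil(std::log2(lines)));  // 29
        return 0;
    }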

> Reddit will most likely have regular busy and quiet periods, which will impact the density of the log at certain times

If you think that impacts the runtime of binary search at all, you've got a serious misconception about how binary search works.

> Thus, one can tune the algorithm to work better on Reddit's specific visitor pattern, if you can learn it.

Why bother? We're talking about a savings of milliseconds on an interactive process. An ordinary human like myself would never even notice the difference. The network lag from SSH alone would be greater than the difference in runtime.



This is absolutely correct. The worst case in any binary search is exactly floor(lg n) + 1 probes, which you hit when the element isn't present. There is no reason to optimize any further than this.


To be clear, interpolation search can cut the expected runtime from `O(log n)` down to `O(log log n)`, but it's just not worth the effort, especially in this case, precisely because the lines aren't uniformly distributed.
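For reference, this is the textbook version over an in-memory sorted array (just a sketch; pointing the same idea at byte offsets in the log file works the same way):

    #include <cstdint>
    #include <vector>

    // Interpolation search over a sorted vector of timestamps.  Instead of
    // probing the midpoint, it guesses where `target` should fall by assuming
    // the values are roughly evenly spread between a[lo] and a[hi].
    // Expected O(log log n) probes on uniformly distributed data, degrading
    // badly when the distribution is skewed.
    std::int64_t interpolation_search(const std::vector<long long>& a, long long target) {
        std::int64_t lo = 0, hi = static_cast<std::int64_t>(a.size()) - 1;
        while (lo <= hi && target >= a[lo] && target <= a[hi]) {
            if (a[hi] == a[lo])
                return a[lo] == target ? lo : -1;
            // Estimate the position by linear interpolation between the endpoints.
            std::int64_t pos = lo + static_cast<std::int64_t>(
                (static_cast<double>(target - a[lo]) / (a[hi] - a[lo])) * (hi - lo));
            if (a[pos] == target) return pos;
            if (a[pos] < target)  lo = pos + 1;
            else                  hi = pos - 1;
        }
        return -1;   // not present
    }

The catch is that the interpolated guess is only good when the keys really are spread evenly, which is exactly the distribution issue above.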


Well clearly they're more likely to hire someone who does go to the effort!

> To be clear, interpolation search can cut the expected runtime from `O(log n)` down to `O(log log n)`, but it's just not worth the effort, especially in this case, precisely because the lines aren't uniformly distributed.

It doesn't matter what the distribution is as long as you can learn it. A uniform distribution helps when you're cutting the search space into regular chunks, sure. The Reddit data won't be uniform, so don't cut the search space uniformly.

As I said, something faster could be devised if one studied the average number of visitors over a 24-hour period (though I don't know where that data could be gathered from).

If you can make your first seek accurate enough, it may be the case that a simple linear scan from that point gets you there faster than Binary Search.
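Something like this is what I'm picturing, with a completely made-up traffic curve standing in for real measurements (traffic_cdf, the 12:00 split, and the two-thirds figure are all invented for illustration):

    #include <cstdint>

    // Hypothetical model of the site's daily traffic: the fraction of a day's
    // log lines written before hour-of-day h (UTC).  The shape below is made up
    // for illustration (it puts roughly two thirds of the traffic in the
    // 12:00-24:00 half of the day); a real version would be fitted to measured
    // visitor counts.
    double traffic_cdf(double hour_of_day) {
        return hour_of_day < 12.0 ? hour_of_day / 36.0
                                  : 1.0 / 3.0 + (hour_of_day - 12.0) / 18.0;
    }

    // Turn the target time into a first-guess byte offset within one day's
    // worth of log.  A linear scan (or a narrowed binary search, as a fallback
    // for when the guess is off) would start from here.
    std::uint64_t guess_offset(std::uint64_t day_size_bytes, double target_hour) {
        return static_cast<std::uint64_t>(traffic_cdf(target_hour) * static_cast<double>(day_size_bytes));
    }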

You can still report average and worst-case complexities under whatever assumption you've made about how the log data is distributed.

Sorry to keep going on about this, but I just wanted to make my point clear! Good discussion.


OK, you are simply proposing the most obvious solution to the problem, which is entirely correct - I must have misunderstood your reasoning.

I assumed Reddit would want something a bit more special than the hundreds of Binary Search implementations they are going to receive. There are always faster algorithms that can be designed for specific problems, even if realistically they only save (in this case) 100ms.

It's a competition - you're supposed to stand out.



