> ?!? Why bother defining a framework for measurement if you're just going to substitute your own judgment? Perhaps the framework didn't capture something important -- like the absolute number of people looking at any given post, concurrent updates, etc.
Just guessing: the benchmark tells you the time complexity. It doesn't tell you the space complexity. The author is optimizing between the time- and space-complexity of the solutions, with the time-complexity benchmarks as an input. (Also, at a scale the benchmark doesn't reach, space-complexity starts to affect time-complexity, as larger datasets become less able to stay hot in the disk cache.)
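(Rough, assumed numbers to make the space point concrete: an hstore entry per distinct viewer costs on the order of tens of bytes, while a default postgresql-hll value caps at a fixed ~1.3 kB, so

    ~20 bytes/entry x 1,000,000 viewers ≈ 20 MB for one hstore row
    vs. a constant ~1.3 kB for the equivalent hll row

and a table of rows of the first kind falls out of cache long before the second does.)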
That sounds like a bad benchmark, then? I mean, it's still interesting, but a synthetic benchmark that fails to map to real use is... I guess actually of negative value, since it's misleading.
Yeah, this is probably the reason, and the author even elaborates on this in the sentences following the bit quoted by cldellow:
> Even though the data says hstore, knowing that posts will be seen by more and more people over time, I might choose the HLL solution for an actual implementation. It's far less likely to pose a bloated row problem, [...]
Just posting here too, but yup, this is exactly what I was trying to convey.
Hstore might have been the fastest, but the way I'm using it, or the scale the use case could grow to, might not work out (see the sketch below).
Bigger benchmarks could have been done! Maybe a multi-part post would have been better, so I could split the methodology apart from the results and my blabbering about the approach!
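To make the two shapes concrete, here's a minimal sketch, assuming hypothetical post_views tables and the stock hstore and postgresql-hll extensions (not the exact schema from the post):

    -- Hypothetical tables for illustration only.
    CREATE EXTENSION IF NOT EXISTS hstore;
    CREATE EXTENSION IF NOT EXISTS hll;

    CREATE TABLE post_views_hstore (
        post_id bigint PRIMARY KEY,
        viewers hstore NOT NULL DEFAULT ''::hstore  -- one key per distinct viewer
    );

    CREATE TABLE post_views_hll (
        post_id bigint PRIMARY KEY,
        viewers hll NOT NULL DEFAULT hll_empty()    -- fixed ~1.3 kB at any cardinality
    );

    -- Recording a view: the hstore row grows with every new viewer id...
    UPDATE post_views_hstore
       SET viewers = viewers || hstore('12345', '1')
     WHERE post_id = 1;

    -- ...while the hll row stays the same size no matter how many ids it absorbs.
    UPDATE post_views_hll
       SET viewers = hll_add(viewers, hll_hash_bigint(12345))
     WHERE post_id = 1;

    -- Distinct-viewer counts: exact (and expensive to store) vs. approximate.
    SELECT array_length(akeys(viewers), 1) AS exact_viewers
      FROM post_views_hstore WHERE post_id = 1;
    SELECT hll_cardinality(viewers) AS approx_viewers
      FROM post_views_hll WHERE post_id = 1;

The point being: recording a view rewrites the whole row in both cases, but the hstore row gets bigger every time, while the hll row is a constant size however many viewers it has seen.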