pdqsort isn't stable and still suffers from killer inputs. Not sure how well thi...

mlochbaum · on July 10, 2021

You should explain these issues in the repository, not in Hacker News comments!

A benchmark of 100,000 32 bit integers suggests that you will be comparing against algorithms that do well on that benchmark when it's not the case at all. The concept of stability doesn't apply when elements that compare the same are indistinguishable, which is the case for integers. And I think it's an exaggeration to say pdqsort has "killer inputs". In your benchmarks the patterned data always sorts faster than random, so the most you can say is that merge sort exploits some kinds of structure better (I have seen patterns where it does worse, but not by much). I expect timsort has some similar advantages over blitsort because of its ability to find natural runs while blitsort always starts at size 32—try an up-down pattern with runs of length 50 for instance. The absolute worse case for pdqsort is a fallback to heapsort. With pivot randomization it's pretty much impossible on large arrays unless they are engineered to make pdqsort fail, and you end up sorting the array three times slower or something. Not much of a DOS attack.

Having a fast merge sort is definitely useful in a lot of situations (my thoughts at [1]), and I've been interested in quadsort particularly for that purpose (blitsort seems to emphasize lower external memory usage which I've never had a use for). Half as fast as pdqsort on random data is actually somewhat better than I expected. It feels dishonest to see "Blitsort has exceptional performance" followed by selective benchmarks showing worse algorithms, and it makes it much harder to figure out what blitsort is good for.

[1] https://mlochbaum.github.io/BQN/implementation/primitive/sor...

yunohn · on July 10, 2021

> The concept of stability doesn't apply when elements that compare the same are indistinguishable, which is the case for integers.

If you’re considering the raw comparison value, then sure, stability isn’t applicable. But it's really rare that people are just sorting normal lists of numbers.

Stable sorts come into the picture when the comparisons are referring to records, ie rows in a DB.

scandum · on July 10, 2021

Well, blitsort's performance is exceptional in the sense that it's 15% faster than the next fastest stable in-place sort.

As for stability, it's useful when you need to sort a table with an integer index. So it might be of value to database software.

I probably should point out some of the weakness on the README.

Agreed on pdqsort being 3x slower worst case on "killer" input, but it's my understanding that's why it's not being used to replace std::sort.

As for runs of 50, haven't benched those, though I assume gridsort (https://github.com/scandum/gridsort) would handle those rather well.

orlp · on July 10, 2021

> Agreed on pdqsort being 3x slower worst case on "killer" input, but it's my understanding that's why it's not being used to replace std::sort.

This is just not true. Introsort (the current std::sort in all C++ compilers I know of) also has 'killer' inputs, and will switch to heapsort twice as slow than than pdqsort does in those. Worse, the libc++ implementation of std::sort has a quadratic killer sequence that I found 7 years ago, and it is still not fixed: https://bugs.llvm.org/show_bug.cgi?id=20837. In my opinion there is no good reason that pdqsort is not adopted yet as std::sort as it is faster for many patterns, faster for random input and rarely if ever slower.

In fact, pdqsort is the standard unstable sorting algorithm in Rust: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sor...

Note that any deterministic quicksort variant that does not spend a lot of time on choosing its pivots will always have some 'killer' pattern that forces it to use its fallback algorithm (or worse, go O(n^2)), as you can always shuffle the input array such that poor pivots are chosen. What pdqsort does is shuffle things around so that those 'killer' patterns are a lot less trivial, and are not as likely to occur in real-world input. For me determinism was very important so I didn't do it in https://github.com/orlp/pdqsort, but note that the variant implemented in Rust uses non-deterministic swapping and thus it is impossible to force a 'killer' input against it.

scandum · on July 10, 2021

Given how fast pdq is there's indeed no reason not to adopt it. Probably the usual 'not invented here' mindset.

mlochbaum · on July 10, 2021

I guess that makes sense, but I never would have guessed that the category is in-place stable algorithms as it's only mentioned in the "about" line and not the text itself.

Sorting on an integer key isn't the same as sorting a list of integers. They have different performance considerations.

scandum · on July 12, 2021

I updated the README slightly, it's mentioned twice now. :)

As for performance considerations, blitsort's performance on integers is indicative of its performance on strings, while this isn't the case for pdqsort.

I haven't ran a specific benchmark, but I assume that on string data blitsort will beat pdqsort for every metric except random with many equal items.

orlp · on July 10, 2021

What do you mean by "suffers from killer inputs"? It is always O(n log n) by falling back to heapsort, so I don't think any input can be described as a "killer".