Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

pdqsort isn't stable and still suffers from killer inputs.

Not sure how well this will paste:

      Name |    Items | Type |     Best |  Average |     Loops | Samples |     Distribution
  -------- | -------- | ---- | -------- | -------- | --------- | ------- | ----------------
  blitsort |   100000 |   32 | 0.005962 | 0.006532 |         1 |     100 |     random order
   pdqsort |   100000 |   32 | 0.002665 | 0.002802 |         1 |     100 |     random order
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.000069 | 0.000078 |         1 |     100 |  ascending order
   pdqsort |   100000 |   32 | 0.000098 | 0.000100 |         1 |     100 |  ascending order
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.001138 | 0.001196 |         1 |     100 |    ascending saw
   pdqsort |   100000 |   32 | 0.003245 | 0.003317 |         1 |     100 |    ascending saw
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.003777 | 0.003843 |         1 |     100 |    generic order
   pdqsort |   100000 |   32 | 0.000819 | 0.000851 |         1 |     100 |    generic order
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.000051 | 0.000060 |         1 |     100 | descending order
   pdqsort |   100000 |   32 | 0.000202 | 0.000207 |         1 |     100 | descending order
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.000815 | 0.000845 |         1 |     100 |   descending saw
   pdqsort |   100000 |   32 | 0.002307 | 0.002368 |         1 |     100 |   descending saw
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.001761 | 0.001774 |         1 |     100 |      random tail
   pdqsort |   100000 |   32 | 0.002560 | 0.002572 |         1 |     100 |      random tail
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.003328 | 0.003374 |         1 |     100 |      random half
   pdqsort |   100000 |   32 | 0.002651 | 0.002854 |         1 |     100 |      random half
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.000593 | 0.000938 |         1 |     100 |  ascending tiles
   pdqsort |   100000 |   32 | 0.002316 | 0.002482 |         1 |     100 |  ascending tiles


You should explain these issues in the repository, not in Hacker News comments!

A benchmark of 100,000 32 bit integers suggests that you will be comparing against algorithms that do well on that benchmark when it's not the case at all. The concept of stability doesn't apply when elements that compare the same are indistinguishable, which is the case for integers. And I think it's an exaggeration to say pdqsort has "killer inputs". In your benchmarks the patterned data always sorts faster than random, so the most you can say is that merge sort exploits some kinds of structure better (I have seen patterns where it does worse, but not by much). I expect timsort has some similar advantages over blitsort because of its ability to find natural runs while blitsort always starts at size 32—try an up-down pattern with runs of length 50 for instance. The absolute worse case for pdqsort is a fallback to heapsort. With pivot randomization it's pretty much impossible on large arrays unless they are engineered to make pdqsort fail, and you end up sorting the array three times slower or something. Not much of a DOS attack.

Having a fast merge sort is definitely useful in a lot of situations (my thoughts at [1]), and I've been interested in quadsort particularly for that purpose (blitsort seems to emphasize lower external memory usage which I've never had a use for). Half as fast as pdqsort on random data is actually somewhat better than I expected. It feels dishonest to see "Blitsort has exceptional performance" followed by selective benchmarks showing worse algorithms, and it makes it much harder to figure out what blitsort is good for.

[1] https://mlochbaum.github.io/BQN/implementation/primitive/sor...


> The concept of stability doesn't apply when elements that compare the same are indistinguishable, which is the case for integers.

If you’re considering the raw comparison value, then sure, stability isn’t applicable. But it's really rare that people are just sorting normal lists of numbers.

Stable sorts come into the picture when the comparisons are referring to records, ie rows in a DB.


Well, blitsort's performance is exceptional in the sense that it's 15% faster than the next fastest stable in-place sort.

As for stability, it's useful when you need to sort a table with an integer index. So it might be of value to database software.

I probably should point out some of the weakness on the README.

Agreed on pdqsort being 3x slower worst case on "killer" input, but it's my understanding that's why it's not being used to replace std::sort.

As for runs of 50, haven't benched those, though I assume gridsort (https://github.com/scandum/gridsort) would handle those rather well.


> Agreed on pdqsort being 3x slower worst case on "killer" input, but it's my understanding that's why it's not being used to replace std::sort.

This is just not true. Introsort (the current std::sort in all C++ compilers I know of) also has 'killer' inputs, and will switch to heapsort twice as slow than than pdqsort does in those. Worse, the libc++ implementation of std::sort has a quadratic killer sequence that I found 7 years ago, and it is still not fixed: https://bugs.llvm.org/show_bug.cgi?id=20837. In my opinion there is no good reason that pdqsort is not adopted yet as std::sort as it is faster for many patterns, faster for random input and rarely if ever slower.

In fact, pdqsort is the standard unstable sorting algorithm in Rust: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sor...

Note that any deterministic quicksort variant that does not spend a lot of time on choosing its pivots will always have some 'killer' pattern that forces it to use its fallback algorithm (or worse, go O(n^2)), as you can always shuffle the input array such that poor pivots are chosen. What pdqsort does is shuffle things around so that those 'killer' patterns are a lot less trivial, and are not as likely to occur in real-world input. For me determinism was very important so I didn't do it in https://github.com/orlp/pdqsort, but note that the variant implemented in Rust uses non-deterministic swapping and thus it is impossible to force a 'killer' input against it.


Given how fast pdq is there's indeed no reason not to adopt it. Probably the usual 'not invented here' mindset.


I guess that makes sense, but I never would have guessed that the category is in-place stable algorithms as it's only mentioned in the "about" line and not the text itself.

Sorting on an integer key isn't the same as sorting a list of integers. They have different performance considerations.


I updated the README slightly, it's mentioned twice now. :)

As for performance considerations, blitsort's performance on integers is indicative of its performance on strings, while this isn't the case for pdqsort.

I haven't ran a specific benchmark, but I assume that on string data blitsort will beat pdqsort for every metric except random with many equal items.


What do you mean by "suffers from killer inputs"? It is always O(n log n) by falling back to heapsort, so I don't think any input can be described as a "killer".




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: