Blitsort is an in-place stable adaptive rotate merge sort

mosaic_school · on July 10, 2021

Love this!

Functions look clean and fas Also great README (documentation, evaluation, ..)

I did not see a license file though. Is the repository intended as public domain?

UPDATE: thanks, apparently I'm trained to skip file head sections.

detaro · on July 10, 2021

All the source files have a license in them?

kzrdude · on July 10, 2021

See the source files for a license.

kzrdude · on July 10, 2021

It's using a qsort-like interface using a function pointer for the comparison - surely a C++ like implementation has a lot to gain here?

scandum · on July 10, 2021

Correct, if you uncomment

  //#define cmp(a,b) (*(a) > *(b))

in blitsort.h it'll run about 25% faster. Probably still slower than a native C++ implementation since it'll evaluate to (a > b) > 0 rather than (a > b), not sure if the compiler will optimize that.

amelius · on July 10, 2021

As a layman I'd expect advanced sorting algorithms to depend on the dimensions of the underlying memory architecture, e.g. size of CPU caches, and memory latency, bandwidth and such.

beeforpork · on July 10, 2021

What is the license for this code?

hypertele-Xii · on July 10, 2021

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

StefanKarpinski · on July 10, 2021

I.e. MIT license

scandum · on July 10, 2021

Might be worth noting that it's faster than quicksort, std::stable_sort, and timsort.

mlochbaum · on July 10, 2021

These are all pretty slow on 32-bit integers, which you're using for testing. Could you please include a comparison with pdqsort, or at least mention that pdqsort is to be preferred when an element fits in a register (as it certainly will be if you're only beating timsort by 20% or so)? It's frustrating to see sorting algorithms with some effort spent on benchmarking and still have to guess at whether it's practical based on how much slower the comparison points are than the state of the art.

Radix sorts (like your own wolfsort!) will also be much faster on the benchmarks presented, although that's sort of misleading since they are very good at uniformly random input and not so good at other distributions.

scandum · on July 10, 2021

pdqsort isn't stable and still suffers from killer inputs.

Not sure how well this will paste:

      Name |    Items | Type |     Best |  Average |     Loops | Samples |     Distribution
  -------- | -------- | ---- | -------- | -------- | --------- | ------- | ----------------
  blitsort |   100000 |   32 | 0.005962 | 0.006532 |         1 |     100 |     random order
   pdqsort |   100000 |   32 | 0.002665 | 0.002802 |         1 |     100 |     random order
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.000069 | 0.000078 |         1 |     100 |  ascending order
   pdqsort |   100000 |   32 | 0.000098 | 0.000100 |         1 |     100 |  ascending order
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.001138 | 0.001196 |         1 |     100 |    ascending saw
   pdqsort |   100000 |   32 | 0.003245 | 0.003317 |         1 |     100 |    ascending saw
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.003777 | 0.003843 |         1 |     100 |    generic order
   pdqsort |   100000 |   32 | 0.000819 | 0.000851 |         1 |     100 |    generic order
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.000051 | 0.000060 |         1 |     100 | descending order
   pdqsort |   100000 |   32 | 0.000202 | 0.000207 |         1 |     100 | descending order
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.000815 | 0.000845 |         1 |     100 |   descending saw
   pdqsort |   100000 |   32 | 0.002307 | 0.002368 |         1 |     100 |   descending saw
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.001761 | 0.001774 |         1 |     100 |      random tail
   pdqsort |   100000 |   32 | 0.002560 | 0.002572 |         1 |     100 |      random tail
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.003328 | 0.003374 |         1 |     100 |      random half
   pdqsort |   100000 |   32 | 0.002651 | 0.002854 |         1 |     100 |      random half
           |          |      |          |          |           |         |
  blitsort |   100000 |   32 | 0.000593 | 0.000938 |         1 |     100 |  ascending tiles
   pdqsort |   100000 |   32 | 0.002316 | 0.002482 |         1 |     100 |  ascending tiles

mlochbaum · on July 10, 2021

You should explain these issues in the repository, not in Hacker News comments!

A benchmark of 100,000 32 bit integers suggests that you will be comparing against algorithms that do well on that benchmark when it's not the case at all. The concept of stability doesn't apply when elements that compare the same are indistinguishable, which is the case for integers. And I think it's an exaggeration to say pdqsort has "killer inputs". In your benchmarks the patterned data always sorts faster than random, so the most you can say is that merge sort exploits some kinds of structure better (I have seen patterns where it does worse, but not by much). I expect timsort has some similar advantages over blitsort because of its ability to find natural runs while blitsort always starts at size 32—try an up-down pattern with runs of length 50 for instance. The absolute worse case for pdqsort is a fallback to heapsort. With pivot randomization it's pretty much impossible on large arrays unless they are engineered to make pdqsort fail, and you end up sorting the array three times slower or something. Not much of a DOS attack.

Having a fast merge sort is definitely useful in a lot of situations (my thoughts at [1]), and I've been interested in quadsort particularly for that purpose (blitsort seems to emphasize lower external memory usage which I've never had a use for). Half as fast as pdqsort on random data is actually somewhat better than I expected. It feels dishonest to see "Blitsort has exceptional performance" followed by selective benchmarks showing worse algorithms, and it makes it much harder to figure out what blitsort is good for.

[1] https://mlochbaum.github.io/BQN/implementation/primitive/sor...

yunohn · on July 10, 2021

> The concept of stability doesn't apply when elements that compare the same are indistinguishable, which is the case for integers.

If you’re considering the raw comparison value, then sure, stability isn’t applicable. But it's really rare that people are just sorting normal lists of numbers.

Stable sorts come into the picture when the comparisons are referring to records, ie rows in a DB.

scandum · on July 10, 2021

Well, blitsort's performance is exceptional in the sense that it's 15% faster than the next fastest stable in-place sort.

As for stability, it's useful when you need to sort a table with an integer index. So it might be of value to database software.

I probably should point out some of the weakness on the README.

Agreed on pdqsort being 3x slower worst case on "killer" input, but it's my understanding that's why it's not being used to replace std::sort.

As for runs of 50, haven't benched those, though I assume gridsort (https://github.com/scandum/gridsort) would handle those rather well.

orlp · on July 10, 2021

> Agreed on pdqsort being 3x slower worst case on "killer" input, but it's my understanding that's why it's not being used to replace std::sort.

This is just not true. Introsort (the current std::sort in all C++ compilers I know of) also has 'killer' inputs, and will switch to heapsort twice as slow than than pdqsort does in those. Worse, the libc++ implementation of std::sort has a quadratic killer sequence that I found 7 years ago, and it is still not fixed: https://bugs.llvm.org/show_bug.cgi?id=20837. In my opinion there is no good reason that pdqsort is not adopted yet as std::sort as it is faster for many patterns, faster for random input and rarely if ever slower.

In fact, pdqsort is the standard unstable sorting algorithm in Rust: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sor...

Note that any deterministic quicksort variant that does not spend a lot of time on choosing its pivots will always have some 'killer' pattern that forces it to use its fallback algorithm (or worse, go O(n^2)), as you can always shuffle the input array such that poor pivots are chosen. What pdqsort does is shuffle things around so that those 'killer' patterns are a lot less trivial, and are not as likely to occur in real-world input. For me determinism was very important so I didn't do it in https://github.com/orlp/pdqsort, but note that the variant implemented in Rust uses non-deterministic swapping and thus it is impossible to force a 'killer' input against it.

scandum · on July 10, 2021

Given how fast pdq is there's indeed no reason not to adopt it. Probably the usual 'not invented here' mindset.

mlochbaum · on July 10, 2021

I guess that makes sense, but I never would have guessed that the category is in-place stable algorithms as it's only mentioned in the "about" line and not the text itself.

Sorting on an integer key isn't the same as sorting a list of integers. They have different performance considerations.

scandum · on July 12, 2021

I updated the README slightly, it's mentioned twice now. :)

As for performance considerations, blitsort's performance on integers is indicative of its performance on strings, while this isn't the case for pdqsort.

I haven't ran a specific benchmark, but I assume that on string data blitsort will beat pdqsort for every metric except random with many equal items.

orlp · on July 10, 2021

What do you mean by "suffers from killer inputs"? It is always O(n log n) by falling back to heapsort, so I don't think any input can be described as a "killer".

jtc331 · on July 10, 2021

On certain data sets.

CJefferson · on July 10, 2021

I wonder why they don't compare against std::sort? That seems like most people's standard choice.

lpapez · on July 10, 2021

Blitsort supports long doubles and 8, 16, 32, and 64 bit data types. By using 32 or 64 bit pointers it's possible to sort any other data type.

Seems quite limited, wonder how it compares to radix sort.

bradleyjg · on July 10, 2021

Radix sort wouldn’t allow for the “possible to sort any other data type” part.

I do wonder about not enabling inlining structs. I don’t know where the crossover happens, and it certainly varies with hardware characteristics, but I’m sure a data structure made up of say an int and two longs would better be sorted in contiguous memory rather than as an array of pointers to random places in the heap.

lpapez · on July 10, 2021

Exactly. It has the same limitations as a radix sort, which is why I find them comparable

bradleyjg · on July 10, 2021

For a radix sort it’s impossible to sort linked data structures, at least as I understand it.

Whereas any comparison based sort can follow the pointers and do arbitrary comparisons.

acmj · on July 10, 2021

In the source code, it is not apparent where the size restriction comes from. It seems to me that it can be easily modified to work with any data types. Maybe I am wrong. I am more concerned with this sentence:

> Blitsort's performance is similar to that of quadsort as long as the auxiliary memory is greater or equal to the square root of the array being sorted, which comes out at 262,144 elements with the default stack of 512 elements. Performance on larger arrays degrades marginally.

I wonder what "marginally" means here. What if we sort 10 million integers?

nuclearnice3 · on July 10, 2021

Is the included bench.c the right way to answer that question?

  1m random dist size 128 avg qsort 0.206549 blitsort 0.281630
  10m random dist size 32 avg qsort 2.963479 blitsort 4.394143
  100m random dist size 128 avg qsort 34.996847 blitsort 54.102616

scandum · on July 10, 2021

It is, those numbers are less favorable than they are on my own machine.

The speed of your RAM memory is likely to have an influence on performance. My system is running at 2133MHz.

nuclearnice3 · on July 11, 2021

Interesting.

  16 GB 2133 MHz LPDDR3

Built with Apple clang version 12.0.5 (clang-1205.0.22.11) gcc -O3 bench.c

scandum · on July 10, 2021

Blitsort can sort strings, but something like a 12 byte data structure would require an array with pointer references to sort.

As for the degradation against std:stable_sort:

5% slower at 1 million, 10% slower at 10 million, 20% slower at 100 million.

With sqrt n auxiliary you're looking at 2%, 4%, 6% slower.

Against qsort() it remains faster at 10 million, 3% slower at 100 million.

acmj · on July 10, 2021

I am thinking more about sorting, say, 20-byte or 24-byte custom structs. Directly sorting arrays is probably faster than sorting pointers to arrays. At least in my applications, I rarely just sort numbers; I more often sort numbers along with other data associated with the numbers.

scandum · on July 10, 2021

Hard to say which would be faster, I've made a mental note to benchmark sorting 16 byte long doubles through pointers as well as directly.

I never tried, but it should be possible (and relatively easy) to add custom sizes in the .h file.