Hacker News new | past | comments | ask | show | jobs | submit login

This is unrelated but I wonder what GCC's new platform specific-method generator would do to this section of code [0]. There are a few other bits in here that look like all vector-ops and I'd love to see what would happen to this after the compiler get's a crack at it.

Also if someone could tell me if this is a stupid idea: what would be the result of writing a version of this sort as a const-expr for fixed size arrays. For instacne pdqsort512 to sort a 512 length set. Just a thought. I'd love to see what G++ is able to hob-cobble together.

[0] - https://github.com/orlp/pdqsort/blob/master/pdqsort.h#L227-L...




It's not a vector operation, even though it's written in a fashion similar to it. However, it can be implemented in a very interesting fashion.

Consider when c0, c3 and c6 are true, and the rest are false. Then the end result is this:

    offsets_l[num_l + 0] = i + 0;
    offsets_l[num_l + 1] = i + 3;
    offsets_l[num_l + 2] = i + 6;
    num_l += 3;
On a little-endian machine we could write this as following (assuming there is space after offsets_l):

    *(uint64_t*) (offsets_l + num_l) = (i + 0) + ((i + 3) << 8) + ((i + 6) << 16);
Which is equivalent to:

    *(uint64_t*) (offsets_l + num_l) = (i + 0) + (i + 3)*256 + (i + 6)*65536;
And if we regroup the parenthesis and add common factors:

    *(uint64_t*) (offsets_l + num_l) = 65793 * i + 393984;
If we precalculate these magic constants for each combination of c0..7 we get a 256-element lookup table, and code that looks like this:

    int idx = c0 | c1 << 1 | c2 << 2 | c3 << 3 | c4 << 4 | c5 << 5 | c6 << 6 | c7 << 7;
    *(uint64_t*) (offsets_l + num_l) = muls[idx] * i + adders[idx];
    num_l += popcnt[idx];
Sadly, it turns out that this approach isn't faster. It's cool nonetheless.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: