For calculating the XOR of 1 to n there is a closed form solution, so no need to XOR them together in a loop.
(n & ((n & 1) - 1)) + ((n ^ (n >> 1)) & 1)
Or a much more readable version
[ n, 1, n + 1, 0 ][n % 4]
which makes it clear that this function cycles through a pattern of length four.
Why this works can be seen if we start with some n that is divisible by four, i.e. it has the two least significant bits clear, and then keep XORing it with its successors. We start with xxxxxx00 which is our n. Then we XOR it with n + 1 which is xxxxxx01 and that clears all the x's and leaves us with 00000001. Now we XOR it with n + 2 which is xxxxxx10 and that yields xxxxxx11 which is n + 3. The cycle finishes when we now XOR it it with n + 3 which yields 00000000. So we get n, 1, n + 3, 0 and then the cycle repeats as we are back at zero and at n + 4 which is again divisible by four.
My offhand solution not using xor is to subtract from the sum of 1 to n, which has a closed form solution. The closed form roughly halves the execution time, as we only have to iterate over the range once.
Good to know there's a similar speedup available on the xor path...
Another interesting fact is that each time you make the xor of four consecutive numbers, beginning with an even number, the result is zero.
Example in J.
xor =: (16 + 2b0110) b.
f =: 3 : 'xor/ y + i. 4'
f"0 ] 2 * 1 + i. 100
Yes, because XORing two consecutive integers only differing in the least significant bit yields one.
xxxxxxx0 ^ xxxxxxx1 = 00000001
Doing this twice with four consecutive numbers then also cancels the remaining one. That also means that you do not have to use consecutive numbers, you can use two arbitrary pairs
There are essentially two bits of information in the 'state' of this iterated algorithm:
a) Are all the non-lowest bits zero, or are they the value of the latest N
b) the value of the lowest bit
So the cycle of (N, 1, N+3, 0) corresponds to (A) and (B) being: (0,0), (0,1), (1,1), (1, 0) - i.e. the 4 possible combinations of these states.
If we generalize the problem to base k (they are k-1 duplicate of each number except the missing number, find missing one using base k-wise addition) then we can see the cycle is the smallest number such the base k-wise addition from 1 to the number is zero and it is power of k will form a cycle. I'm not sure if all such numbers are power of k if they exists or if there is an upper bound on them. For example in base 4 there appears to be no such cycle.
I made an arithmetical mistake in base 4, so I was wrong. I also wrote they are instead of there are.
I think the following is true: For even k the cycle is k^2 long and for odd k is k long. Why? because units' place of generalized xor from 1 to k-1 is (k^2-k)/2 and therefore zero mod k if k is odd, if k is even then if we repeat it twice we get zero. For the second digit, k times the same digit will always give zero. Thus for odd k we have a zero when n is divisible by k and for even k we have a zero when n is divisible by 2k and the smallest power of k divisible by 2k is k^2 so it must be the cycle length.
No, that is correct, those two n represent slightly different things. n + 3 is the value after n XOR n + 1 XOR n + 2, so the n in the array index expression is n + 2 from the explaination and n + 3 results from (n + 2) + 1. I thought about how I could make this less confusing but it just became more confusing in my mind, so just used n in both cases.
I'm now seeing that they're different. However, this sounds a bit off to me:
>n + 3 is the value after n XOR n + 1 XOR n + 2, so the n in the array index expression is n + 2 from the explaination and n + 3 results from (n + 2) + 1.
The reason I think it's off is that array index expression the start of the sum is 1, but in the explanation the start of the sum is n. So I don't think it's as simple as the ending being different by 2.
In the array expression the array values and the index depend on n and both vary, in the explanation the n is fixed. Let us do an example starting at say 8.
n = 8 [ 8, 1, 9, 0 ][ 8 % 4] = 8 = n
n = 9 [ 9, 1, 10, 0 ][ 9 % 4] = 1
n = 10 [ 10, 1, 11, 0 ][10 % 4] = 11 = n + 1
n = 11 [ 11, 1, 12, 0 ][11 % 4] = 0
n = 8 n = 8 = 8 = n
n ^ n + 1 = 8 ^ 9 = 1
n ^ n + 1 ^ n + 2 = 8 ^ 9 ^ 10 = 11 = n + 3
n ^ n + 1 ^ n + 2 ^ n + 3 = 8 ^ 9 ^ 10 ^ 11 = 0
So in the explanation we get 11 = n + 3 still with reference to the starting value n = 8, in the array expression on the other hand we have moved on to n = 10 when we pull 11 = n + 1 out of the array.
Amendment to the parent comment. Note that the arrays contain values like 9 and 10 that are not the XOR of 1 to n for any n but they will also never be accessed because when they appear in the array, then the index used will point to a different element.
Also because we end up back at zero when ever n % 4 == 3, there is some flexibility about the starting point. I wrote 1 to n because that is what the article used, but it would be mathematically cleaner to start at zero which actually changes nothing because XORing with zero does nothing. And we do not have to start at zero at all, we can start at any number divisible by four and less than n because the running XOR sum will become zero just before each multiple of four. So XORing together 0...n, 1...n, 4...n, 8...n or generally 4k...n will give the same result. The explanation part looked at one cycle starting at 4k and ending at 4k + 3 with the running XOR sum being back at zero. Maybe this would have been the less confusing explanation, just using 4k instead of using n again with the constraint that it is divisible by four.
There's a bit of a trick in that solution: n is assumed to have the lower two bits clear so for an arbitrary n the array would really be:
[(n & ~3), 1, (n & ~3) + 3, 0][n % 4]
where the (n & ~3) makes sure those lower 2 bits are cleared. But note that we only ever can look at the first element when n % 4 == 0. In that case, (n & ~3) == n already. And further, we only ever can look at the third element when n % 4 == 2. In that case (n & ~3) == n - 2, so (n & ~3) + 3 == n + 1. Hence the array can be simplified to the one given in the other comment.
Why this works can be seen if we start with some n that is divisible by four, i.e. it has the two least significant bits clear, and then keep XORing it with its successors. We start with xxxxxx00 which is our n. Then we XOR it with n + 1 which is xxxxxx01 and that clears all the x's and leaves us with 00000001. Now we XOR it with n + 2 which is xxxxxx10 and that yields xxxxxx11 which is n + 3. The cycle finishes when we now XOR it it with n + 3 which yields 00000000. So we get n, 1, n + 3, 0 and then the cycle repeats as we are back at zero and at n + 4 which is again divisible by four.