If you add '-ffast-math' to the compiler parameters in his godbolt link you get the 'right' answer. Doing that means the division is done with the 'divsd' instruction, which clang uses even without the 'fast-math' flag.
That doesn't mean it's necessarily the right thing to do though.
Clang can be made to generate the same code as the 'broken' gcc output by declaring the variables as 'long double' instead of 'double'.
So I think this is where the issue lies?
I thought fast-math meant (among other things) that the compiler could algebraically simplify things using some rules for real numbers that don't apply to 32-/64-bit floating point numbers. I'd expect that reducing the number of operations would more often than not reduce the error due to rounding, which is what most people would want when they talk about "correctness".
Which step in this (very naive) argument is wrong?
Rounding error is not the only source of error with floating point. There is also loss of significance, which in the worst case is called catastrophic cancellation [1]. This occurs when subtracting two numbers which are very close in magnitude, for example:
1.23456789 - 1.23456788 = 0.00000001 = 1 * 10^-8
So here we've gone from 9 significant figures down to 1. This phenomenon makes a naive Taylor series approximation of e^x very inaccurate for negative x, because the sign alternates between positive and negative on every term, causing a lot of catastrophic cancellation.
Stuff like Kahan summation breaks horribly with -ffast-math, because if you assume floating-point addition is associative, the error term, which is ((s+x)-s)-x, simplifies to zero.
Yeah, that's the obvious counterexample, but that's why I said "more often than not". The statement I was questioning was that fast-math is "normally not more correct."
One very language-lawyer way to look at it is that, even if the answer it gives is more mathematically accurate, it's still less correct in that it's not the value you literally asked for.
Imagine the analogue for integers. You might use code like this to round down to a multiple of 2:
int i = 7;
int j = (i / 2) * 2;
If a compiler optimized that to 'j = i', that would be more mathematically accurate but less correct.
Edit: godbolt link requiring large monitor: https://godbolt.org/z/tFiZwW