GNU GCC does not round floating-point divisions to the nearest value (lemire.me)
158 points by guiambros on June 27, 2020 | 79 comments



An example where this matters: a new Mario 64 trick on the Virtual Console version, where summing round-towards-zero numbers over a sine wave cycle produces a small delta that builds up over a couple of days and makes a jump possible.

https://www.kotaku.com.au/2018/06/the-mario-64-trick-that-ta...


For anyone else interested in similar tricks, here is an in-depth video where someone talks about getting a star with half a button press. It starts out straightforward enough, but eventually gets really interesting when they start talking about parallel worlds that you can only access by abusing bugs to reach unrealistic speeds.

https://www.youtube.com/watch?v=kpk2tdsPh0A&t=77s


Strongly recommended watching, in my opinion: it very clearly describes some quite complicated manipulations, the circumstances in which they work, and how to move between those circumstances to achieve what should be impossible.

If you're going to watch one commentated "tool-assisted" run, this is it.


Best followed by this video, which made me laugh my ass off for several minutes: https://www.youtube.com/watch?v=hSa-pwML4z4

(I grew up watching SpongeBob, if that matters)


That was excellent. If I may suggest a followup: https://youtu.be/Pyn3N55elS4


A small clarification: this trick is not new. It was discovered very early on in the investigation of the Wii VC release of Super Mario 64. It was rediscovered by a group of people who were interested in an application of it (which enables one to beat SM64 without the A button) a couple of years ago, and the discovery that the glitch is a result of inaccurate rounding mode emulation followed a couple weeks later.


Thank you for the clarification.


Fun idea: If the universe is a simulation, then maybe dark energy is just a rounding error bug.


According to Wikipedia:

   ..dark energy contributes 68% of the total energy..
so if anyone complains about the shoddy quality of x87 floats, point them to the universe and tell them it could be worse.



An example of real-life software where this difference has a noticeable impact.


I believe the issue is that some of this ends up evaluated at compile-time, and the compiler really has no idea how to do rounding correctly at that point. When implementing floating point tests for an x86 emulator, we had to create a macro that would ensure that the calculation was done at runtime rather than at compile time for this exact reason: https://github.com/ish-app/ish/blob/18176b6931d69bdabe48f137.... If anyone knows how to control floating point rounding in C reliably, I'd be interested in hearing it :)


> If anyone knows how to control floating point rounding in C reliably, I'd be interested in hearing it

I skimmed over that code and it seems that hack is required because (a) you're not happy with what the compiler does with a double and (b) you're insisting on using the "double" type.

It's easy to get your own semantics for numbers in C; it's not easy to do so while insisting on using C's own types, whose behavior is contrary to what you want.

If you want your own floating point semantics, why not make a struct with a significand & exponent and pass that around? Something like that is how established numbers-in-C-without-native-C-semantics libraries do it, e.g. GMP.
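
A toy sketch of that idea, purely for illustration (the names decnum/dec_mul are made up here; real libraries like GMP/MPFR use arbitrary-precision limbs rather than a fixed-width significand):

    #include <stdint.h>
    #include <stdio.h>
    
    /* Toy number type: value = significand * 10^exponent. Illustrative only. */
    typedef struct {
        int64_t significand;
        int32_t exponent;   /* power of ten */
    } decnum;
    
    /* Multiplication is exact as long as the significand doesn't overflow. */
    static decnum dec_mul(decnum a, decnum b) {
        decnum r = { a.significand * b.significand, a.exponent + b.exponent };
        return r;
    }
    
    int main(void) {
        decnum half  = { 5, -1 };           /* 0.5 */
        decnum third = { 333333333, -9 };   /* ~1/3 to 9 digits */
        decnum p = dec_mul(half, third);
        printf("%lldE%d\n", (long long)p.significand, (int)p.exponent);
        return 0;
    }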


> the compiler really has no idea how to do rounding correctly at that point.

Flags, pragmas, function attributes, there are so many ways this could be communicated to the compiler. If this behavior is not a bug, then having a way to change it should be a feature.


It's been a while since he's done that sort of thing, but Bruce Dawson could probably tell you some stories. I don't remember seeing that specific problem discussed, but his blog is a treasure trove of floating point information: https://randomascii.wordpress.com/category/floating-point/


If you add '-ffast-math' to the compiler parameters in his godbolt link you get the 'right' answer. Doing that means the division is done with the 'divsd' instruction, which clang uses even without the 'fast-math' flag.

That doesn't mean it's necessarily the right thing to do though.

Clang can be made to generate the same code as the 'broken' gcc output by declaring the variables as 'long double' instead of 'double'. So I think this is where the issue lies?

Edit: godbolt link requiring large monitor: https://godbolt.org/z/tFiZwW


-ffast-math should rebrand itself to -ffast-and-more-correct-math ;)


It’s normally not more correct.


I thought fast-math meant (among other things) that the compiler could algebraically simplify things using some rules for real numbers that don't apply to 32-/64-bit floating point numbers. I'd expect that reducing the number of operations would more often than not reduce the error due to rounding, which is what most people would want when they talk about "correctness".

Which step in this (very naive) argument is wrong?


Rounding error is not the only source of error with floating point. There is also loss of significance, which in the worst case is called catastrophic cancellation [1]. This occurs when subtracting two numbers which are very close in magnitude, for example:

1.23456789 - 1.23456788 = 0.00000001 = 1 * 10^-8

So here we've gone from 9 significant figures down to 1. This phenomenon makes a naïve Taylor series approximation of e^x very inaccurate for negative x: the sign alternates between positive and negative on every term, causing a lot of catastrophic cancellation.

[1] https://en.wikipedia.org/wiki/Loss_of_significance
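
A quick C illustration of that effect, comparing a naive term-by-term Taylor sum against the library exp():

    #include <math.h>
    #include <stdio.h>
    
    /* Naive Taylor series: e^x = sum of x^n / n! */
    static double exp_taylor(double x, int terms) {
        double sum = 1.0, term = 1.0;
        for (int n = 1; n < terms; n++) {
            term *= x / n;
            sum += term;
        }
        return sum;
    }
    
    int main(void) {
        double x = -20.0;
        /* The alternating terms grow to roughly 4e7 in magnitude before
           cancelling down to about 2e-9, so nearly every significant digit
           of the naive sum is lost. */
        printf("naive: %.17g\n", exp_taylor(x, 200));
        printf("exp(): %.17g\n", exp(x));
        return 0;
    }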


(Note that in your example no accuracy is lost)


Stuff like Kahan summation breaks horribly with -ffast-math, because if you assume that sum is associative the error term, which is ((s+x)-s)-x, simplifies to zero.

https://en.m.wikipedia.org/wiki/Kahan_summation_algorithm
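
A minimal Kahan summation sketch for reference; the compensation variable c below is exactly that ((s+x)-s)-x term, which is why treating addition as associative destroys it:

    #include <stdio.h>
    
    /* Kahan (compensated) summation. Under -ffast-math a compiler may treat
       floating-point addition as associative and optimize the compensation
       away, turning this back into a naive sum. */
    static double kahan_sum(const double *a, int n) {
        double sum = 0.0, c = 0.0;     /* c carries the lost low-order bits */
        for (int i = 0; i < n; i++) {
            double y = a[i] - c;
            double t = sum + y;
            c = (t - sum) - y;         /* algebraically zero; numerically the rounding error */
            sum = t;
        }
        return sum;
    }
    
    int main(void) {
        double a[] = { 1e16, 1.0, 1.0, 1.0, 1.0, -1e16 };
        printf("%.1f\n", kahan_sum(a, 6));   /* 4.0; a naive left-to-right sum gives 0.0 */
        return 0;
    }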


Yeah, that's the obvious counterexample, but that's why I said "more often than not". The statement I was questioning was that fast-math is "normally not more correct."


That's where you dust off volatile.


Sadly true. That's the only portable way I found to implement something akin to a Kahan sum so that no compiler / flag would destroy it.
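
Presumably something along these lines: the compensated-sum loop from the sketch above with its temporaries marked volatile, so the compiler cannot algebraically cancel the error term (how much protection this actually buys is still compiler- and flag-dependent):

    /* Variant of kahan_sum above: the volatile qualifiers force the
       intermediates through memory, so (t - sum) - y cannot be simplified
       to zero even under -ffast-math. Illustrative, not a guarantee. */
    static double kahan_sum_volatile(const double *a, int n) {
        double sum = 0.0, c = 0.0;
        for (int i = 0; i < n; i++) {
            double y = a[i] - c;
            volatile double t = sum + y;
            volatile double d = t - sum;
            c = d - y;
            sum = t;
        }
        return sum;
    }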


One very language-lawyer way to look at it is that, even if the answer it gives is more mathematically accurate, it's still less correct in that it's not the value you literally asked for.

Imagine the analogue for integers. You might use code like this to round down to a multiple of 2:

    int i = 7;
    int j = (i / 2) * 2;
If a compiler optimized that to j = i then that's more mathematically accurate but less correct.


The code lacking -ffast-math is doing the math incorrectly, and -ffast-math ends up doing it correctly by accident.


;)


As I understand it, the IEEE floating point standards do not require exact rounding for binary to decimal conversions. See What Every Computer Scientist Should Know About Floating-Point Arithmetic https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...

The behavior appears to be consistent with the standard. Keep in mind that GCC needs to provide the same result irrespective of the architecture it runs on, and one reason to use GCC for floating point is to take advantage of dedicated hardware. And the reason to use floating point is speed, not precision (at least when using decimal floating point). If precision is critical then arbitrary precision or fixed point is the way to go. Normally it isn't that critical.


Look for the heading, “The IEEE Standard” in “What Every Computer Scientist Should Know About Floating-Point Arithmetic,” then scroll down to the subheading “Operations.” That section begins, “The IEEE standard requires that the result of addition, subtraction, multiplication and division be exactly rounded. That is, the result must be computed exactly and then rounded to the nearest floating-point number (using round to even).”

A few paragraphs later, it adds, “There is not complete agreement on what operations a floating-point standard should cover. In addition to the basic operations +, -, × and /, the IEEE standard also specifies that square root, remainder, and conversion between integer and floating-point be correctly rounded. It also requires that conversion between internal formats and decimal be correctly rounded (except for very large numbers).”

I will admit that I worked as a programmer for several years without knowing that. I learned it from a former physics professor who has now moved on to become a quant with a hedge fund. He usually didn’t worry much about it, but when he needed to be sure a particular calculation was as accurate as possible, he’d start counting individual operations and stay away from particular function calls.


Yes, you are correct. Thanks. This stuff is hard for me.

Reading down thread, I see that the behavior might be related to x87 since the build script references i386...which makes IEEE 754 conformance moot. The methods of achieving conformance when targeting i386 are described in the GCC documentation.


The article is not about binary to decimal conversion. It's about rounding the result of a division, all in binary.

A decimal equivalent would be like saying 200/3 = 66, so rounded towards zero instead of nearest. That is usually not problematic, but it can be in some situations, and in some of those, rounding to nearest can mitigate (for example, when you sum a large number of division results).

The author points out that the behaviour should be configurable, but gcc doesn't seem to honor that.
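
For the runtime side, standard C99 does expose the rounding direction via <fenv.h>; a minimal sketch follows (this changes the FPU's dynamic rounding mode, so it does not address x87's extended-precision intermediates or compile-time constant folding, which is the problem in the article):

    #include <fenv.h>
    #include <stdio.h>
    
    #pragma STDC FENV_ACCESS ON
    
    int main(void) {
        volatile double x = 200.0, y = 3.0;   /* volatile: discourage constant folding */
    
        fesetround(FE_TONEAREST);
        printf("to nearest:  %.20f\n", x / y);
    
        fesetround(FE_TOWARDZERO);
        printf("toward zero: %.20f\n", x / y);   /* differs in the last bit */
    
        fesetround(FE_TONEAREST);                /* restore the default */
        return 0;
    }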


x87 is crazy. Especially since you can’t easily know when something is truncated from 80 bits to 64. Add a parameter to a function and an unrelated calculation behaves differently because it now ran out of 80bit registers. Congratulations, you now have a spurious test failure you will spend weeks trying to find.

The only way of doing FP on x86 and keeping your sanity is by making sure you use 32/64 bit floats 100% of the time. That is: use SSE registers and never the 80-bit x87 arithmetic.

I thought compilers for 64-bit programs these days did exactly this. I know the .NET JIT does, for example.


The article is about 32-bit gcc. 64-bit gcc defaults to SSE, although you can force x87 if you want to (but... maybe don't do that).


I don't think any serious compiler for AMD64 is using x87 - it's ancient history. You would have to go deliberately far out of your way to encounter this problem.


If you use long double on x86-64 Linux, you're using x87. Though that has none of the problems discussed here, as there's no rounding when spilling regs to memory.


I think using long double counts as going 'deliberately far out of your way'.


Perhaps the title should mention that this is 32-bit x86, which is rather obsolete. This should work okay on x86_64.


Why would there be any difference?


Because every x86_64 CPU supports SSE2, compilers can assume it exists, and the ABI passes floating point arguments in SSE registers. This means that every sane compiler for x86_64 will use SSE2 and won't use x87, and the problem won't occur.

In contrast, the 32-bit C ABI returns floating point values in an x87 register, and there are 32-bit CPUs without SSE2, so x87 is used by default.

x87 is, in quite a few respects, awful.


For the Linux kernel, things get slightly more complicated with -mgeneral-regs-only. (https://bugs.llvm.org/show_bug.cgi?id=30792 / https://reviews.llvm.org/D38479).

A few drivers use floating point values/calculations. Which way are they rounded when expressions are folded at compile time? Does that match what would happen at runtime?

The x86 kernel also disables SSE generally (kernel_fpu_{begin|end}) via -mno-{x87|sse|sse2|...}. This generally lowers the overhead of context switching, as there are fewer registers to save and restore; nowadays the SIMD vectors on x86 are huge.

Further, it uses an 8B stack alignment, which makes it so that when FP is used, the compiler cannot select instructions that require 16B aligned operands to be loaded/stored from/to the stack.

Finally, -Ofast implies -funsafe-math-optimizations. And that's not "fun safe math." I actually had folks at work use "-Ofast" because "why wouldn't I want my code to be fast?" and then ask why their reciprocals were also wrong.


> For the Linux kernel, things get slightly more complicated with -mgeneral-regs-only. (https://bugs.llvm.org/show_bug.cgi?id=30792 / https://reviews.llvm.org/D38479).

That looks like an ARM issue.

> Further, it uses an 8B stack alignment, which makes it so that when FP is used, the compiler cannot select instructions that require 16B aligned operands to be loaded/stored from/to the stack.

On x86_64, on a recent GCC, this all works correctly. If you end up with a 16-byte-aligned variable (like an SSE vector) on the stack, GCC will correctly align the stack in the prologue. On older GCCs, there was a series of obnoxious I-know-better-than-you-isms in the command line option handling that made this all malfunction.

I seem to be missing the Share button on godbolt.org, so no link. But you can test with:

    typedef int v4si __attribute__ ((vector_size (16)));
    v4si *v;
    
    int func(void)
    {
        v4si z;    /* 16-byte-aligned vector that must live on the stack */
        v = &z;    /* storing its address in a global forces a real stack slot */
    
        return 0;
    }
Add -mpreferred-stack-boundary=3 to the options.


My guess is that they’re using long double, and the abi definition of long double may be different on x86_64 (eg on windows long double is logically just double)


Are you sure? The FPU instruction set is the same.


Yes. Toggle -m32 (compiles to 32-bit) or delete it (defaults to 64-bit, because it's not 2005 anymore). It also works with -m32 -mfpmath=sse, even though the author of the article incorrectly states otherwise.

https://godbolt.org/z/py3Dw0


The platform ABI can differ on the same cpu.

I originally had the wrong answer here, but I'm including it below for historical purposes and embarrassment. Re-reading on my laptop is apparently what it took to stop misreading / reading what I expected.

Basically by default when compiling for IA32 gcc assumes that there is no SSE unit, and so must use the x87 fpu.

Alas, the x87 unit doesn't have either 32- or 64-bit IEEE floating point types, so the calculation done at runtime must be rounded to float/double at the end of the computation. That results in a double rounding that is incorrect.

Minor additional comment - x87's reduced precision modes (that reduce mantissa precision to ieee754-32/64) don't reduce the exponent size so it's conceivable you could get multiple-rounding derived errors even then. But no one would be toggling x87 state like that anyway, so theoretically not something that would matter in most cases.

Old incorrect answer:

On ia32 long double is the x87 80bit ieee754 numeric type.

On x86_64 on windows, and maybe Linux, long double is just a 64bit ieee float.

So my guess is that GCC computes constant evaluation of floating point expressions in “long double” for maximum precision, and then rounds to the required precision at the end. This is not valid going from 64->32 bit floats and you could probably induce this bug if you tried hard enough.

Anyway it is not sound to round twice in any numeric operation, and by doing the calculation at one precision and then rounding that to the required precision that is what gcc is doing.

This is a bug in GCC, floating point is well defined, there isn’t randomness to it, and they’ve introduced additional rounding that is not valid.


> Anyway it is not sound to round twice in any numeric operation, and by doing the calculation at one precision and then rounding that to the required precision that is what gcc is doing.

> This is a bug in GCC, floating point is well defined, there isn’t randomness to it, and they’ve introduced additional rounding that is not valid.

Except the C standard has a way to communicate how the extra rounding is being introduced, and it's not random (for C, not C++).

There is simply no way around this issue with the x87 instruction set. GCC (again, only for C, not C++) implements FLT_EVAL_METHOD==2, which is the most sensible thing that can be implemented with decent performance and without breaking 80-bit long double.

If you are sure you don't need long doubles, use -mpc64.
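
A quick way to check which evaluation method a given build uses (FLT_EVAL_METHOD comes from <float.h> in C99):

    #include <float.h>
    #include <stdio.h>
    
    int main(void) {
        /*  0: float/double expressions evaluated in their own precision (typical with SSE)
            1: float and double evaluated as double
            2: float and double evaluated as long double (typical x87 behaviour)
           -1: indeterminable */
        printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
        return 0;
    }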


As my edit said, the problem was that gcc was confined to using x87 because the compile args required x87; the older version of the comment incorrectly thought it was the optimizer going wrong, rather than the runtime code.

If you are constrained to x87, your only option is multiple roundings, which is bad. Ideally you'd just keep processing in the x87 stack as long as possible, but that would result in a different version of "incorrect" results.

Honestly if you're a compiler required to use x87 with non-80bit floating point I'm not sure how you can do the right thing :-/

I don't know what gcc has but llvm has a full software floating point implementation to handle cross compiling so it can match native arithmetic.


Yes, FLT_EVAL_METHOD 2 is an extension of "keep values on the stack as much as possible" that also requires the compiler to spill intermediate results as long doubles. Apparently it also doesn't cover constant folding, but knowing who implemented it in GCC (Joseph Myers) he's extremely thorough and I have no doubt it's allowed by the standard.

That's the best they can do.


yeah, once you are restricted to performing (at runtime) computation at a precision that does not match the specified storage precision, the compiler is kind of screwed :-/


Per the sysv abi[1] (bottom of page 11):

> The long double type uses a 15 bit exponent, a 64-bit mantissa with an explicit high order significant bit and an exponent bias of 16383.

So it is an 80-bit float.

https://uclibc.org/docs/psABI-x86_64.pdf


The question was "The FPU instruction set is the same", and my response to it is correct: the same architecture can have different ABIs on different systems.

For example, on OS X it is the 80-bit IEEE format, but on Win64 it is a 64-bit IEEE float.

That ABI difference means that having the same instruction set is not the only thing that matters.


I just checked on the linked Compiler Explorer session [0], and if you take out the -m32 then GCC does indeed use divsd instead of fdiv.

[0] https://godbolt.org/z/py3Dw0


Who's going to use FPU on AMD64? Why on earth would you do that?


This seems like a property of the Intel 8087 math coprocessor and its distant descendants found in x86 chips, not anything to do with gcc?


My understanding is that in this case it's GCC's optimizer evaluating the expression at compile time in a way that gives a different result than if it's done at runtime.

EDIT: GCC gets it wrong with -O0 (when it's evaluated at runtime) and right with -O2 (when it evaluates it at compile time)

Clang appears to get it right when evaluated at compile time or runtime.

Clang uses the divsd instruction; GCC uses the fdiv instruction – so this really is SSE vs x87 FPU.


clang gets it "right" because it uses the SSE divsd instruction to perform the division; gcc gets it "wrong" because it uses the x87 fdiv instruction.

You can configure gcc to use SSE math with the -mfpmath=sse flag. The author states that gcc still gets it "wrong", but he simply is not correct: gcc will use divsd and gets 0.501782303180000055.

gcc defaults to using the x87 FPU instead of SSE because not all 32-bit x86 CPUs have SSE instructions. It's a safer default. When compiling in 64-bit mode, it uses up to SSE2 instructions, because all x86_64 CPUs have SSE2.

https://godbolt.org/z/3PfKQU
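
For anyone who wants to reproduce it locally, here is a sketch of the kind of test those godbolt links exercise. The operands below are inferred from the 0.501782303180000055 value quoted in this thread, so treat them as an assumption rather than the article's exact code:

    #include <stdio.h>
    
    int main(void) {
        /* 50178230318 / 100000000000 is exactly 0.50178230318 in real
           arithmetic; the correctly rounded double prints as
           0.501782303180000055 per the values quoted in this thread. */
        double x = 50178230318.0;
        double y = 100000000000.0;
        double ratio = x / y;
    
        /* With SSE (divsd) the quotient is rounded once, to nearest.
           With 32-bit x87 (fdiv) it is computed in extended precision and
           rounded again when stored to a 64-bit double, which can differ.
           Compare: gcc -O0 -m32  vs.  gcc -O0 -m32 -msse2 -mfpmath=sse */
        printf("%.18f\n", ratio);
        return 0;
    }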


I think `-mno-sse -mno-sse2` are of interest here, too.


But then why do Python and JavaScript on the same machines get it right?


Because they're compiled to use SSE instructions instead of x87 FPU instructions.

If you are still writing 32 bit x86 C code, it's probably some embedded or legacy CPU, and it's wrong to assume SSE instructions are available. So GCC correctly defaults to using FPU instructions instead of SSE instructions on 32 bit x86.

Javascript and python will basically never be run in embedded environments, and their legacy environments probably aren't that legacy that SSE instructions aren't available. So it's reasonable to default to SSE.


Plus, JavaScript is interpreted and JITed; it can detect and decide at runtime whether to use SSE or x87.


There is a lot of Python out there running in embedded environments, some quite old. Python is 30 years old after all.


Because it's now getting exceedingly rare to find a program that was not compiled with at least SSE instructions enabled. So the x87 FPU is simply not used then.


Python IIRC stores intermediate results in memory, it doesn't perform long chains of calculations entirely in FPU registers. This ensures the intermediate results are predictably rounded.

JavaScript JITs blindly use SSE without checking if it is available. Bytecode interpreters behave the same as above.


Unsure: compiled to use SSE instead of x87 for fpmath?


Learning C recently, I ran into confusion around FLT_EVAL_METHOD because my dirty (not recommended) trick for getting machine epsilon didn't work: 7.0/3.0 - 4.0/3.0 - 1.0. With strict IEEE 754 floating point it works fine (Python, Julia, and Go all behave as expected), but gcc, with C11 flags at least, performed the computation using the 80-bit x87 evaluation mentioned in this post. Caused some head-scratching initially.

Edit: Also, curiously, things worked as expected for me using -std=gnu11. This was a fairly old version of GCC though, things might have changed.
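
The trick in question, for anyone curious (compare against DBL_EPSILON; under strict IEEE 754 double evaluation the two should match, while x87 extended-precision evaluation can give a much smaller value):

    #include <float.h>
    #include <stdio.h>
    
    int main(void) {
        /* Dirty machine-epsilon trick: with IEEE 754 double evaluation this
           equals DBL_EPSILON; with x87 extended-precision intermediates it
           can come out much smaller. */
        double eps = 7.0 / 3.0 - 4.0 / 3.0 - 1.0;
        printf("trick:       %.10g\n", eps);
        printf("DBL_EPSILON: %.10g\n", DBL_EPSILON);
        return 0;
    }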


The worst of this is that internal 80 bit x87 registers are truncated to 64 bit when written to memory. You have essentially no control over when this happens from the language level. It meant that floating point code was all rather undefined and you got different results on different compilers. As someone who has had to write multi platform numerical code I'm very thankful it's gone from AMD64.


If you're interested in the accuracy of floating point operations, I recommend taking a look at the Khronos OpenCL spec [1]; it's not as clear-cut as you would think. I think there are good reasons not to expect floating point to be bit-accurate.

[1]:https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenC...


Clang with default settings seems to get it right:

https://imgur.com/a/d3LBlLK


It may get it right for this particular test case, but it will get others wrong. It's impossible to say "it gets it right" from a single test case or three; you have to look at the compiler innards.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 for the gory details; clang works the same as GCC's C++ frontend, and the same way GCC <4.5 used to behave for C.


Does it work with GCC on your computer? This might be 32-bit specific.


A DIY rounding function is given here:

In unbiased rounding, .5 rounds to even rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4.

https://www.gnu.org/software/gawk/manual/html_node/Round-Fun...
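
A minimal C sketch of the same idea, for illustration (C99's rint()/nearbyint() already round half-to-even under the default rounding mode, so in practice you'd just use those):

    #include <math.h>
    #include <stdio.h>
    
    /* Round half to even ("banker's rounding"), written out by hand. */
    static double round_half_even(double x) {
        double lo = floor(x);
        double frac = x - lo;
        if (frac > 0.5) return lo + 1.0;
        if (frac < 0.5) return lo;
        /* Exactly halfway: pick the even neighbour. */
        return (fmod(lo, 2.0) == 0.0) ? lo : lo + 1.0;
    }
    
    int main(void) {
        printf("%g %g\n", round_half_even(1.5), round_half_even(4.5));  /* 2 4 */
        return 0;
    }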


So the complaint is that this loses half a bit of precision on average? Is that a lot?


If you lose half a bit on every operation and do multiple operations, that could be pretty bad. If it's half a bit at the end of a long chain of operations, that's tolerable and not worse than the precision specified by the docs for some execution environments (GPU shaders, for example)

It also matters a lot whether the error is predictable and possible to compensate for.


Back in the 90's I was working on some Ada code with some problems that were traced to conversion of strings to floating point at run time differing from conversion at compile time.


I haven't seen anyone mention it, but I wonder if -ffloat-store fixes this?


Rust (default settings, amd64) gets 0.501782303180000055.


Does Rust still use LLVM? If so, I'd tend to expect a result similar to Clang's.


The thread comments didn't disappoint:

> someone defended gcc

> someone attacked it

> someone who just started learning C or C++ made a comment

> someone mentioned clang did it right

> someone mentioned Rust


and I got downvoted for not contributing meaningfully to the topic! The set is full



