Why does changing 0.1f to 0 slow down performance by 10x? (stackoverflow.com)
114 points by Arkid on March 11, 2012 | 25 comments



I've been hit by nasty slow-downs with denormal numbers a couple of times. It's much easier to hit these problems with single-precision floating point, where values get near underflow much sooner than with doubles.

As a folk theorem of numerics goes, though: such problems are often a symptom that you're not doing quite the right thing. Should your numbers really be that small? Often, recasting the problem slightly avoids having to turn on low-level flags or do other hacking.
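If you suspect denormals are what's biting you, C99's fpclassify makes it easy to check intermediate values. A quick sketch:

  #include <math.h>
  #include <stdio.h>

  int main(void) {
      float tiny = 1e-39f;  /* below FLT_MIN (~1.18e-38), so subnormal */
      if (fpclassify(tiny) == FP_SUBNORMAL)
          printf("%g is a denormal\n", (double)tiny);
      return 0;
  }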


I was astonished in my computer architecture class as an undergraduate to find out that the C language specification requires that all single-precision floating-point numbers be converted to double precision before any operations are done on them. So working with floats rather than doubles in C is not only more vulnerable to underflow and what-not, but also slower. Since then I've never declared a variable to be a float again.
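(Worth noting: that was K&R-era behavior. C89 relaxed it, and C99 exposes the implementation's actual choice as FLT_EVAL_METHOD in <float.h>. A quick way to check what your own compiler does:)

  #include <float.h>
  #include <stdio.h>

  int main(void) {
      /* 0: evaluate in the operand's own type; 1: evaluate float and
         double in double; 2: evaluate everything in long double. */
      printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
      return 0;
  }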


I've seen slowdowns with NaNs. It's incredibly frustrating.

The NaNs were being used in a sensible way to indicate "don't process this part of the image, it is over ocean not land and the values are nonsense".


Exact same thing 4 weeks ago: http://news.ycombinator.com/item?id=3602388


That code also serves as a good example of how assumptions about a theoretical "optimizing compiler" can lead people into trouble.

Something like:

  y[i] = y[i] + 0; 
"Should" not take more time than:

  y[i] = y[i] + 0.1; 
because "any decent optimizing compiler obviously would NOP the entire statement out".

In practice, though, I seldom see compilers optimize the things coders assume are obviously going to be optimized.


The semantics of floating point arithmetic in C do not allow the first line to be eliminated. It has non-trivial behavior in the case of signaling NaNs or denormals or -0.0.


Adding zero (of either sign) has no effect on denormals in an IEEE-754 system. -0.0 and signaling NaNs are the only values which are affected.


It can if you set the DAZ (denormals are zero) flag on the Pentium. I don't know if the C & IEEE-754 standards require the optimizer to preserve semantics in that case, but I can see why compiler writers would steer clear of such an optimization.
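For reference, on x86 with SSE the usual way to flip those bits is through the MXCSR intrinsics. A sketch (note this affects only SSE arithmetic, not x87):

  #include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE (FTZ) */
  #include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE (DAZ) */

  void enable_ftz_daz(void) {
      /* FTZ: denormal results are flushed to zero. */
      _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
      /* DAZ: denormal inputs are treated as zero. */
      _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
  }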


Neither the C standard nor IEEE-754 specifies anything about floating-point operations in non-standard modes like DAZ or FTZ. (Nor do they require that support for signaling NaNs be implemented.) The behavior of -0.0 suffices to block the optimization under strict FP modes, however.


I think you might have missed something in the OP.

I got the impression that the "+ 0.1" was enough to raise the value of the floating point number being operated on, so that it was not denormalized.

So, the "+ 0.1" version was faster because the numbers it generated were not denormalized. It had nothing to do with the special properties of 0 versus 0.1f (a float).

Neither of the two explanations on the SO post made any assumption about the "theoretical optimizing compiler" that you mention, right?
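For anyone who wants to see it live, here is a minimal sketch of my own (not the SO snippet itself) that shows the same mechanism: a value decays toward zero, and the +0.1f/-0.1f pair snaps it to exactly 0.0f before it can get stuck in denormal territory, while the +0/-0 version leaves it parked on a denormal:

  #include <stdio.h>
  #include <time.h>

  /* offset = 0.0f: y decays into a denormal and sticks there (slow).
     offset = 0.1f: the add absorbs the tiny y, the subtract leaves
     exactly 0.0f, and the loop runs on normal numbers (fast). */
  static float run(float offset) {
      float y = 1.0f;
      for (long i = 0; i < 100000000L; i++) {
          y = y * 0.98f;  /* shrink toward zero */
          y = y + offset;
          y = y - offset;
      }
      return y;
  }

  int main(void) {
      float offsets[2] = {0.0f, 0.1f};
      for (int k = 0; k < 2; k++) {
          clock_t t0 = clock();
          float y = run(offsets[k]);
          clock_t t1 = clock();
          printf("offset %.1f: y = %g, %.2f s\n", (double)offsets[k],
                 (double)y, (double)(t1 - t0) / CLOCKS_PER_SEC);
      }
      return 0;
  }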


This is a good opportunity for me to re-read http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.ht...


Interesting discussion. Definitely something for Lua programmers to keep in mind, as the default number type is a C double. I did a quick test on my ancient MacBook, and a translation of the C++ code into Lua showed a slowdown for the denormalized case, although not as dramatic as in the original problem (note: I changed the iteration count from 9,000,000 to 900,000 because the original count took way too long).


Wow, I wonder how many mysterious performance problems can be traced to things like this?


A lot fewer than you'd like to think.


Yes, floating point is scary. I always try to stick with integers/fixed point if at all possible, especially if there is a loop with addition. That can eat 1 bit of precision per iteration if you're unlucky.
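The classic illustration (a hedged sketch): accumulate 0.1f ten million times and the float sum drifts well away from the exact answer, while counting tenths in an integer stays exact:

  #include <stdio.h>

  int main(void) {
      float sum = 0.0f;
      for (long i = 0; i < 10000000L; i++)
          sum += 0.1f;          /* rounding error accumulates */
      long tenths = 10000000L;  /* fixed point: count tenths exactly */
      printf("float sum   = %f (exact answer is 1000000)\n", (double)sum);
      printf("fixed point = %ld.%ld\n", tenths / 10, tenths % 10);
      return 0;
  }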


I'm curious why no one has a good answer to the comments questioning why the compiler doesn't optimize out the +0/-0 instead of converting it to a floating-point add, which triggers this issue.


The conversion is optimized out by gcc-4.6 at all positive optimization levels; the addition is optimized out at -O2 and above. But the slowness is caused by the algorithms being different, with one producing denormals while the other stays reasonable.


@Mystical answers in the comment section:

Now that I look at the assembly, not even + 0.0f gets optimized out. If I had to guess, it could be that + 0.0f would have side-effects if y[i] happened to be a signalling NaN or something.


Consider this:

  x += 0.0;
If x = -0.0 (yes, zero has a sign), then adding 0.0 gives +0.0, so it can have an effect, and optimizing it away is wrong; it's generally only allowed with -ffast-math.
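You can watch the sign bit disappear with C99's signbit (a quick check; build without -ffast-math):

  #include <math.h>
  #include <stdio.h>

  int main(void) {
      double x = -0.0;
      printf("signbit(x)       = %d\n", signbit(x) != 0);       /* 1 */
      printf("signbit(x + 0.0) = %d\n", signbit(x + 0.0) != 0); /* 0 */
      return 0;
  }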


What's an example of 0 having a sign being useful?


If you're seriously interested, read Kahan's "Branch Cuts for Complex Elementary Functions (or: Much Ado About Nothing's Sign Bit)".

TLDR: there are some classes of problems for which the sign bit of zero preserves enough important information to get an accurate solution to a problem that would not be possible if you only had an unsigned zero. These happen to turn up in certain types of conformal mappings that are useful for solving certain PDEs.


1/(0-0) = -∞, 1/(0+0) = ∞


Actually, (0-0) evaluates to +0 in the default rounding mode, so both of those expressions will typically return +∞.

1/-0 will give you -∞, of course.
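Easy to verify (a sketch, assuming IEEE-754/Annex F semantics, so again no -ffast-math):

  #include <stdio.h>

  int main(void) {
      double pz = 0.0, nz = -0.0;
      printf("1/(0-0) = %g\n", 1.0 / (pz - pz)); /* inf: 0-0 rounds to +0 */
      printf("1/(0+0) = %g\n", 1.0 / (pz + pz)); /* inf */
      printf("1/(-0)  = %g\n", 1.0 / nz);        /* -inf */
      return 0;
  }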


Well, I meant it as in math notation, not in C.

(Strictly speaking, 1/0 in C is kaboom! (undefined behaviour))


It's not the +0/-0 that causes the issue; it's the repeated dividing.



