Why does changing 0.1f to 0 slow down performance by 10x? (stackoverflow.com)
114 points by Arkid on March 11, 2012 | 25 comments



I've been hit by nasty slow-downs with denormal numbers a couple of times. It's much easier to hit these problems with single-precision floating point, where values get near underflow much sooner than with doubles.

As a folk theorem of numerics goes, though: such problems are often a symptom that you're not doing quite the right thing. Should your numbers really be that small? Often, recasting the problem slightly avoids having to turn on low-level flags or do other hacking.
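If you suspect denormals are what's biting you, C99's fpclassify makes it easy to check intermediate values. A quick sketch:

  #include <math.h>
  #include <stdio.h>

  int main(void) {
      float tiny = 1e-39f;  /* below FLT_MIN (~1.18e-38), so subnormal */
      if (fpclassify(tiny) == FP_SUBNORMAL)
          printf("%g is a denormal\n", (double)tiny);
      return 0;
  }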


I was astonished in my computer architecture class as an undergraduate to find out that the C language specification requires that all single-precision floating-point numbers be converted to double precision before any operations are done on them. So working with floats rather than doubles in C is not only more vulnerable to underflow and what-not, but also slower. Since then I've never declared a variable to be a float again.
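(Worth noting: that was K&R-era behavior. C89 relaxed it, and C99 exposes the implementation's actual choice as FLT_EVAL_METHOD in <float.h>. A quick way to check what your own compiler does:)

  #include <float.h>
  #include <stdio.h>

  int main(void) {
      /* 0: evaluate in the operand's own type; 1: evaluate float and
         double in double; 2: evaluate everything in long double. */
      printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
      return 0;
  }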


I've seen slowdowns with NaNs. It's incredibly frustrating.

The NaNs were being used in a sensible way to indicate "don't process this part of the image, it is over ocean not land and the values are nonsense".


Exact same thing 4 weeks ago: http://news.ycombinator.com/item?id=3602388


That code also serves as a good example of how assumptions about a theoretical "optimizing compiler" can lead people into trouble.

Something like:

  y[i] = y[i] + 0; 
"Should" not take more time than:

  y[i] = y[i] + 0.1; 
because "any decent optimizing compiler obviously would NOP the entire statement out".

In practice, though, I seldom see compilers optimize the things coders assume are obviously going to be optimized.


The semantics of floating point arithmetic in C do not allow the first line to be eliminated. It has non-trivial behavior in the case of signaling NaNs or denormals or -0.0.


Adding zero (of either sign) has no effect on denormals in an IEEE-754 system. -0.0 and signaling NaNs are the only values which are affected.


It can if you set the DAZ (denormals are zero) flag on the Pentium. I don't know if the C & IEEE-754 standards require the optimizer to preserve semantics in that case, but I can see why compiler writers would steer clear of such an optimization.
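For reference, on x86 with SSE the usual way to flip those bits is through the MXCSR intrinsics. A sketch (note this affects only SSE arithmetic, not x87):

  #include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE (FTZ) */
  #include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE (DAZ) */

  void enable_ftz_daz(void) {
      /* FTZ: denormal results are flushed to zero. */
      _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
      /* DAZ: denormal inputs are treated as zero. */
      _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
  }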


Neither the C standard nor IEEE-754 specifies anything about floating-point operations in non-standard modes like DAZ or FTZ. (Nor do they require that support for signaling NaNs be implemented.) The behavior of -0.0 suffices to block the optimization under strict FP modes, however.


I think you might have missed something in the OP.

I got the impression that the "+ 0.1" was enough to raise the value of the floating point number being operated on, so that it was not denormalized.

So, the "+ 0.1" version was faster because the numbers it generated were not denormalized. It had nothing to do with the special properties of 0 versus 0.1f (a float).

Neither of the two explanations on the SO post made any assumption about the "theoretical optimizing compiler" that you mention, right?
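For anyone who wants to see it live, here is a minimal sketch of my own (not the SO snippet itself) that shows the same mechanism: a value decays toward zero, and the +0.1f/-0.1f pair snaps it to exactly 0.0f before it can get stuck in denormal territory, while the +0/-0 version leaves it parked on a denormal:

  #include <stdio.h>
  #include <time.h>

  /* offset = 0.0f: y decays into a denormal and sticks there (slow).
     offset = 0.1f: the add absorbs the tiny y, the subtract leaves
     exactly 0.0f, and the loop runs on normal numbers (fast). */
  static float run(float offset) {
      float y = 1.0f;
      for (long i = 0; i < 100000000L; i++) {
          y = y * 0.98f;  /* shrink toward zero */
          y = y + offset;
          y = y - offset;
      }
      return y;
  }

  int main(void) {
      float offsets[2] = {0.0f, 0.1f};
      for (int k = 0; k < 2; k++) {
          clock_t t0 = clock();
          float y = run(offsets[k]);
          clock_t t1 = clock();
          printf("offset %.1f: y = %g, %.2f s\n", (double)offsets[k],
                 (double)y, (double)(t1 - t0) / CLOCKS_PER_SEC);
      }
      return 0;
  }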


This is a good opportunity for me to re-read http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.ht...


Interesting discussion. Definitely something for Lua programmers to keep in mind, as the default number type is a C double. I did a quick test on my ancient MacBook, and a translation of the C++ code into Lua showed a slowdown for the denormalized case, although not as dramatic as in the original problem (note: I changed the iteration count from 9,000,000 to 900,000 because the original count took way too long).


Wow, I wonder how many mysterious performance problems can be traced to things like this?


A lot fewer than you'd like to think.


Yes, floating point is scary. I always try to stick with integers/fixed point if at all possible, especially if there is a loop with addition. That can eat 1 bit of precision per iteration if you're unlucky.
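The classic illustration (a hedged sketch): accumulate 0.1f ten million times and the float sum drifts well away from the exact answer, while counting tenths in an integer stays exact:

  #include <stdio.h>

  int main(void) {
      float sum = 0.0f;
      for (long i = 0; i < 10000000L; i++)
          sum += 0.1f;          /* rounding error accumulates */
      long tenths = 10000000L;  /* fixed point: count tenths exactly */
      printf("float sum   = %f (exact answer is 1000000)\n", (double)sum);
      printf("fixed point = %ld.%ld\n", tenths / 10, tenths % 10);
      return 0;
  }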


I'm curious why no one has a good answer to the comments questioning why the compiler doesn't optimize out the +0/-0 instead of converting it to a floating-point add, which triggers this issue.


The conversion is optimized out by gcc-4.6 at all positive optimization levels; the addition is optimized out at -O2 and above. But the slowness is caused by the algorithms being different, with one producing denormals while the other stays reasonable.


@Mystical answers in the comment section:

Now that I look at the assembly, not even + 0.0f gets optimized out. If I had to guess, it could be that + 0.0f would have side-effects if y[i] happened to be a signalling NaN or something.


Consider this:

  x += 0.0;
If x = -0.0 (yes, zero has a sign), then adding 0.0 gives +0.0, so it can have an effect, and optimizing it away is wrong; it's generally only allowed with -ffast-math.
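You can watch the sign bit disappear with C99's signbit (a quick check; build without -ffast-math):

  #include <math.h>
  #include <stdio.h>

  int main(void) {
      double x = -0.0;
      printf("signbit(x)       = %d\n", signbit(x) != 0);       /* 1 */
      printf("signbit(x + 0.0) = %d\n", signbit(x + 0.0) != 0); /* 0 */
      return 0;
  }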


What's an example of 0 having a sign being useful?


If you're seriously interested, read Kahan's "Branch Cuts for Complex Elementary Functions (or: Much Ado About Nothing's Sign Bit)".

TLDR: there are some classes of problems for which the sign bit of zero preserves enough important information to get an accurate solution to a problem that would not be possible if you only had an unsigned zero. These happen to turn up in certain types of conformal mappings that are useful for solving certain PDEs.


1/(0-0) = -∞, 1/(0+0) = ∞


Actually, (0-0) evaluates to +0 in the default rounding mode, so both of those expressions will typically return +∞.

1/-0 will give you -∞, of course.
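Easy to verify (a sketch, assuming IEEE-754/Annex F semantics, so again no -ffast-math):

  #include <stdio.h>

  int main(void) {
      double pz = 0.0, nz = -0.0;
      printf("1/(0-0) = %g\n", 1.0 / (pz - pz)); /* inf: 0-0 rounds to +0 */
      printf("1/(0+0) = %g\n", 1.0 / (pz + pz)); /* inf */
      printf("1/(-0)  = %g\n", 1.0 / nz);        /* -inf */
      return 0;
  }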


Well, I meant it as in math notation, not in C.

(Strictly speaking, 1/0 in C is kaboom! (undefined behaviour))


It's not the +0/-0 that causes the issue; it's the repeated dividing.



