
Well, it is more complicated than that.

First of all compilers disagree on many interpretations and consequences of abstract machine rules. Also compilers have bugs.

So a proficient C/C++ programmer does have to learn what compilers actually do in practice and what they guarantee beyond the standard (or how they differ from it).

> C/C++ isn't a language.

It isn't, but it is a family of languages that share a lot of syntax and semantics.



> First of all compilers disagree on many interpretations and consequences of abstract machine rules.

List them. I am not aware of any well defined parts of the C standard where GCC and Clang disagree in implementation. Only in areas where things are too vague (and are effectively either unspecified or undefined), or understandably in areas where they're "implementation defined".

If there are behaviours where a compiler deviates from the standard it is either something you can configure (e.g. -ftrapv or -fwrapv) or it's a bug.
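
For the two flags mentioned, a small sketch of what they configure (signed overflow is the classic case; the function name is made up):

    /* Signed overflow is undefined behaviour in standard C, so what this
       function does for x == INT_MAX depends on how the compiler is configured:
         default   - the compiler may assume the overflow never happens
         -fwrapv   - overflow wraps around in two's complement
         -ftrapv   - overflow aborts the program at run time */
    int increment(int x) {
        return x + 1;   /* undefined for x == INT_MAX without -fwrapv/-ftrapv */
    }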

> Also compilers have bugs.

Nothing you do can defend against compiler bugs outside of extensively testing your results. If you determine that a compiler has a bug then the correct course of action is definitely not: "note it down and incorporate the understanding into your future programs"

> So a proficient C/C++ programmer does have to learn what compilers actually do in practice and what they guarantee beyond the standard (or how they differ from it).

There are situations where it's important to know what the compiler is doing. But these situations are limited to performance optimisation; the knowledge gained there should only be applied to the single compiler version you observed it in, and you should not feed it back into your understanding of C or of the implementation.

It's almost impossible to decipher exactly how modern C compilers work, and trying to determine what an implementation does based on the results of compilation is therefore extremely unreliable. If you need to rely on implementation defined behaviour (unavoidable in any real program) then you should be relying solely on documentation, and if the observed behaviour deviates from the documentation then that is, again, a bug.

> It isn't, but it is a family of languages that share a lot of syntax and semantics.

I am not a C/C++/C#/ObjectiveC/JavaScript/Java programmer.

C++ and C might share a lot of syntax but that's basically where the similarities end in any modern implementation. People who know C thinking they know enough to write reliable and conformant C++, and people who know C++ thinking they know enough to write reliable and conformant C, are among the people who produce the most subtle mistakes in these languages.

I think you could get away with these kinds of things in the 80s but that has definitely not been the case for quite a while.


> List them. I am not aware of any well defined parts of the C standard where GCC and Clang disagree in implementation.

Perhaps it's not "well defined" enough for you, but one example I've been stamping out recently is whether compilers will combine subexpressions across expression boundaries. For example, if you have z = x * y; a = b + z; will the compiler optimize across the semicolon to produce an fma? GCC does it aggressively, while Clang broadly will not (though it can happen in the LLVM backend).
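
A minimal reproduction of that, assuming a target with fma instructions (e.g. x86-64 with FMA, or AArch64) and the compilers' default contraction settings:

    /* In its default (GNU) modes GCC will typically fuse the multiply and the
       add below into a single fma, even though they sit in separate
       statements; Clang generally won't fuse across the statement boundary
       unless given -ffp-contract=fast. */
    double mul_then_add(double x, double y, double b) {
        double z = x * y;   /* first statement */
        double a = b + z;   /* second statement: candidate for cross-statement fusion */
        return a;
    }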


This behavior is mostly just unspecified, at least for C++ (not sure about C).

I'm aware of some efforts to bring deterministic floating point operations into the C++ standard, but AFAIK there are no publicly available papers yet.


P3375R0 is public now [0], with a couple implementations available [1], [2].

Subexpression combining has more general implications that are usually worked around with gratuitous volatile abuse or magical incantations to construct compiler optimization barriers. Floating point is simply the most straightforward example where it leads to an observable change in behavior.

[0] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p33...

[1] https://github.com/sixitbb/sixit-dmath

[2] https://github.com/J-Montgomery/rfloat/
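
To make the "volatile abuse" workaround mentioned above concrete, here is a rough sketch (the function name is made up):

    /* Routing the intermediate result through a volatile object forces it to be
       stored and reloaded with its rounded value, which stops the compiler from
       contracting the two statements into a single fma. */
    double mul_then_add_no_fuse(double a, double b, double c) {
        volatile double product = a * b;   /* rounded once, cannot be fused away */
        return product + c;
    }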


You're very right that this goes above and beyond anything the C standard specifies aside from stating that the end result should be the same as if the expressions were evaluated separately (unless you have -ffast-math enabled which makes GCC non-conformant in this regard).

If the end results of the calculation differ (and remember that implementations may not always use IEEE floats), then you can call it a bug in whatever compiler has that difference.


As was pointed out to me recently, GCC will happily generate MADs even in standard-conforming modes (in C++ at least).


I have no idea how C++ defines this part of its standard, but from experience it's likely different in some more or less subtle way, which might explain why this is okay. But in the realm of C, without -ffast-math, arithmetic operations on floats can be implemented in any way you can imagine (including having them output to a display in a room full of people with abaci and then reading the results off a hand-written sheet returned from said room of people) as long as the observable behaviour matches what the semantics require.

If the transformation you describe changes the observable behaviour relative to not applying it, then that's just a compiler bug.

This usually means that an operation such as:

    double a = x / n;
    double b = y / n;
    double c = z / n;
    printf("%f, %f, %f\n", a, b, c);
Cannot be implemented by a compiler as:

    double tmp = 1 / n;
    double a = x * tmp;
    double b = y * tmp;
    double c = z * tmp;
    printf("%f, %f, %f\n", a, b, c);
Unless in both cases the same exact value is guaranteed to be printed for all a, b, c, and n.

This is why people enable -ffast-math.


No, it's not a compiler bug or even necessarily an unwelcome optimization. It's a more precise answer than the original two expressions would have produced and precision is ultimately implementation defined. The only thing you can really say is that it's not strictly conforming in the standards sense, which is true of all FP.


I read up a bit more on floating point handling in C99 onwards (don't know about C89, I misplaced my copy of the standard): expressions are allowed to be contracted unless that is disabled with the FP_CONTRACT pragma. So again, this is entirely within the bounds of what the C standard explicitly allows. If you need stronger guarantees about the results of floating point operations, you should disable expression contraction with the pragma, in which case (from further reading), assuming __STDC_IEC_559__ is defined, the compiler should strictly conform to the relevant annex.
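
For reference, the pragma in question looks roughly like this (support varies: Clang honours it, while GCC has historically ignored the pragma and only respects its -ffp-contract option):

    /* Disable contraction for the rest of the translation unit (C99 onwards). */
    #pragma STDC FP_CONTRACT OFF

    double muladd(double a, double b, double c) {
        /* With contraction off, a * b must be rounded before the addition,
           so the compiler may not emit a single fma here. */
        return a * b + c;
    }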

Anyone who regularly works with floating point in C and expects precision guarantees should therefore read that relevant portion of the standard.


"Strictly conforming" has a specific meaning in the standard, including that all observable outputs of a program should not depend on implementation defined behavior like the precision of floating point computations.


It can be controlled through compiler options like -ffp-contract. In my opinion every team finds the FP options for their compiler the hard way, through bug fixing :)

And I am still in shock that many game projects still ship with fast math enabled.


> I am not aware of any well defined parts of the C standard where GCC and Clang disagree in implementation. Only in areas where things are too vague

Well, the parts of the standard that are vague and/or underspecified are a very large "Here be dragons" territory.

Time-traveling UB, pointer provenance, aliasing of aggregate types, partially overlapping lifetimes. When writing low-level code, it makes sense to know exactly how the compilers implement these rules.

In particular, regarding aliasing, GCC has a very specific conservative definition (stores can always change the underlying type, reads must read the last written type) that doesn't necessarily match what other compilers do.
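
A sketch of the "last store decides the type" pattern being described; for malloc'd storage the standard's effective-type rule already works this way, and the point above is that GCC applies a similarly conservative rule to reused storage more generally, which other compilers may not:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        void *p = malloc(sizeof(double));   /* storage with no declared type */
        if (!p) return 1;

        *(int *)p = 42;                     /* effective type of the storage is now int */
        printf("%d\n", *(int *)p);          /* read back with the last-written type */

        *(double *)p = 1.5;                 /* the store changes the effective type to double */
        printf("%f\n", *(double *)p);       /* again read with the last-written type */

        free(p);
        return 0;
    }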

>> It isn't, but it is a family of languages that share a lot of syntax and semantics.

> I am not a C/C++/C#/ObjectiveC/JavaScript/Java programmer.

C#, Java, JS share a bit of syntax, but certainly not semantics. ObjectiveC/C++ definitely belong. There is a trivial mapping from most C++ constructs to the corresponding C ones.


> Well, the parts of the standard that are vague and/or underspecified are a very large "Here be dragons" territory.

Sure, but the answer as I said earlier is: don't touch those parts of C.

The subset which _is_ well defined is still perfectly powerful enough to write highly performant software.

It's not like I'm advocating for you to use the brainfuck subset of C.

> When writing low-level code, it makes sense to know exactly how the compilers implement these rules.

Almost nobody is writing C low-level enough for this; even the embedded code I've written didn't need to worry about strict aliasing.

This is again just a misconception, almost no real programs need to delve this deeply into the details.

> In particular, regarding aliasing, GCC has a very specific conservative definition (stores can always change the underlying type, reads must read the last written type) that doesn't necessarily match what other compilers do.

It doesn't matter what other compilers do as long as in terms of the abstract machine these differences do not break the rules set out in the standard. Again, you do not need to know these details for 99.99% of program code.

> C#, Java, JS share a bit of syntax, but certainly not semantics. ObjectiveC/C++ definitely belong. There is a trivial mapping from most C++ constructs to the corresponding C ones.

There's a mapping from any of these languages to any other one, in some cases also quite trivial; the amount of overlap is immense, but C and C++ have heavily deviated.

I am a C expert, I do not claim to be a C++ expert, and every time I look at C++ I am surprised anew at how it redefines something core about C. Something I just learned in this very thread is https://en.cppreference.com/w/cpp/memory/start_lifetime_as, which doesn't exist in C because apparently C and C++ define object lifetimes completely differently.

It's dangerous to keep pushing this notion that C and C++ are very similar, because it constantly leads to expert C++ programmers confidently writing subtly broken C code and vice versa.


One of the places where C and C++ differ in semantics is in their strict aliasing rules.
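
One concrete example of that difference, assuming 32-bit int and IEEE float: type punning through a union is well-defined in C but undefined behaviour in C++, where only one union member is "active" at a time.

    #include <stdio.h>

    union pun {
        float f;
        unsigned int u;
    };

    int main(void) {
        union pun p;
        p.f = 1.0f;
        /* Reading a member other than the one last written: allowed in C
           (the bytes are reinterpreted), undefined behaviour in C++. */
        printf("bit pattern of 1.0f: 0x%08x\n", p.u);
        return 0;
    }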



