> So the compiler is proving that the array is being overflowed and rather than ...

iainmerrick · on Aug 17, 2023

As noted elsewhere in this thread, GCC by default does the "optimization" and doesn't warn. No doubt there are other examples where Clang is the one that misbehaves.

How are we supposed to know whether our code is being compiled sensibly or not, without poring over the disassembly? Just set all the warning flags and hope for the best?

UncleMeat · on Aug 17, 2023

I think that a big problem is that for every compile that seems "not sensible" and is actually not sensible, there are 100s or 1000s of compiles that would look absolutely insane to a human but are actually exactly what you want when you sit down and think about it for a long time.

Almost all of the "don't do the overly clever stuff!" proposals would throw away a huge amount of actually productive clever stuff.

fluoridation · on Aug 17, 2023

I think what the GP means by "not sensible" is that proving that the code is broken in order to silently optimize it more aggressively is not sensible. If your theorem prover can find a class of bugs then have it emit diagnostics. Don't only use those bugs to make the code run faster. Yes, make the code run faster, but let me know I may be doing something nonsensical, since chances are that it is nonsensical and it doesn't cost anything at run time.

mike_hock · on Aug 17, 2023

A warning is only useful if it prescribes a code transformation that affirms the programmer's intent and silences the warning (unless the warning was a true positive and caught a bug). You cannot simply emit a warning every time you optimize based on UB.

There is no `if(obvious out-of-bound access) silently emit nonsense har har har` in the compiler's source code. The compiler doesn't understand intent or the program as a whole. It applies micro transformations that all make sense in isolation. And while the compiler also tries to detect erroneous programming patterns and warn about those, that's exceedingly more difficult.

fluoridation · on Aug 17, 2023

>You cannot simply emit a warning every time you optimize based on UB.

And I'm not saying it should do that. I'm saying if the compiler is able to detect erroneous code, then it should emit a warning when it does so. An out of bounds access is an example of code that is basically always erroneous.

>There is no `if(obvious out-of-bound access) silently emit nonsense har har har` in the compiler's source code. The compiler doesn't understand intent or the program as a whole. It applies micro transformations that all make sense in isolation.

Yes, I understand that. However, like I said in my first response, this optimization in particular is only valid if the array is definitely accessed incorrectly. If the compiler is able to perform this optimization, there are only two possibilities: either the compiler can determine in some cases (and in this one in particular) that an array is accessed incorrectly and doesn't warn about it; or it can't determine that condition and this optimization is caused by a compiler bug and there are cases where the compiler incorrectly performs it, breaking the code. If the former is the case, then someone wrote the code to check whether an array is always accessed correctly. Either that, or nobody wrote it and the compiler deduces from even more basic principles that arrays must always be accessed by indices less than their lengths; which, I mean, that might be the case, but I seriously doubt it.

tialaramex · on Aug 18, 2023

> if the compiler is able to detect erroneous code

Today in most cases nobody is writing this code. Neither C nor C++ has any mandate for such detection.

There is a proposal, which could perhaps make it into C++ 26, to formally specify "erroneous behaviour" and have the compiler do something particular and warn you that what you're doing is a bad idea for the specified cases†, but it's easily possible that doesn't end up in the IS, or that compiler vendors aren't interested in implementing it. Meanwhile, if it happens at all it's up to the vendor.

† "Erroneous behaviour" is one possible approach to the uninitialized locals problem in C++. C has long allowed local variables to be declared without initializing them, but using one before it is assigned actually has Undefined Behaviour. That is very surprising for C and C++ programmers, who tend to imagine that they're getting the much milder Unspecified Behaviour, but they are not. Many outfits use compiler flags to say look, when I do this, and I know sometimes I'll screw up, just give me zeroes, so that's Defined Behaviour, it's not Intended Behaviour but at least it's not Undefined. This includes all major OS vendors (Microsoft, Apple, Red Hat etc.)

Some people brought this approach to WG21, but there was pushback: if uninitialized variables are zero, then they're not really uninitialized, are they? This has two consequences: 1. Performance optimisations from not initializing data evaporate; and 2. It is now "correct" to use this zero initialization behaviour because it's specified by the language standard, so maybe you can't lint on it.

Erroneous Behaviour solves (2) by saying no, it's still wrong, it's just safely wrong, the compiler can report this is wrong and it must ensure your results are zero.

Another proposal offers a syntax to solve (1) by saying explicitly in your program, "No, I'm a smart C++ programmer, do not initialize these values", akin to the markers like ~~~ you may have seen to mean "Don't initialize this" in some other languages.
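To make the footnote concrete, here is a minimal sketch of the "just give me zeroes" idea written out by hand. The function and variable names are made up for illustration. The comment above does not say which compiler flags it means; one flag of this kind that I'm aware of is `-ftrivial-auto-var-init=zero` in Clang and recent GCC, which is an assumption about what was intended.

```cpp
// Sketch only: count_positive and its parameters are hypothetical names.
int count_positive(const int* values, int n) {
    int count{};  // value-initialized to 0, so every read below is defined

    // Had this been plain `int count;`, the `++count` below would read an
    // uninitialized local, which is undefined behaviour in C++ as it stands.
    for (int i = 0; i < n; ++i) {
        if (values[i] > 0) ++count;
    }
    return count;
}
```

Writing `int count{};` asks for the zero explicitly in the source, whereas a compiler flag changes what the uninitialized declaration does without changing the source.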

muldvarp · on Aug 18, 2023

> However, like I said in my first response, this optimization in particular is only valid if the array is definitely accessed incorrectly.

No. The code does not show any undefined behavior if any of the elements of `table` is equal to `v`, because then the loop is ended by an early return. The compiler certainly did not prove that this code always has undefined behavior.
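For anyone reading without the article open, the function being argued about looks roughly like this. This is a reconstruction: `table` and `v` are named in this thread, but `exists_in_table`, the table contents, and the fixed variant are assumptions added for illustration.

```cpp
#include <cstddef>

// Hypothetical contents; the thread only says the array has 4 elements.
int table[4] = {10, 20, 30, 40};

// As discussed: `i <= 4` lets i reach 4, one past the last valid index.
// If v is in the table, the early return fires first and nothing is
// read out of bounds, so that call is perfectly well defined.
bool exists_in_table(int v) {
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    }
    return false;  // reachable only after reading table[4], which is undefined
}

// With the bound fixed, the loop can finish normally and `false` is a
// legitimate result the compiler has to preserve.
bool exists_in_table_fixed(int v) {
    for (std::size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i] == v) return true;
    }
    return false;
}
```

In the first version every defined execution returns true, which is why a compiler may reduce the whole function to `return true` without having proved that any particular call overflows.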

UncleMeat · on Aug 17, 2023

Right and the next part is the hard part: defining this clearly. What I'm saying is that there is a surprising amount of "wait, actually I do want that" when you dig into this proposal.

fluoridation · on Aug 17, 2023

A reasonable compiler would let you turn off a specific warning for a section of code.

torstenvl · on Aug 18, 2023

They pretty much all do.

    #pragma clang diagnostic push
    #pragma clang diagnostic ignored "-Wwhatever"
    // code
    #pragma clang diagnostic pop

fluoridation · on Aug 18, 2023

I was going to comment that GCC doesn't, but it seems it was added at some point since the last time I checked. I know at one time GCC had as a policy not to allow doing that.
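For reference, a small sketch of the GCC spelling of the same push/ignore/pop pattern. The function `answer` and the choice of `-Wunused-variable` are just for illustration; as far as I know Clang accepts the `GCC` form as well.

```cpp
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-variable"

int answer() {
    int scratch = 0;  // would be reported by -Wunused-variable outside the push/pop
    return 42;
}

#pragma GCC diagnostic pop
// From here on, -Wunused-variable is back to whatever the command line set.
```

The push/pop pair keeps the suppression local, so the rest of the file is still checked with the flags given on the command line.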

torstenvl · on Aug 18, 2023

Here's an example:

https://pastebin.com/raw/fH0Lj2Zb

uecker · on Aug 18, 2023

I agree there should be a warning. But it is not trivial to teach a compiler when to warn and when not to, so that it does not generate too many false positives.

Not as good as warning, but UBSan catches this at run-time: https://godbolt.org/z/Mdjn7h8dj

moefh · on Aug 17, 2023

> whether our code is being compiled sensibly or not

I'm failing to see what's not sensible about how that code is compiled.

The only possible way that function could return false is if you read past the end of the array and the value there happens to be different from `v`. Is it really more sensible to rely on that, rather than fixing a known behavior in case of array overflow?

robinsonb5 · on Aug 17, 2023

If the compiler's going to interpret undefined behaviour as license to do something that runs counter to the programmer's expectations, the most sensible course of action is for the compiler to yell very loudly about it instead of near-silently producing (differently!) broken code.

Currently that piece of code doesn't trigger a warning with -Wall. It's not even flagged with -Wextra - it needs -Weverything.

moefh · on Aug 17, 2023

One man's "broken code produced by the compiler" is another man's "excellently optimized code by the compiler".

Where to draw the line is not always clear, but here's a very clear-cut example[1] where emitting a warning would be bad. If you don't want to watch the video, it's basically this:

- the code technically contains undefined behavior, but it will never be actually triggered by the program

- changing the code to remove undefined behavior forces the compiler to emit terrible code

Making the compiler yell at the programmer in this case would be terrible, but it's clearly a consequence of what you're asking.

[1] https://youtu.be/yG1OZ69H_-o?t=2358

jeffbee · on Aug 18, 2023

Exactly. I think a lot of this noise is by non-practitioners of the language. The compiler is steel-manning this loop. It is generously interpreting the 4 as irrelevant, and deducing that the loop must always exit early. The author can’t possibly have meant to access beyond the end, because that’s not defined. QED. It seems altogether sensible to me.

Joker_vD · on Aug 18, 2023

Wow, I must congratulate you because this reads equally well both as a serious argument and as a parody of that argument.

So let me reply to your comment as if it were serious: yes, if the programmer by supernatural means knows that the "v" is always present somewhere in the array, then this function works exactly as intended: it would always return true, and the compiler optimises it to do so as quickly as possible! But... perhaps there is some other way to pass that programmer's knowledge ("the arguments are guaranteed to be such that this loop is guaranteed to finish early") to the compiler in a more explicit way? Some sort of explicitly written assertion? A pre-condition? A contract, if you like?

See, it's very difficult to maintain such unspoken contracts and invariants over a codebase's life because they're unspoken and unwritten. Comments barely count since compilers generally ignore them.
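One way to write that contract down, as a rough sketch: keep the loop in bounds and state the "v is always in the table" assumption with a plain `assert`. The function name and signature here are hypothetical, and `assert` is only one of several ways to spell a precondition.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical signature: the table is passed in rather than being a global.
bool exists_in_table(const int (&table)[4], int v) {
    bool found = false;
    for (std::size_t i = 0; i < 4; ++i) {  // stays within the 4 elements
        if (table[i] == v) {
            found = true;
            break;
        }
    }
    // The written-down contract: callers promise that v is in the table.
    // In a debug build a broken promise aborts here; with NDEBUG defined
    // the check disappears and the function simply returns false.
    assert(found && "precondition violated: v must be in table");
    return found;
}
```

The assumption now lives in the code, where a reader and a debug build can both see it, instead of being implied by a loop bound.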

jeffbee · on Aug 18, 2023

Thanks! I think anyone would have to be nuts to write a loop like this in C++ or tolerate C as a language. C++'s `ranges::find` does what it says, and communicates between the author and the reader as well as the author and the compiler.
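A sketch of what that looks like in practice. To keep it compilable before C++20 this uses `std::find` with `std::begin`/`std::end`; with C++20 the same test can be written with `std::ranges::find(table, v)`. The function name and signature are assumptions for illustration.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical signature: takes the 4-element table by reference.
bool exists_in_table(const int (&table)[4], int v) {
    // The library derives the bounds from the array type, so there is
    // no hand-written `<=` versus `<` to get wrong.
    return std::find(std::begin(table), std::end(table), v) != std::end(table);
}
```

There is no loop bound to mistype here, and the intent ("is v in this range?") is stated directly.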

robinsonb5 · on Aug 18, 2023

> One man's "broken code produced by the compiler" is another man's "excellently optimized code by the compiler".

To be fair it's not the compiler's fault that the source program is broken - the argument is over whether the compiler is being helpful or being obtuse, and in this particular case I'd argue the latter.

Thanks for the video link - it's an interesting example, but the crucial difference there, I think, is that in that case the compiler isn't doing something counter to the programmer's intent. The code isn't incorrect (assuming a non-pathological buffer size) - it's merely more convenient for the compiler when expressed with int32_t indices rather than uint32_t indices.

I do appreciate, though, that deciding what to yell about and what not to yell about is an extremely non-trivial problem.

bondant · on Aug 18, 2023

I am not sure this warning is proving any overflow, you can get the same warning by just accessing table[i].

https://godbolt.org/z/Gxd3rK9Ts