If it were so easy, a subset of C without undefined behavior would already have been specified, and you would be able to automatically check your code against it.
My point was only that C programmers should be keenly aware of the pitfalls of undefined behaviour, rather than blithely ignoring it. I've been surprised by the sloppiness of some developers on this point.
> a subset of C without undefined behavior
There are various projects out there that let you produce C code guaranteed to be free of undefined behaviour, but they're not 'quick fix' solutions, so they're not widely used.
I've been actively wondering about generating some efficient and portable C code, and for this project it wouldn't be super-complicated, but undefined behavior is the one thing that keeps me away. C++ and Rust and C# and many other languages all add wonderful things, with side effects on portability, clarity, learning curve, language stability, etc. - wonderful things that I don't always want in a twenty-year-stable system.
Anyway, thank you for these, I'm definitely going to look further here.
What does "keenly aware" even mean? For example: any time I add or subtract two signed ints, undefined behavior can happen. Now what. Must I pepper the code with bounds checks (which are prone to UB too if not done carefully)?
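To make the "done carefully" part concrete, here is a minimal sketch of a signed addition check that doesn't itself overflow. The name checked_add is made up for illustration. The tempting version, testing `a + b < a` after the fact, has already executed the overflowing addition by the time it checks. GCC and Clang also provide a `__builtin_add_overflow` builtin for the same job, if you can live with compiler extensions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *out and returns true, or returns
 * false (leaving *out untouched) if the sum would not fit in an int.
 * The comparisons are made against INT_MAX / INT_MIN before adding, so the
 * check never performs an overflowing operation itself. */
bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if (b > 0 && a > INT_MAX - b) return false;  /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b) return false;  /* would go below INT_MIN */
    *out = a + b;
    return true;
}
```

Every call site then has to decide what to do when this returns false, which is rather the point of the question.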
Anyway, any complicated thing that can be easily ignored, inevitably will be.
Keeping the threat of undefined behaviour in mind, and taking steps accordingly, rather than complacently ignoring it. C is a highly unsafe language, and the programmer shouldn't forget this.
> any complicated thing that can be easily ignored, inevitably will be.
The demonstrable inability of C programmers to write correct code is a strong argument against the widespread use of C. Even old languages like Ada show that you can use a language much safer than C and still achieve solid performance. Languages like Rust are making further progress on having safety, performance, and programmer-convenience, all at once.
If you use an ultra-safe language like verified SPARK Ada, the language doesn't even allow you to, say, forget to check whether a denominator is zero, or to forget to protect against out-of-bounds array access.
> Must I pepper the code with bounds checks (which are prone to UB too if not done carefully)?
Not necessarily; a tool can help check for undefined behaviour. Static analysers, GCC flags, and tools like Valgrind can automatically check for out-of-bounds array access, divide-by-zero, or attempting to dereference NULL. [0] Adding your own runtime assertions isn't a crazy idea though, especially for dev builds. If this were the norm in C programming we'd have fewer security vulnerabilities.
C lacks the kind of runtime checks that are 'always on' in languages like Java and C# (out-of-bounds, divide-by-zero, etc). That's not because such checks don't apply to C code, it's because of the minimalist C design philosophy. You have the option to add your own checks, or use tools to do so automatically, but if you develop without any checks anywhere you should expect to have more bugs. Java added them for a reason.
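As a rough sketch of what "add your own checks" can look like, the helpers below use the standard assert macro, which aborts in dev builds and disappears when compiled with -DNDEBUG. The function names are invented for this example. On the tooling side, GCC and Clang accept -fsanitize=undefined and -fsanitize=address, which instrument the build with broadly similar checks automatically.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: read element i of an n-element array.
 * The assertions fire in dev builds if the pointer is NULL or the index is
 * out of range; with -DNDEBUG they compile away to nothing. */
int get_checked(const int *a, size_t n, size_t i)
{
    assert(a != NULL);
    assert(i < n);
    return a[i];
}

/* Hypothetical helper: integer division with its preconditions spelled out. */
int div_checked(int num, int den)
{
    assert(den != 0);                        /* divide-by-zero is UB */
    assert(!(num == INT_MIN && den == -1));  /* INT_MIN / -1 overflows, also UB */
    return num / den;
}
```

Note that with NDEBUG defined these are back to being unchecked, so this is a dev-build safety net rather than an 'always on' guarantee.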
The C++ language has a somewhat different design philosophy, but it's the same reason its std::array class template has both a runtime-checked at() member function and an unchecked operator[]. It would be against the design philosophy to force you to pay the runtime overhead for checks, but it gives you the option.
"Design philosophy"...oh please! C was designed for transistor- and memory- scarce microcomputers. Nowadays there is a de facto supercomputer in every phone and runtime bounds checks are cheap. Moreover, allowing the CPU to know the size of the memory chunk pointed to could enable optimization which would make the code actually faster (not even talking about security benefits). But you C programmers fight tooth and nail against that...
> For example checking for signed overflow must be done carefully:
Right, but we're talking about a simple bounds check. There should be no need for any arithmetic, just comparison.
> "Design philosophy"...oh please! C was designed for transistor- and memory- scarce microcomputers.
Right. Hence its design philosophy.
> Nowadays there is a de facto supercomputer in every phone and runtime bounds checks are cheap.
Cheap, but perhaps not cheap enough to dismiss entirely. Bounds checking costs a few percent of performance [0], enough to put some people off in some domains such as in the kernel.
It's a pity C makes it difficult to automate just about any kind of check. Checking whether a pointer overruns a buffer that was returned by malloc, for instance, requires quite a bit of cleverness, as the system has to track the size of the allocated block.
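To give a feel for the bookkeeping involved, here is a minimal sketch of an allocator wrapper that remembers each block's size in a small header in front of the user pointer. All of the sized_* names and the block_header type are invented for this example, and it only works for pointers that came from sized_malloc; real checking tools have to cope with arbitrary pointers, which is where the cleverness comes in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored immediately before each user block.
 * The union with max_align_t (C11) keeps the user pointer suitably aligned. */
typedef union {
    size_t      size;
    max_align_t align;
} block_header;

/* Allocate size bytes and record the size in the hidden header. */
void *sized_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(block_header)) return NULL;
    block_header *h = malloc(sizeof *h + size);
    if (h == NULL) return NULL;
    h->size = size;
    return h + 1;  /* user data starts right after the header */
}

/* Recover the recorded size; p must have come from sized_malloc. */
size_t sized_size(const void *p)
{
    assert(p != NULL);
    return ((const block_header *)p - 1)->size;
}

/* Nonzero if the byte range [offset, offset + len) lies inside the block. */
int sized_in_bounds(const void *p, size_t offset, size_t len)
{
    size_t size = sized_size(p);
    return offset <= size && len <= size - offset;
}

/* Release a block obtained from sized_malloc (NULL is allowed). */
void sized_free(void *p)
{
    if (p != NULL) free((block_header *)p - 1);
}
```

The cost is one header per allocation plus a comparison per checked access, and nothing stops code from bypassing sized_in_bounds entirely.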
You have to rely on optional compiler features, elaborate static analysis tools (often proprietary and expensive), and dynamic analysis tools like Valgrind. Ada on the other hand enables all sorts of runtime checks by default, but it's easy to switch them all off if you're sure.
> the CPU to know the size of the memory chunk pointed to could enable optimization which would make the code actually faster (not even talking about security benefits)
What kind of optimisation do you have in mind? Pre-caching?
> But you C programmers fight tooth and nail against that...
'Fat pointers' of this sort have been tried with the C language [1] but I can't see the committee adding them to the standard. Part of C's virtue is that it's extremely slow-moving.
I'm not advocating continued widespread use of C though. I hope safe-but-fast languages like Rust do well. We all pay a price for the problems associated with C and, perhaps to a lesser extent, C++. For what it's worth I haven't written serious C or C++ code for a long time.
> What kind of optimisation do you have in mind? Pre-caching?
All kinds of branch prediction. If the CPU knew it was iterating over a fat pointer, it could safely internally unroll the loop or even vectorize the operation. There are so many easy tricks based on runtime information used by JIT languages, yet not implemented in silicon (or only speculatively) because of the C legacy.
> If the CPU knew it was iterating over a fat pointer, it could safely internally unroll the loop or even vectorize the operation.
That should already be possible with the C custom of passing an array's size alongside the array pointer. As I understand it modern CPUs have very sophisticated loop-detection for doing precisely this kind of thing.
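A minimal sketch of that custom, with a made-up function name: the element count n travels alongside the pointers, so the trip count is known before the loop starts, and restrict promises the two arrays don't overlap. Whether a particular compiler actually unrolls or vectorizes this is up to the compiler and its flags, so treat that part as an expectation rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dst[i] = src[i] * k for the first n elements.
 * The caller is responsible for n being no larger than either array;
 * nothing in the language checks that. */
void scale(float *restrict dst, const float *restrict src, size_t n, float k)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

The size is only a convention here: pass the wrong n and you are straight back in UB territory.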
Now we're back to square one: the C compiler has to resort to UB to optimize many kinds of "for" loops, including basic stuff like extracting the iteration count - examples in today's discussion. It then tries to emit something that the CPU's speculative prediction unit can successfully pick up. It is so hard for the programmer to unequivocally signal the intent like "we're working with this here chunk of memory malloc'd with this size and nothing ever beyond it and let me know if this is violated". I don't understand why this state of affairs is not universally considered sordid but it's handwaved "just be aware of UB and pass the size along and it will be well"...
> The C compiler has to resort to UB to optimize many kinds of "for" loops
The compiler is always permitted to assume the absence of UB. This is the case no matter what your C code is doing, and you're never further than one expression away from UB.
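The loop-count case is a handy illustration. In the sketch below (function name invented for the example), the counter is a signed int and the condition is i <= n. If n were INT_MAX, i++ would eventually overflow, which is UB, so the compiler is allowed to assume that never happens and reason as if the loop runs exactly n + 1 times for non-negative n. The assert just makes that hidden precondition visible in dev builds.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical example: sum of 0..n inclusive, using a signed counter.
 * Valid only for n < INT_MAX; with n == INT_MAX the increment would
 * overflow, and the behaviour of the whole program is undefined. */
long long sum_to(int n)
{
    assert(n < INT_MAX);  /* the precondition the optimizer silently assumes */
    long long total = 0;
    for (int i = 0; i <= n; i++)
        total += i;
    return total;
}
```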
> It is so hard for the programmer to unequivocally signal the intent like "we're working with this here chunk of memory malloc'd with this size and nothing ever beyond it and let me know if this is violated".
Indeed, and C pretty much stands alone here. In just about every other language, there's some automated means of keeping track of the size of an allocated block. Even C++ has this in std::array. The only exceptions I can think of are assembly, if that counts, and Forth.
> I don't understand why this state of affairs is not universally considered sordid but it's handwaved "just be aware of UB and pass the size along and it will be well"...
You're not alone in thinking this is a particularly reckless design decision in C. The great Walter Bright, designer of the D language and also a Hacker News regular, wrote a short article on this in 2009, called "C's Biggest Mistake". [0] He even suggested a fix, of adding the option of fat pointers into C. I don't think the committee is going to adopt it though.
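In today's C you can approximate the idea by hand. What follows is a rough library-level sketch, not the syntax from the article: the int_slice type, the SLICE_OF macro and the slice_* functions are all names invented here. The pointer and its length travel together, and the accessors refuse to go out of bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": a pointer that carries its element count. */
typedef struct {
    int   *ptr;
    size_t len;
} int_slice;

/* Build a slice from a real array (not a pointer), so the length comes
 * from sizeof rather than from whatever the caller remembers. */
#define SLICE_OF(arr) ((int_slice){ (arr), sizeof(arr) / sizeof((arr)[0]) })

/* Checked read: returns false instead of reading out of bounds. */
bool slice_get(int_slice s, size_t i, int *out)
{
    assert(out != NULL);
    if (i >= s.len) return false;
    *out = s.ptr[i];
    return true;
}

/* Sub-slice [from, to); returns an empty slice if the range is invalid. */
int_slice slice_sub(int_slice s, size_t from, size_t to)
{
    if (from > to || to > s.len) return (int_slice){ NULL, 0 };
    return (int_slice){ s.ptr + from, to - from };
}
```

The weakness is that it's opt-in and unenforced: anyone can still grab s.ptr and index past the end, which is why people keep asking for this in the language itself.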
I recall reading somewhere a snarky take on this: A foundational principle of the C programming language is that the programmer is always right, even when they are wrong.