Generally, undefined behavior removes the need for systematically checking for special cases, the most common being out of bounds access.
But it can go further than that. Dereferencing a NULL pointer is undefined behavior, so if a pointer is dereferenced, it can be assumed by the compiler not to be NULL and the code can be optimized. For example:
void foo(int *p) {
    int val = *p;               /* p is dereferenced here */
    if (p == NULL) {
        printf("p is NULL\n");
    } else {
        printf("val is %d\n", val);
    }
}
can be optimized to:
void foo(int *p) {
    int val = *p;
    printf("val is %d\n", val);
}
Note that static analyzers will most likely issue a warning here, since such a trivial case is probably a mistake. But the NULL check may be part of an inline function that is used in many places, and thanks to the undefined behavior, the code that handles the NULL case will only be generated where it is relevant. The problem, of course, is that this assumes the programmer knows what they are doing and doesn't make mistakes.
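As a rough sketch of that scenario (names are made up), imagine the NULL handling living in a small inline accessor:

#include <stdio.h>

static inline int get_val(int *p) {
    if (p == NULL)
        return -1;                     /* fallback for the NULL case */
    return *p;
}

void bar(int *p) {
    *p = 7;                            /* p is dereferenced here, so...           */
    printf("val is %d\n", get_val(p)); /* ...after inlining, the NULL branch is   */
}                                      /* dead code and the compiler can drop it  */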
In the case of memcpy(NULL, NULL, 0), there probably isn't much to gain from making it undefined. It most likely doesn't help with the memcpy implementation (len=0 is generally a no-op), and inference based on the fact that the arguments can't be NULL is more likely to screw the programmer up than to improve performance.
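A hedged sketch of how that inference can hurt (names are made up): because passing NULL to memcpy is UB even when the length is 0, the compiler may treat a later NULL check as dead code.

#include <string.h>
#include <stdio.h>

void send(char *dst, const char *src, size_t n) {
    memcpy(dst, src, n);        /* UB if dst or src is NULL, even when n == 0...   */
    if (dst == NULL) {          /* ...so the compiler may assume dst != NULL here  */
        fprintf(stderr, "dst is NULL\n");   /* and silently delete this branch     */
        return;
    }
    /* ... */
}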
It all adds up: all those instructions you don't have to execute, the memory accesses and cache misses from jumps, the pipeline stalls from conditionals, and not just from this one optimization.
>All of these 'problems' have simple and straightforward workarounds, I'm not convinced these UB are needed at all.
He gave you a simple and straightforward example, but that example may not be representative of a real world program where complex analysis leads to better performing code.
As a programmer, it's far easier to just insert bounds checks everywhere and trust the system to remove them when possible. This is what Rust does, and it is safe. The problem isn't the compiler, the problem is the standard. More broadly, the standard wasn't written with optimizing compilers in mind.
If we're inlining the call, then we can hoist the NULL check out of the loop. Now it's 1 check per 20 million operations. There's no need to eliminate it or have UB at that point.
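Roughly, a sketch (names made up):

int sum(const int *p, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        if (p == NULL)          /* after inlining, this test is loop-invariant...  */
            return -1;
        total += p[i];
    }
    return total;
}

int sum_hoisted(const int *p, int n) {
    if (p == NULL)              /* ...so it can be hoisted: one check per call,    */
        return -1;              /* not one per element                             */
    int total = 0;
    for (int i = 0; i < n; i++)
        total += p[i];
    return total;
}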
The simplest example of a compiler optimization enabled by UB would be the following:
int my_function() {
    int x = 1;
    another_function();
    return x;
}
The compiler can optimize that to:
int my_function() {
    another_function();
    return 1;
}
Because it's UB for another_function() to use an out-of-bounds pointer to access the stack of my_function() and modify the value of x.
And the most important example of a compiler optimization enabled by UB is related to that: being UB to access local variables through out-of-bounds pointers allows the compiler to place them in registers, instead of being forced to go through the stack for every operation.
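A small sketch of that (log_progress is a made-up external function):

void log_progress(void);

int count(int n) {
    int x = 0;                  /* x's address is never taken, and reaching it via  */
    for (int i = 0; i < n; i++) {
        log_progress();         /* an out-of-bounds pointer would be UB, so the     */
        x += i;                 /* compiler can keep x in a register across the     */
    }                           /* call instead of spilling and reloading it        */
    return x;
}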
I don't find those compelling reasons and, to the contrary, I think that kind of semantic circumvention is a symptom of a poorly developed industry.
How can we have properly functioning programs without clearly-defined, and sensible, semantics?
If the developer needs to use registers, then they should choose a dev env/PL that provides them, otherwise such kludges will crash and burn, IMO.
Are you saying that C compilers should change every local variable access to read and write to the stack just in case some function intentionally does weird pointer arithmetic to change their values without referring to them in the source code?
We stopped explicitly declaring locals with the 'register' keyword circa 40 years ago. Register allocation is low-hanging fruit and one of those things that is definitely best left to a compiler for most code.
And now they have to manage register pressure for it to keep being faster. And false dependencies. And some more. It doesn't work like that. Developers can't optimize like compilers do, not with modern CPUs. The compilers do the heavy lifting in exchange for a set of constraints that they (and, as a consequence, you) must rely on. The more relaxed these constraints are, the less performant the code you get. Modern CPUs basically run modern interpreters as fast as the most naively compiled C code, so if you want sensible semantics, then TypeScript is one of the absolutely non-ironic answers.
What you describe there is UB. If you define this in the standard, you are defining a kind of runtime behavior that can never happen in a well formed program and the compiler does not have to make a program that encounters this behavior do anything in particular.
Does this still matter today? I mean, first registers are anyway saved on the stack when calling a function, and caches of modern processors are really nearly as fast (if not as fast!) as a register. Registers these days are merely labels, since internally the processor (at least for x86) executes the code in a sort of VM.
To me it seems that all these optimizations were really something useful back in the day, but nowadays we can as well just ignore them and let the processor figure it out without that much loss of performance.
Assuming that the program is "bug free" seems to me a terrible idea, since even mitigations that the programmer puts in place to limit the effect of bugs (and no program is bug free) get skipped, because the compiler can assume the program has no bugs. To me security is more important than a 1% boost in performance.
Register allocation is one of the most basic optimizations that a compiler can do. Some modern CPUs can alias stack memory with internal registers, but it is still not as fast as not spilling at all.
You can enjoy -O0 today and the compiler will happily allocate stack slots for all your variables and keep them up to date (which is useful for debugging). But the difference between -O0 and -O3 is orders of magnitude on many programs.
> I mean, first registers are anyway saved on the stack when calling a function
No, they aren't. For registers defined in the calling convention as "callee-saved", they don't have to be saved on the stack before calling a function (and the called function only has to save them if it actually uses that register). And for registers defined as "caller-saved", they only have to be saved if their value needs to be kept. The compiler knows all that, and tends to use caller-saved registers as scratch space (which doesn't have to be preserved), and callee-saved registers for longer-lived values.
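A hedged sketch (made-up function names; the exact register choice is up to the compiler):

int g(int a);
void h(void);

int f(int a) {
    int v = g(a);   /* v is live across the call to h(), so the compiler will      */
    h();            /* typically keep it in a callee-saved register (which h must  */
    return v + 1;   /* preserve if it uses it); values not live across calls can   */
}                   /* go in caller-saved scratch registers with no saving at all  */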
> and caches of modern processors are really nearly as fast (if not as fast!) as a register.
No, they aren't. For instance, a quick web search tells me that the L1D cache for a modern AMD CPU has at least 4 cycles of latency. Which means: even if the value you want to read is already in the L1 cache, the processor has to wait 4 cycles before it has that value.
> Registers these days are merely labels, since internally the processor (at least for x86) executes the code in a sort of VM.
No, they aren't. The register file still exists, even though register renaming means the physical register corresponding to a logical register can change. And there's no VM; most common instructions are decoded directly (without going through microcode) into a single µOp or a pair of µOps which are executed directly.
> To me it seems that all these optimizations were really something useful back in the day, but nowadays we can as well just ignore them and let the processor figure it out without that much loss of performance.
It's the opposite: these optimizations are more important nowadays, since memory speeds have not kept up with processor speeds, and power consumption became more relevant.
> To me security is more important than a 1% boost in performance.
Newer programming languages agree with you, and do things like checking array bounds on every access; they rely on compiler optimizations so that the loss of performance is only that "1%".
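For instance, here is a sketch in C of the kind of check such languages insert and then rely on the optimizer to remove (names are made up):

#include <stdlib.h>
#include <stddef.h>

static inline int get(const int *a, size_t len, size_t i) {
    if (i >= len)                 /* explicit bounds check on every access */
        abort();
    return a[i];
}

int sum(const int *a, size_t len) {
    int total = 0;
    for (size_t i = 0; i < len; i++)
        total += get(a, len, i);  /* after inlining, i < len already follows from   */
    return total;                 /* the loop condition, so the check can be elided */
}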
Many calling conventions use registers. And no, loads and stores are extremely complex and not free at all: fewer of them can issue each cycle, and there's some very expensive hardware spent maintaining ordering during execution.
In a real world program, removing all UB is in some cases impossible without adding new breaking features to the C language. But taking a real world program and removing all UB which IS possible to remove will introduce an overhead. In some programs this overhead is irrelevant. In others, it is probably the reason why C was picked.
If you want speed without overhead, you need to have more statically checked guarantees. This is what languages such as Rust attempt to achieve (quite successfully).
What Rust attempts to achieve is removing the possibility of accidentally introducing UB, by designing the language in a way that makes it impossible to have UB when sticking to the safe subset.
It is also possible to ensure that C programs have no UB, and this does not require any breaking changes to C. It usually requires some refactoring of the program.
A bold claim. I've written a whole lot of software in C, and for most of it I'd be astonished if it truly has no UB. Even some of the relatively small, carefully written programs probably have edge case UB I never worried about when writing them.
It is certainly true that many C programs have edge cases which trigger UB. I also have written many such programs where I did not care. This does not contradict my statement though. There are programmers who meticulously care (and/or have to care) about getting the edge cases right and this is entirely possible.
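One example of such an edge case and the kind of care it takes (a sketch): signed overflow is UB, so the check has to happen before the operation rather than after it:

#include <limits.h>

int checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return 0;               /* would overflow: report failure instead */
    *out = a + b;
    return 1;
}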
I think I worded it poorly. In a real world program, a lot of optimizations rely on assumptions of not triggering UB.
Rephrased:
In a real world program removing all opportunities for UB is in some cases impossible without adding new breaking features to the C language.
This has nothing to do with whether you can or can't write a program without invoking UB. I am talking about a hypothetical large program which does not exhibit undefined behaviour, but where, if you modified it, you could trigger UB in many ways. The idea I am positing is that making it so that you could not modify such a program in any way that could trigger UB would be impossible without adding new breaking features to the C language (e.g. you would need to figure out some way of preventing pointers from being used outside of the lifetime of the object they point to).
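For instance (a minimal sketch), nothing in the language stops a pointer from outliving the object it points to:

int *f(void) {
    int local = 42;
    return &local;              /* the pointer escapes the object's lifetime; any
                                   later dereference is UB, and the standard has no
                                   way to rule this out statically */
}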
But this does not need breaking features; it only needs 1) an opt-in safe mode, and 2) annotations to express additional invariants, such as lifetimes. This would not break anything.
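C and common compilers already have a few opt-in annotations pointing in that direction; lifetimes would need new (but still non-breaking) ones. For flavour (the declarations below are illustrative):

#include <stddef.h>

void fill(int buf[static 10]);                /* C99: caller must pass at least 10 valid ints  */
void copy(char *restrict dst, const char *restrict src, size_t n);  /* C99: no aliasing        */
int parse(const char *s) __attribute__((nonnull));  /* GCC/Clang extension: s must not be NULL */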
It doesn't break existing code, unless you want to statically guarantee that it does not trigger UB, in which case it does. The point is that if you need an opt-in safe mode or annotations to express additional invariants then you can't magically make existing code safe.
A lot of existing code is already safe. You can't prove all (or even most) existing code safe automatically. This is also true for Rust if you do not narrowly define safe as memory safe. You could transform a lot of C code to be memory safe by adding annotations and do some light refactoring and maybe pushing some residual pieces to "unsafe" blocks. This would be very similar to Rust.
Again, I am not trying to argue either way. The point I was making was about how you can't define away all UB in the C standard without needing to modify the language in a breaking way.
> You can't prove all (or even most) existing code safe automatically.
No but rust provides a proper type system which goes a long way to being able to prove and enforce a lot more about program behavior at compile time.
> You could transform a lot of C code to be memory safe by adding annotations and do some light refactoring and maybe pushing some residual pieces to "unsafe" blocks. This would be very similar to Rust.
It would only be somewhat similar to super basic entry level rust which ignores all the opportunities for type checking.
> Again, I am not trying to argue either way. The point I was making was about how you can't define away all UB in the C standard without needing to modify the language in a breaking way.
This depends on how you define "breaking". I think one can add annotations and transform a lot of C code to memory safe C with slight refactoring, without introducing changes into the language that would break any existing code. You cannot simply switch on a flag to make existing code safe ... except you can do this too ... it just then comes with a high run-time cost for checking.
> No but rust provides a proper type system which goes a long way to being able to prove and enforce a lot more about program behavior at compile time.
> > You could transform a lot of C code to be memory safe by adding annotations and do some light refactoring and maybe pushing some residual pieces to "unsafe" blocks. This would be very similar to Rust.
> It would only be somewhat similar to super basic entry level rust which ignores all the opportunities for type checking.
I do not believe you can solve a lot more issues with strong typing than you can already solve in C simply by building good abstractions.
> You cannot simply switch on a flag to make existing code safe ... except you can do this too ... it just then comes with a high run-time cost for checking.
I don't think you can reasonably implement this even at a high runtime cost without breaking programs. Either way, you've managed to re-state the crux of my argument.
> I do not believe you can solve a lot more issues with strong typing than you can already solve in C simply by building good abstractions.
Then I don't think you have much familiarity with strong typing, or you are underestimating the performance impact of equivalently "safe" (in a broader sense than what rust uses the term for) abstractions in C.
The only way to get equivalent performance while maintaining the same level of guarantees in C is to generate C code, at which point you're definitely better off using another programming language.
e.g?