
That's a reasonable intuitive interpretation of how it should behave, but according to the spec it's undefined behaviour and compilers have a great degree of freedom in what happens as a result.


More information on this behavior in the link below.

> Note that, apart from contrived examples with deleted null checks, the current rules do not actually help the compiler meaningfully optimize code. A memcpy implementation cannot rely on pointer validity to speculatively read because, even though memcpy(NULL, NULL, 0) is undefined, slices at the end of a buffer are fine. [And if the end of the buffer] were at the end of a page with nothing allocated afterwards, a speculative read from memcpy would break

https://davidben.net/2024/01/15/empty-slices.html
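
For illustration, here is the kind of pattern at stake; the helper name is made up, but with optimizations on, GCC/Clang may treat the null check as dead because the preceding memcpy call lets them assume src is non-null:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper, not from the article. */
    void copy_and_log(char *dst, const char *src, size_t n) {
        memcpy(dst, src, n);       /* UB if dst or src is NULL, even when n == 0 */
        if (src == NULL) {         /* compiler may conclude this is unreachable... */
            puts("src was NULL");  /* ...and silently drop the whole branch */
        }
    }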


> [And if the end of the buffer] were at the end of a page with nothing allocated afterwards, a speculative read from memcpy would break

‘Only’ on platforms that have memory protection hardware. Even there, the platform can always allocate an overflow page for a process, or have the page fault handler check whether the page fault happened due to a speculative read, and repair things (I think the latter is hugely, hugely, hugely impractical, but the standard cannot rule it out)


Platforms without memory protection hardware also have no problem reading NULL.


My comment is a reply to (part of) a comment that isn’t talking about reading from NULL. That’s what the [And if the end of the buffer] part implies.

Even if it didn’t, I don’t think the standard should assume that “Platforms without memory protection hardware also have no problem reading NULL”

An OS could, for example, have a very simple memory protection feature where the bottom half of the memory address range is reserved for the OS, the top half for user processes, and any read from an address with the high bit clear by code in the top half of the address range traps and makes the OS kill the process doing the read.


Doesn't it take memory protection hardware to trap on a memory read?


As a philosophical matter, by definition that would be memory protection hardware, sure. But the point is that it's at least conceivable that some platforms might have some crude, hardwired memory protection without having a full MMU.


They may also expect writes to address 0.


Not really. MMIO mapped at 0x0 for example.


Yikes! I would love sipping coffee watching the chief architect chew up whoever suggested that. That sounds awful even on a microcontroller.


On s390 the memory at address 0 (low core) has all sorts of important stuff. Of course s390 has paging enabled pretty much always but still...


AVR’s registers are mapped to address 0. So reading and writing NULL is actually modifying r0.
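
A minimal sketch of what that means in C, assuming a classic AVR core (e.g. ATmega) where the register file is memory-mapped at data address 0x00; the names are made up, and dereferencing address 0 is of course still UB in ISO C:

    #include <stdint.h>

    /* On classic AVR cores the general-purpose registers live at data
       addresses 0x00-0x1F, so data address 0 is r0. */
    volatile uint8_t *const avr_r0 = (volatile uint8_t *)0x00;

    void clobber_r0(void) {
        *avr_r0 = 0xAA;   /* a "write to NULL" that lands in register r0 */
    }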


AVR’s r0 is also a totally normal register, unlike most other RISC which typically have r0 == 0.


Thanks for saving me a search, because I was expecting r0 to be hardcoded to zero.

Sometimes hardware is designed with insufficient input from software folks and the result is something asinine like that. That, or some people like watching the world burn.


What does "speculative" mean in this case? I understand it as CPU-level speculative execution a.k.a. branch mis-prediction, but that shouldn't have any real-world effects (or else we'd have segfaults all the time due to executing code that didn't really happen)


Turns out you can have that kind of speculative failure too! https://randomascii.wordpress.com/2018/01/07/finding-a-cpu-d...


Why didn't they just... define it, back when they wrote it?


When C was conceived, CPU architectures and platforms were more varied than what we see today. In order to remain portable and yet performant, some details were left as either implementation defined, or completely undefined (i.e. the responsibility of the programmer). Seems archaic today, but it was necessary when C compilers had to be two-pass and run in mere kilobytes of RAM. Even warnings for risky and undefined behavior are a relatively modern concept (the last 10-20 years) compared to the age of C.


When C was conceived, it was made for a specific DEC CPU, for making an operating system. The idea of a C standard was in the future.

If you wanted to know what (for instance) memcpy actually did, you looked at the source code, or even more likely, the assembler or machine code output. That was "the standard".


I think it's reasonable to assume that GP clearly meant the C standard being conceived, as, obviously, K&R's C implementation of the language was ad hoc rather than exhibiting any prescribed specification.


No, K&R's book was the standard.


First came the language, then a few years later they described it in a book.


> Seems archaic today ... run in mere kilobytes of RAM

There is an entire industry that does pretty much that... today. They might run in flash instead of RAM, but still, a few kilobytes.

Probably there are more embedded devices out there than PCs. PIC, AVR, MSP, ARM, custom archs. There might be one of those right now under your hand, in that thing you use to move the cursor.


> There is an entire industry that does pretty much that... today.

Which industry runs C compilers on embedded devices? Because that is what the part you ellipsised out was talking about.


Oh... yes. You are right. My bad.


Many do, though. I have targeted C89 (and maybe C99) on several embedded devices.


They cross compile. No one is compiling code on these machines.


But are you running the compiler on the device, rather than cross-compiling?


I doubt you're running C compilers on those devices.


From what I understand:

1. Initially, they just wanted to give compiler makers more freedom, both in the sense of "do whatever is simplest" and "do the platform-specific thing the developer wants".
2. Compiler devs then found that they could use UB for optimization: e.g. if we assume that a branch containing UB is unreachable, we can generate more efficient code.
3. Sadly, compiler devs started to exploit every such opportunity for optimization, e.g. removing code with a potential segfault.

I.e. the people who made the standard thought that the compiler would remove a no-op call to memcpy, but GCC removes the whole branch that makes the call, as it considers the whole branch impossible. Standard makers thought that compiler devs would be more reasonable.


> Standard makers thought that compiler devs would be more reasonable

This is a bit of a terrible take? Compiler devs never did anything "unreasonable", they didn't sit down and go "mwahahaha we can exploit the heck out of UB to break everything!!!!"

Rather, repeatedly applying a series of targeted optimizations, each one in isolation being "reasonable", results in an eventual "unreasonable" total transformation. But this is more an emergent property of modern compilers having hundreds of optimization passes.

At the time the standards were created, the idea of compilers applying so many optimization passes was just not conceivable. Compilers struggled to just do basic compilation. The assumption was a near 1:1 mapping between code & assembly, and that just didn't age well at all.


One could argue that "optimizing based on signed overflow" was an unreasonable step to take, since any given platform will have some sane, consistent behavior when the underlying instructions cause an overflow. A developer using signed operations without poring over the standard might have easily expected incorrect values (or maybe a trap if the platform likes to use those), but not big changes in control flow. In my experience, signed overflow is generally the biggest cause of "they're putting UB in my reasonable C code!", followed by the rules against type punning, which are violated every day by ordinary usage of the POSIX socket functions.


> One could argue that "optimizing based on signed overflow" was an unreasonable step to take

That optimization allows using 64-bit registers / offset loads for signed ints, which it can't do if it has to overflow, since that overflow must happen at 32 bits. That's not an uncommon thing.
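
A sketch of the kind of loop where that matters (names made up): because signed overflow is UB, the compiler may assume base + i never wraps, widen the index math to 64 bits once, and use simple scaled addressing; with an unsigned index it would have to reproduce 32-bit wrap-around:

    long sum_window(const long *a, int base, int n) {
        long s = 0;
        for (int i = 0; i < n; i++)
            s += a[base + i];   /* int arithmetic assumed not to overflow */
        return s;
    }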


I started to like signed overflow rules, because it is really easy to find problems using sanitizers.

The strict aliasing rules are not violated by typical POSIX socket code, because a cast to a different pointer type such as `struct sockaddr *` is by itself well-defined behavior. (POSIX could of course also just define something even if ISO C leaves it undefined, but I don't think that is needed here.)


> The strict aliasing rules are not violated by typical POSIX socket code, because a cast to a different pointer type such as `struct sockaddr *` is by itself well-defined behavior.

Basically all usage of sendmsg() and recvmsg() with a static char[N] buffer is UB, is one big example I've run into. Unless you memcpy every value into and out of the buffer, which literally no one does. Also, reading sa_family from the output of accept() (or putting it into a struct sockaddr_storage and reading ss_family) is UB, unless you memcpy it out, which literally no one does.
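
For reference, a sketch of the memcpy dance being alluded to (helper name made up; assumes a POSIX system):

    #include <stddef.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Copy the family field out of the storage buffer byte-wise instead of
       reinterpreting the buffer through another struct type. */
    sa_family_t family_of(const struct sockaddr_storage *ss) {
        sa_family_t fam;
        memcpy(&fam,
               (const char *)ss + offsetof(struct sockaddr_storage, ss_family),
               sizeof fam);
        return fam;
    }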


Using a static char buffer would indeed be UB, but we just made a change for C2Y that makes this OK (and in practice it always was). Incorrect use of sockaddr_storage may lead to UB. But again, most socket code I see is actually correct.


> Compiler devs never did anything "unreasonable", they didn't sit down and go "mwahahaha we can exploit the heck out of UB to break everything!!!!"

Many compiler devs are on record gleefully responding to bug reports with statements on the lines of "your code has undefined behaviour according to the standard, we can do what we like with it, if you don't like it write better code". Less so in recent years as they've realised this was a bad idea or at least a bad look, but in the '00s it was a normal part of the culture.


What stops compiler makers from treating UB as platform-specific behavior rather than as something which cannot happen?

"You are not allowed to do this, and thus..." reasoning assumes that programmers are language lawyers, which is unreasonable.


    bool foo(some_struct* bar) {
        if (bar->blah()) {
            return true;
        }
        if (bar == nullptr) {
            return false;
        }
        return true;
    }
Can the compiler eliminate that nullptr comparison, in your opinion: yes or no? While this example looks stupid, after inlining it's quite plausible to end up with code following this kind of pattern. Dereferencing a nullptr is UB, and typically the "platform-specific" behavior is a crash, so why should that if statement remain? And then, if it can't remain, why should an explicit `_Nonnull` assertion behave differently from an explicit deref? What if the compiler can also independently prove that `bar->blah()` always evaluates to false, so it eliminates that entire branch: does the `if (bar == nullptr)` still need to remain in that specific case? If so, why? The code was the same in both cases; the compiler just got better at eliminating dead code.


There isn't a "find UB branches" pass that is seeking out this stuff.

Instead what happens is that you have something like a constant folding or value constraint pass that computes a set of possible values that a variable can hold at various program points by applying constraints of various options. Then you have a dead code elimination pass that identifies dead branches. This pass doesn't know why the "dest" variable can't hold the NULL value at the branch. It just knows that it can't, so it kills the branch.

Imagine the following code:

   int x = abs(get_int());
   if (x < 0) {
     // do stuff
   }
Can the compiler eliminate the branch? Of course. All that's happened here is that the constraint propagation feels "reasonable" to you in this case and "unreasonable" to you in the memcpy case.


Why is it allowed to eliminate the branch? On most architectures abs(INT_MIN) returns INT_MIN, which is negative.


Calling abs(INT_MIN) on a two's-complement machine is not allowed by the C standard. The behavior of abs() is undefined if the result would not fit in the return value.


Where does it say that? I thought this was a famous example from formal methods showing why something really simple could be wrong. It would be strange for the standard to say to ignore it. The behavior is also well defined in two’s complement. People just don’t like it.


https://busybox.net/~landley/c99-draft.html#7.20.6.1

"The abs, labs, and llabs functions compute the absolute value of an integer j. If the result cannot be represented, the behavior is undefined. (242)"

242 The absolute value of the most negative number cannot be represented in two's complement.


I didn't believe this so I looked it up, and yup.

Because of 2's complement limitations, abs(INT_MIN) can't actually be represented and it ends up returning INT_MIN.
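
A minimal demonstration; what it prints is not guaranteed, precisely because the call is UB:

    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* UB per C99 7.20.6.1; on typical two's-complement targets the
           negation wraps back to INT_MIN, but nothing requires that. */
        printf("abs(INT_MIN) = %d\n", abs(INT_MIN));
        return 0;
    }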


It's possible that there is an edge case in the output bounds here. I'm just using it as an example.

Replace it with "int x = foo() ? 1 : 2;" if you want.


> value constraint pass that computes a set of possible values that a variable can hold

Surely that value constraint pass must be using reasoning based on UB in order to remove NULL from the set of possible values?

Being able to disable all such reasoning, then comparing the generated code with and without it enabled would be an excellent way to find UB-related bugs.


There are many such constraints, and often ones that you want.

"These two pointers returned from subsequent calls to malloc cannot alias" is a value constraint that relies on UB. You are going to have a bad time if your compiler can't assume this to be true and comparing two compilations with and without this assumption won't be useful to you as a developer.

There are a handful of cases that people do seem to look at and say "this one smells funny to me", even if we cannot articulate some formal reason why it feels okay for the compiler to build logical conclusions from one assumption and not another. Eliminating null checks that are "dead" because they are dominated by some operation that is illegal if performed on null is the most widely expressed example. Eliminating signed integral bounds checks by assuming that arithmetic operations are non-overflowing is another. Some compilers support explicitly disabling some (but not all) optimizations derived from deductions from these assumptions.

But if you generalize this to all UB you probably won't end up with what you actually want.
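
As a concrete sketch of the point above about disabling specific deductions, with the GCC/Clang options usually reached for (whether a given flag covers every such deduction is compiler-specific):

    #include <string.h>

    /* A null check "dominated" by a use of the pointer. An optimizer may
       delete the check; -fno-delete-null-pointer-checks asks it not to draw
       that conclusion. The analogous switches for signed-overflow deductions
       are -fwrapv / -fno-strict-overflow. */
    void copy_checked(char *dst, const char *src, size_t n) {
        memcpy(dst, src, n);   /* implies dst != NULL and src != NULL */
        if (dst == NULL)       /* candidate for elimination */
            return;
    }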


More reasonable: Emit a warning or error to make the code and human writing it better.

NOT-reasonable: silently 'optimize' a 'gotcha' into behavior the programmer(s) didn't intend.


NOT-reasonable: expecting the compiler to read the programmer's mind.


OK, you want a FORMAL version?

Acceptable UB: Do the exact same type of operation as for defined behavior, even if the result is defined by how the underlying hardware works.

NOT-acceptable UB: Perform some operation OTHER than the same as if it were the valid code path, EXCEPT: Failure to compile or a warning message stating which code has been transformed into what other operation as a result of UB.


I don't understand, if the operation is not defined, what exactly the compiler should do?

If I tell you "open the door", that implies that the door is there. If the door is not there, how would you still open the door?

Concretely, what do you expect this to return:

  #include <cstddef>
  void sink(ptrdiff_t);
  ptrdiff_t source();

  int foo() {    
    int x = 1;
    int y;
    sink(&y-&x);
    *(&y - source()) = 42;
    return x;
  }
assuming that source() returns the parameter passed to sink()?

Incidentally I had to launder the offset through sink/source, because GCC has a must-alias oracle to mitigate miscompiling some UB code, so in a way it already caters to you.


Evaluated step by step...

Offhand, for *sink(&y-&x);* the compiler is not _required_ to lay out the variables adjacently, so the pointer difference fed to sink does not have to be defined and might not be portable.

It would be permissible for the compiler to refuse to compile that ('line blah, op blah' does not conform to the standard's allowed range of behavior).

It would also be permissible to just allow that operation to happen. It's the difference of two pointer-sized units being passed. That's the operation the programmer wrote, so that's the operation that will happen. Do not verify bounds or alter behavior just because the compiler could calculate that the value happens to be PTRMAX-sizeof(int)+1 (i.e. it placed X and Y in the reverse of the order a naive reading might assume).

The = 42 line might write to any random address in memory. Again, just compile the code to perform the operation. If that happens to write 42 somewhere in the stack frame that leads to the program corrupting / a segfault that's fine. If the compiler says 'wait that's not a known memory location' or 'that's going to write onto the protected stack!' it can ALSO refuse to compile and say why that code is not valid.

I would expect valid results to be a return of: 42, 1 (possibly with a warning message about undefined operations and the affected lines), OR the program does not compile and there is an error message which says what's wrong.


&y-&x doesn't require the variables to be adjacent, just to exist in the same linear address space. It doesn't even imply any specific ordering.

> Again, just compile the code to perform the operation. If that happens to write 42 somewhere in the stack frame that leads to the program corrupting / a segfault that's fine. If the compiler says 'wait that's not a known memory location' or 'that's going to write onto the protected stack!

As far as the compiler is concerned, source() could return 0 and the line would be perfectly defined, so there is no reason to produce an error. In fact, as far as the compiler is concerned, 0 is the only valid value that source could return, so that line can only be writing to y. As that variable is a local going out of scope, the compiler omits the store. Or do you also believe that dead store elimination is wrong?
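
For reference, a made-up example of the kind of dead store elimination meant here:

    /* The first assignment is overwritten before it is ever read, so an
       optimizing compiler is expected to drop it. */
    int next_square(int v) {
        int tmp = v;      /* dead store */
        tmp = v + 1;
        return tmp * tmp;
    }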

> possibly with a warning message about undefined operations and the affected lines

There is no definitely undefined operation in my example; there can be UB depending on the behaviour of externally compiled functions, but that's true of almost any C++ statement.

What most people in the "compiler must warn about UB" camp fail to realize is that 99.99% of the time the compiler has no way of realizing some code is likely to cause UB: from the compiler's point of view my example is perfectly standard compliant [1], and UB comes only from the behaviour of source and sink, which are not analysable by the compiler.

[1] technically to be fully conforming the code should cast the pointers to uintptr_t before doing the subtraction.


I'm not familiar with the stack-like functions mentioned, but that is indeed something it should NOT eliminate.

In fact, the compiler should not eliminate 'dead stores'. That should be a warning (and emit the code) OR an error (do not emit a program).

The compiler should inform the programmer so the PROGRAM can be made correct. Not so its particular result can be faster.


A charitable interpretation may be: back when the contract of this function was standardized, presumably in C89, ~35 years ago, neither CPUs nor C compilers were as powerful as today, so wasting an extra couple of CPU cycles to check this condition was much more expensive than it is now. Because of that contract, as can be seen in the example in the comments below, the compiler is also free to eliminate dead code, which has the effect of shaving off some extra CPU cycles.


Probably because they did not think of this special case when writing the standard, or did not find it important enough to consider complicating the standard text for.

In C89, there's just a general provision for all standard library functions:

> Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow. If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer), the behavior is undefined. [...]

And then there isn't anything on `memcpy` that would explicitly state otherwise. Later versions of the standard explicitly clarified that this requirement applies even to size 0, but at that point it was only a clarification of an existing requirement from the earlier standard.

People like to read a lot more intention into the standard than is reasonable. Lots of it is just historical accident, really.


Back when they wrote it they were trying to accommodate existing compilers, including those who did useful things to help people catch errors in their programs (e.g. making memcpy trap and send a signal if you called it with NULL). The current generation of compilers that use undefined behaviour as an excuse to do horrible things that screw over regular programmers but increase performance on microbenchmarks postdates the standard.


The original C standard was more descriptive than prescriptive. There was probably an implementation where it crashed or misbehaved.


Because the benefit was probably seen as very little, and the cost significant.

When you're writing a compiler for an architecture where every byte counts you don't make it write extra code for little benefit.

Programmers were routinely counting bytes (both in code size and data) when writing Assembly code back then, and I mean that literally. Some of that carried into higher-level languages, and rightly so.


memcpy used to be a rep movsb on 8086 DOS compilers. I don't remember if rep movsb stops if cx=0 on entry, or decrements first and wraps around, copying 64K of data.


The specification does not explicitly say that, but the clear intention is that REP with CX=0 should be a no-op (you get exactly that situation when REP gets interrupted during the last iteration: in that case CX is zero and IP points to the REP, not to the following instruction).


Rep movsb copies 64K if CX=0 (that's actually very useful), but memcpy could be implemented as two instructions:

    jcxz skip 
    rep movsb
    skip:


I know at least MSVC's memcpy on x86_64 still results in a rep movsb if the cpuid flag that says rep movsb is fast is set, which it should be on all x86 chips from about 2011/2012 and onward ;)


Every time they leave something undefined, they do so to leave implementations free to use the underlying platform's default behavior, and to allow compilers to use it as an optimization point.


> time they leave something undefined, they do so to leave implementations free to use the underlying platform's default behavior

That's implementation-defined (more or less), i.e. the compiler can do whatever makes most sense for its implementation.

Undefined means (more or less) that the compiler can assume the behaviour never happens so can apply transforms without taking it into account.

> to allow compilers to use it as an optimization point

That's the main advantage of undefined behaviour, i.e. if you can ignore the usage, you may be able to apply optimisations that you couldn't if you had to take it into account. In the article, for example, GCC eliminated what it considered dead code: a NULL check of a variable that couldn't be NULL according to the C spec.

That's also probably the most frustrating thing about optimisations based on undefined behaviour, i.e. checks that prevent undefined behaviour are removed because the compiler thinks that the check can't ever succeed: if it did, there must already have been undefined behaviour. But the way the developer was ensuring defined behaviour was with the check!


AFAIK, something having undefined behavior in the spec does not prevent an implementation- (platform-)specific behavior being defined.

As to your point about checks being erased, that generally happens when the checks happen too late (according to the compiler), or in a wrong way. For example, checking that `src` is not NULL _after_ memcpy(dst, src, 0) is called. Or, checking for overflow by doing `if(x+y<0) ...` when x and y are nonnegative signed ints.
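
To make the second example concrete, a sketch of the too-late check versus a precondition check (function names made up; both assume x and y are nonnegative):

    #include <limits.h>
    #include <stdbool.h>

    /* By the time the comparison runs, x + y has already overflowed (UB),
       so the compiler may fold the test to false and drop the branch. */
    bool overflows_too_late(int x, int y) {
        return x + y < 0;
    }

    /* Tests the operands before adding: no overflow is ever performed. */
    bool overflows_checked_first(int x, int y) {
        return x > INT_MAX - y;
    }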


Here it's more that it allows the compiler to assume that this is never the case, and thus there's no need to emit an additional check for it, I assume?


I mean, they might not have given thought to that particular corner case, they probably wrote something like

> memcpy(void* ptr1, void* ptr2, int n)

Copy n bytes from ptr1 to ptr2. UNDEFINED if ptr1 is NULL or ptr2 is NULL

‐------

It might also have come from an "explicit is better than implicit" opinion, as in "it is better to have developers explicitly handle cases where the null pointer is involved".


I think it's more a strategy. C was not created to be safe. It's pretty much a tiny wrapper around assembler. Every limitation requires extra cycles, compile time or runtime, both of which were scarce.

Of course, someone needs to do the checking somewhere in the layers of abstraction: the user, programmer, compiler, CPU, architecture... They chose the programmer, the folks who like to call themselves "engineers" these days.


I disagree with your premise. C was designed to be a high-level (for its time) language, abstracted from the actual hardware.

>It's pretty much a tiny wrapper around assembler

Assembler has zero problem with adding "null + 4" or computing "null - null". C does, because it's not actually a tiny wrapper.


Not high-level: portable. A portable layer above assembler/arch.

NULL doesn't exist in assembler, and in C, NULL is only defined as a macro. It's not something built-in.

C doesn't have any problems adding 4 to NULL nor subtracting NULL from NULL.


> C doesn't have any problems adding 4 to NULL nor subtracting NULL from NULL.

"Having problems" is not a fair description of what's at stake here. The C standard simply says that it doesn't guarantee that such operations give the results that you expect.

Also please note that the article and this whole thread is about the address zero, not about the number zero. If NULL is #defined as 0 in your implementation and you use it in an expression only involving integers, of course no UB is triggered.


  #include <stddef.h>
  int main() {
    if (NULL + 4) {}
    if (NULL - NULL) {}
    return 0;
  }


Not sure what your last remark means wrt everything else.


I feel strongly that they should split undefined behavior into behavior that is not defined, and things that the compiler is allowed to assume. The former basically already exists as "implementation-defined behavior". The latter should be written out explicitly in the documentation:

> memcpy(dest, src, count)

> Copies count bytes from src to dest. [...] Note this is not a plain function, but a special form that applies the constraints dest != NULL and src != NULL to the surrounding scope. Equivalent to:

    assume(dest != NULL)
    assume(src != NULL)
    actual_memcpy(dest, src, count)
The conflation of both concepts breaks the mental model of many programmers, especially ones who learned C/C++ in the 90s, when it was common to write very different code, with all kinds of now-illegal things like type punning and checking this != NULL.

I'd love to have a flag "-fno-surprizing-ub" or "-fhighlevel-assembler" combined with the above `assume` function or some other syntax to let me help the compiler, so that I can write C like in the 90s: close to the metal, but with fewer surprises.
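
Something close to that `assume` can already be spelled with existing extensions; a sketch, assuming GCC or Clang (C23 also offers unreachable() in <stddef.h>):

    /* If the condition is false at run time the behaviour is undefined,
       which is exactly the "compiler may rely on this" contract wanted here. */
    #define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

    void copy_wrapper(void *dst, const void *src, unsigned long n) {
        assume(dst != (void *)0);
        assume(src != (void *)0);
        /* ... forward to the real memcpy here ... */
        (void)n;
    }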


>Note this is not a plain function, but a special form that applies the constraints dest != NULL and src != NULL to the surrounding scope.

Plain functions can apply constraints to the surrounding code:

https://godbolt.org/z/fP58WGz9f


> I'd love to have a flag "-fno-surprizing-ub" or "-fhighlevel-assembler" combined with the above `assume` function or some other syntax to let me help the compiler, so that I can write C like in the 90s: close to the metal, but with fewer surprises.

The problem, which you may realise with some more introspection, is that "surprising" is actually a property of you, not of the compiler, so you're asking for mind reading, and that's not one of the options. You want not to experience surprise.

You can of course still get 1990s compilers and you're welcome to them. I cannot promise you won't still feel surprised despite your compiler nostalgia, but I can pretty much guarantee that the 1990s compiler results in slower and buggier software, so that's nice, remember only to charge 1990s rates for the work.


I get that for the library. But I'm a bit puzzled about the optimizations done by a compiler based on this behavior.

E.g., suppose we patch GCC to preserve any conditional containing the string 'NULL' in it. Would that have a measurable performance impact on Linux/Chromium/Firefox?


Upon which some people may rely...


People will only rely on UB when it is well defined by a particular implementation, either explicitly or because of a long history of past use. E.g. using unions for type punning in gcc, or allowing methods to be called on null pointers in MSVC.

But there's nothing like that here.
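
For reference, a sketch of the union punning that GCC documents as supported (assumes float and uint32_t are both 4 bytes):

    #include <stdint.h>

    /* Reading a union member other than the one last stored reinterprets the
       stored bytes; GCC explicitly supports this even under -fstrict-aliasing. */
    static uint32_t float_bits(float f) {
        union { float f; uint32_t u; } pun;
        pun.f = f;
        return pun.u;
    }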


Until a new compiler version comes out and, since it was UB anyway, the compiler suddenly behaves in a different way.



