> This option is enabled by default on most targets.
What a footgun.
I understand that, in an effort to compete with other compilers for relevance, GCC pursued performance over safety. Has that era passed? Could GCC choose safer over fast?
Alternatively, has someone compiled a list of flags one might want to enable in latest GCC to avoid such kinds of dangerous optimizations?
Just for the record, that's not the main purpose of -fdelete-null-pointer-checks.
Normally, it only deletes null checks after actual null pointer dereferences. In principle this can't change observable behavior. Null dereferences are guaranteed to trap, so if you don't trap, it means the pointer wasn't null. In other words, unlike most C compiler optimizations, -fdelete-null-pointer-checks should be safe even if you do commit undefined behavior.
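A minimal sketch of the pattern being described (my own example, with made-up names):

    #include <stddef.h>
    int f(int *p) {
        int v = *p;        /* if p were NULL, this dereference would trap... */
        if (p == NULL)     /* ...so the compiler may delete this check */
            return -1;
        return v;
    }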
This once caused a kerfuffle with the Linux kernel. At the time, x86_64 CPUs allowed the kernel to dereference userspace addresses, and the kernel allowed userspace to map address 0. Therefore, it was possible for userspace to arrange for null pointers to not trap when dereferenced in the kernel. Which meant that the null check optimization could actually change observable behavior. Which introduced a security vulnerability. [1]
Since then, Linux has been compiled with `-fno-delete-null-pointer-checks`, but it's not really necessary: Linux systems have long since enforced that userspace can't map address 0, which means that deleting null pointer checks should be safe in both kernel and userspace. (Newer CPU security features also protect the kernel even if userspace is allowed to map address 0.)
But anyway, I didn't know that -fdelete-null-pointer-checks treated "memcpy with potentially-zero size" as a condition to remove subsequent null pointer checks. That means that the optimization actually isn't safe! Once GCC is updated to respect the newly well-defined behavior, though, it should become truly safe. Probably.
The same can't be said for most UB optimizations – most of which can't be turned off.
I once spent hours if not days debugging a problem with some code I had recently written because of this exact optimization.
It wasn't an embedded system, but rather an x86 BIOS boot loader, which is sort of halfway there. Protected mode enabled without paging, so there's nothing to trap a NULL.
Completely by accident I had dereferenced a pointer before doing a NULL check. I think the dereference was just printing some integer, which of course had a perfectly sane-looking value so I didn't even think about it.
The compiler, I can't remember if it was gcc or clang by this point, decided that since I had already successfully dereferenced the pointer it could just elide the null check and the code path associated with it.
Finally I ran it in VMware and attached a debugger, which skipped right over the null check even though I could see in the debugger the value was null. So then I went to look at the assembly the compiler generated, and that's when I started to understand what had happened.
It was a head-slapper when I found the dereference above. I added a second null check or moved that code or some such, and that was it.
Now map the hours and days spent into actual money taken from the project budget, and you realise why some businesses prefer some languages over others.
There was a more egregious one which got Linus further pissed off with GCC, due to a 'dereference' that would not trap but still deleted a later null check (because e.g. int *foo = &bar->baz is basically just calculating an offset to bar, and so will not fail at runtime, but it is still a dereference according to the abstract machine and so is undefined if bar is NULL). I think the risk of something like that is why it's still disabled.
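A sketch of that pattern (hypothetical names, not the actual kernel code):

    #include <stddef.h>
    struct bar { int baz; };
    int f(struct bar *bar) {
        int *foo = &bar->baz;   /* no load: just bar plus an offset, so it never traps at runtime */
        if (bar == NULL)        /* ...yet it's UB if bar is NULL, so this check may be deleted */
            return -1;
        return *foo;
    }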
Irrelevant, because delete-null-pointer-checks happens even in absence of nonnull function attribute, see GP's godbolt link, and the documentation that omits any reference to that function attribute.
That is a side effect of passing the pointer as a function parameter marked nonnull. It implies that the pointer is nonnull and any NULL checks against it can be removed. Pass it to a normal function and you will not see the NULL check removed.
Explanation for the above: passing NULL as the destination argument to memcpy() is undefined behaviour at present. gcc assumes that the fact that memcpy() is called therefore means that the destination argument can't be NULL, so "knows" that the dest == NULL check can never be true, and so removes the test and the do_thing1() branch entirely.
Interestingly, replacing len with a literal 0 in the memcpy() call results in gcc instead removing the memcpy() call and retaining the check - presumably a different optimisation routine decides that it's a no-op in that case. https://godbolt.org/z/cPdx6v13r is, therefore, interesting - despite this only ever calling test() with a len of 0, the elision of the dest == NULL check is still there, but test() has been inlined without the memcpy (because len == 0) but with do_thing2() (because the behaviour is undefined and so it can assume dest isn't NULL even though there's a NULL literally right there!)
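The code in question is roughly of this shape (a reconstruction based on the names used above; the exact godbolt snippet may differ):

    #include <string.h>
    void do_thing1(void);
    void do_thing2(void *);
    void test(void *dest, const void *src, size_t len) {
        memcpy(dest, src, len);   /* UB today if dest or src is NULL, even when len == 0 */
        if (dest == NULL) {       /* gcc assumes this can never be true... */
            do_thing1();          /* ...and drops this branch entirely */
            return;
        }
        do_thing2(dest);
    }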
You can, but gcc may replace it with an equivalent set of instructions as a compiler optimization, so you would have no guarantee it is used unless you hack the compiler.
On a related note, GCC optimizing away things is a problem for memset when zeroing buffers containing sensitive data, as GCC can often tell that the buffers are going to be freed and thus the write is deemed unnecessary. That is a security issue and has to be resolved by breaking the compiler’s optimization through a clever trick:
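One common form of the trick (a sketch; the trick referenced above may differ) is to route the call through a volatile function pointer, so the compiler cannot prove the memset is dead:

    #include <string.h>
    /* volatile: the compiler must re-read the pointer and cannot assume it still
       points at memset, so it cannot reason away the call made through it */
    static void *(*volatile memset_ptr)(void *, int, size_t) = memset;
    static void secure_zero(void *buf, size_t len) {
        memset_ptr(buf, 0, len);
    }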
Similarly, GCC may delete a memcpy to a buffer about to be freed, although I have never observed that as you generally don’t do that in production code.
> Similarly, GCC may delete a memcpy to a buffer about to be freed, although I have never observed that as you generally don’t do that in production code.
It's not that crazy. You could have a refcounted object that poisons itself when the refcount drops to zero, but doesn't immediately free itself because many malloc implementations can have bad lock contention on free(). So you poison the object to detect bugs, possibly only in certain configurations, and then queue the pointer for deferred freeing on a single thread at a better time.
(Ok, this doesn't quite do it: poisoning is much more likely to use memset than memcpy, but I assume gcc would optimize out a doomed memset too?)
Yes, it potentially could be optimised out, which is why platforms provide functions like SecureZeroMemory() for cases where you want to be sure that memory is zeroed out.
That would be why I introduced an explicit_memset() into the OpenZFS encryption module in the commit that I linked. It uses two different techniques to guard against the compiler deleting it.
The valid inputs to memcpy() are defined by the C specification, so the compiler is free to make assumptions about what valid inputs are even if the library implementation chooses to allow a broader range of inputs
Per ISO C, the identifiers declared or defined with external linkage by any C standard library header are considered reserved, so the moment you define your own memcpy, you're already in UB land.
Many standard C functions are treated as “magic” by compilers. malloc is treated as if it has no side effects (which of course it does: it changes allocator state, but those aren't effects we care about observing), so the optimiser can elide allocations; without that special treatment the call could never be elided, because malloc would look like it has observable side effects.
If I'm understanding the OP correctly, the C standard says so, i.e. the semantics of memcpy are defined by the standard and the standard says that it's UB to pass NULL.
Unlike all the more complicated languages the "freestanding" mode C doesn't even have a memcpy feature, so it may not define how one works - maybe you've decided to use the name "memcpy" for your function which generates a memorandum about large South American rodents, and "memo_capybara" was too much typing.
In something like C++ or Rust, even their bare metal "What do you mean Operating System?" modes quietly require memcpy and so on because we're not savages, clearly somebody should provide a way to copy bytes of memory, Rust is so civilised that even on bare metal (in Rust's "core" library) you get a working sort_unstable() for your arbitrary slice types!
Can this function be compiled to store x in a register? Can it be compiled to remove x entirely and return the constant 1? That relies on "knowing that undefined behavior cannot happen." This program will behave differently if we store x on the stack and then return it after we call havoc() than if we call havoc() and then return the constant 1, if havoc() just writes to out of bounds memory addresses or whatever.
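The function under discussion is roughly this shape (a reconstruction; havoc() stands in for an external call the optimizer cannot see into):

    void havoc(void);
    int f(void) {
        int x = 1;
        havoc();     /* might scribble over the stack via out-of-bounds writes */
        return x;    /* may the compiler just return the constant 1? */
    }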
In this case the undefined behavior just feels "more extreme" to most people, but it is remarkably hard for people to rigorously define the undefined behavior that should and should not be considered when making optimizations.
Yes it does. The optimizing this to return the constant 1 is not producing an equivalent program unless we make assumptions about the behavioral bounds of havoc().
What is the difference between "writing past the end of an array is UB" and "dereferencing a null pointer is UB" and "passing null as the destination argument to memcpy is UB"? The two programs I listed above are only observationally equivalent if writing past the end of valid allocations is UB.
A core problem with this discussion in almost all circumstances is that people have a vibe for which of these things it feels okay for a compiler to make logical deductions from and which it feels not okay but if you actually sit down and try to formalize this in a way that would be meaningful to compiler vendors, you can't.
This example is not "I know that UB doesn't happen, therefore ...", which is what the memcpy() case is.
It is "I don't care that UB might happen, I am going to act as if it didn't. If the UB then makes the program behave differently than without the UB, that's not my problem".
Which, incidentally, is one of the suggested/permitted responses to UB in the standard's text (in a part that was later made non-binding).
They're just acting as agents that derive the logical consequences of the code.
The fact that the given example code is "surprising" is analogous to this mathematical derivation:
    a = b
    a*a = b*a
    a*a - b*b = b*a - b*b
    (a - b)(a + b) = b(a - b)
    (a - b)(a + b)/(a - b) = b(a - b)/(a - b)
    ^ Divide by 0, undefined behavior!
Everything below is not necessarily true.
    a + b = b
    b + b = b
    2b = b
    2 = 1
    2 - 1 = 1 - 1
    1 = 0
The source of truth about what is/isn't allowed is the C standard, not your personal simplified model of it that may contain dangerous misconceptions. The fact that your mental model doesn't match the document is an education problem, not a problem with the compiler.
> The fact that your mental model doesn't match the document is an education problem, not a problem with the compiler.
Or it is a problem with the document, which is the entire reason we are having this discussion: N3322 argued the document should be fixed, and now it will be for C2y.
I just skimmed through the proposed wording in [N3322]. It looks like it silently fixes a defect too, NULL == NULL was also undefined up until C23. Hilarious.
This is probably related to the issue with NULL - NULL mentioned in the article.
Imagine you’re working in real mode on x86, in the compact or large memory model[1]. This means that a data pointer is basically struct{uint16_t off,seg;} encoding linear address (seg<<4)+off. This makes it annoying to have individual allocations (“objects”) >64K in size (because of the weird carries), so these models don’t allow that. (The huge model does, and it’s significantly slower.) Thus you legitimately have sizeof(size_t) == 2 but sizeof(uintptr_t) == 4 (hi Rust), and God help you if you compare or subtract pointers not within the same allocation. [Also, sizeof(void *) == 4 but sizeof(void (*)(void)) == 2 in the compact model, and the other way around in the medium model.]
Note the addressing scheme is non-bijective. The C standard is generally careful not to require the implementation to canonicalize pointers: if, say, char a[16] happens to be immediately followed by int b[8], an independently declared variable, it may well be that &a+16 (legal “one past” pointer) is {16,1} but &b is {0,2}, which refers to the exact same byte, but the compiler doesn’t have to do anything special because dereferencing &a+16 is UB (duh) and comparing (char *)(&a+16) with (char *)&b or subtracting one from the other is also UB (pointers to different objects).
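A sketch of the encoding described above (illustrative only):

    #include <stdint.h>
    struct farptr { uint16_t off, seg; };        /* compact/large-model data pointer */
    uint32_t linear(struct farptr p) {
        return ((uint32_t)p.seg << 4) + p.off;   /* non-bijective: {off=16,seg=1} and
                                                    {off=0,seg=2} both name address 0x20 */
    }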
The issue with NULL == NULL and also with NULL - NULL is that now the null pointer is required to be canonical, or these expressions must canonicalize their operands. I don’t know why you’d ever make an implementation that has non-canonical NULLs, but I guess the text prior to this change allowed such.
> now the null pointer is required to be canonical
Yikes! This particular oddity seems annoying but sort of harmless in x86 real mode, but not necessarily in protected mode. Imagine code that wants to load a pointer into a register: it loads the offset into an ordinary register and the selector portion into a segment register. It’s permissible to load the 0 (null) selector, but loading garbage will fault immediately. So, if you allow non canonical NULL, then knowing that a pointer is either valid or NULL does not allow you to hoist a segment load above a condition that might mean you never actually dereference the pointer.
(I have plenty of experience with low-level OS code in all kinds of nasty x86 modes but, thankfully, not so much experience writing ordinary C code targeting protected mode. It sometimes boggles my mind that anyone ever got decent performance with anything involving far data pointers. Segment loads are slow, and there are not a lot of segment registers to go around.)
In real mode assembly days, ES and sometimes DS were just another base register that you could use in a loop. Given the dearth of addressing modes it was quite nice to assume that large arrays started at xxxx0h and therefore that the offset part of the far pointer was zero.
If so, it's one that's been introduced at some point post C99 -- the C99 spec explicitly defines the behaviour of NULL == NULL. Section 6.5.9 para 6 says "Two pointers compare equal if and only if both are null pointers, both are pointers to the same object [etc etc]".
Cannot find any confirmation of your statement. OTOH, "All null pointer values (of compatible type within the same address space) are already required to compare equal." appears in the linked paper.
"NULL" in fact is a macro, not a part of the language.
null (the zero pointer) is, and the standard explicitly defines that comparing two null pointers yields equality. Your example simply won't compile; it is not undefined, the pointers are simply of different types, period.
here is what the standard says:
"A pointer to void may be converted to or from a pointer to any object type.
Conversion of a null pointer to another pointer type yields a null pointer of that type. Any two null pointers shall compare equal."
therefore, convert either of them (or both) to void * and compare.
you'll get equality.
That's a reasonable intuitive interpretation of how it should behave, but according to the spec it's undefined behaviour and compilers have a great degree of freedom in what happens as a result.
More information on this behavior in the link below.
> Note that, apart from contrived examples with deleted null checks, the current rules do not actually help the compiler meaningfully optimize code. A memcpy implementation cannot rely on pointer validity to speculatively read because, even though memcpy(NULL, NULL, 0) is undefined, slices at the end of a buffer are fine. [And if the end of the buffer] were at the end of a page with nothing allocated afterwards, a speculative read from memcpy would break
> [And if the end of the buffer] were at the end of a page with nothing allocated afterwards, a speculative read from memcpy would break
‘Only’ on platforms that have memory protection hardware. Even there, the platform can always allocate an overflow page for a process, or have the page fault handler check whether the page fault happened due to a speculative read, and repair things (I think the latter is hugely, hugely, hugely impractical, but the standard cannot rule it out)
My comment is a reply to (part of) a comment that isn’t talking about reading from NULL. That’s what the [And if the end of the buffer] part implies.
Even if it didn’t, I don’t think the standard should assume that “Platforms without memory protection hardware also have no problem reading NULL”
An OS could, for example, have a very simple memory protection feature where the bottom half of the memory address range is reserved for the OS, the top half for user processes, and any read from an address with the high bit clear by code in the top half of the address range traps and makes the OS kill the process doing the read.
As a philosophical matter, by definition that would be memory protection hardware, sure. But the point is that it's at least conceivable that some platforms might have some crude, hardwired memory protection without having a full MMU.
Thanks for saving me a search, because I was expecting r0 to be hardcoded to zero.
Sometimes hardware is designed with insufficient input from software folks and the result is something asinine like that. That, or some people like watching the world burn.
What does "speculative" mean in this case? I understand it as CPU-level speculative execution a.k.a. branch mis-prediction, but that shouldn't have any real-world effects (or else we'd have segfaults all the time due to executing code that didn't really happen)
When C was conceived, CPU architectures and platforms were more varied than what we see today. In order to remain portable and yet performant, some details were left as either implementation defined, or completely undefined (i.e. the responsibility of the programmer). Seems archaic today, but it was necessary when C compilers had to be two-pass and run in mere kilobytes of RAM. Even warnings for risky and undefined behavior are a relatively modern concept (the last 10-20 years) compared to the age of C.
When C was conceived, it was made for a specific DEC CPU, for making an operating system. The idea of a C standard was in the future.
If you wanted to know what (for instance) memcpy actually did, you looked at the source code, or even more likely, the assembler or machine code output. That was "the standard".
I think it's reasonable to assume that GP clearly meant the C standard being conceived, as, obviously, K&R's C implementation of the language was ad hoc rather than exhibiting any prescribed specification.
> Seems archaic today ... run in mere kilobytes of RAM
There is an entire industry that does pretty much that... today. They might run in flash instead of RAM, but still, a few kilobytes.
Probably there are more embedded devices out there than PCs. PIC, AVR, MSP, ARM, custom archs. There might be one of those right now under your hand, in that thing you use to move the cursor.
1. Initially, they just wanted to give compiler makers more freedom: both in the sense "do whatever is simplest" and "do something platform-specific which dev wants".
2. Compiler devs found that they can use UB for optimization: e.g. if we assume that a branch with UB is unreachable we can generate more efficient code.
3. Sadly, compiler devs started to exploit every opportunity for optimization, e.g. removing code with a potential segfault.
I.e. the people who made the standard thought that the compiler would remove a no-op call to memcpy, but GCC removes the whole branch that makes the call, as it considers the whole branch impossible. The standard's authors thought that compiler devs would be more reasonable.
> Standard makers thought that compiler devs would be more reasonable
This is a bit of a terrible take? Compiler devs never did anything "unreasonable", they didn't sit down and go "mwahahaha we can exploit the heck out of UB to break everything!!!!"
Rather, repeatedly applying a series of targeted optimizations, each one in isolation being "reasonable", results in an eventual "unreasonable" total transformation. But this is more an emergent property of modern compilers having hundreds of optimization passes.
At the time the standards were created, the idea of compilers applying so many optimization passes was just not conceivable. Compilers struggled to just do basic compilation. The assumption was a near 1:1 mapping between code & assembly, and that just didn't age well at all.
One could argue that "optimizing based on signed overflow" was an unreasonable step to take, since any given platform will have some sane, consistent behavior when the underlying instructions cause an overflow. A developer using signed operations without poring over the standard might have easily expected incorrect values (or maybe a trap if the platform likes to use those), but not big changes in control flow. In my experience, signed overflow is generally the biggest cause of "they're putting UB in my reasonable C code!", followed by the rules against type punning, which are violated every day by ordinary usage of the POSIX socket functions.
> One could argue that "optimizing based on signed overflow" was an unreasonable step to take
That optimization allows using 64-bit registers / offset loads for signed ints which it can't do if it has to overflow, since that overflow must happen at 32-bits. That's not an uncommon thing.
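A sketch of the kind of loop this helps (my example, assuming a 64-bit target): because signed overflow is UB, the compiler may treat i as a 64-bit induction variable instead of redoing 32-bit wraparound and sign extension on every iteration.

    long sum(const long *a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++)   /* i is assumed never to wrap, so it can be widened */
            s += a[i];
        return s;
    }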
I have started to like the signed overflow rules, because they make it really easy to find problems using sanitizers.
The strict aliasing rules are not violated by typical POSIX socket code, as a cast to a different pointer type (i.e. to `struct sockaddr *`) is by itself well-defined behavior. (And POSIX could of course just define something even if ISO C leaves it undefined, but I don't think that is needed here.)
> The strict aliasing rules are not violated by typical POSIX socket code as a cast to a different pointer type, i.e. `struct sockaddr` by itself is well-defined behavior.
Basically all usage of sendmsg() and recvmsg() with a static char[N] buffer is UB, is one big example I've run into. Unless you memcpy every value into and out of the buffer, which literally no one does. Also, reading sa_family from the output of accept() (or putting it into a struct sockaddr_storage and reading ss_family) is UB, unless you memcpy it out, which literally no one does.
Using a static char buffer would indeed be UB, but we just made the change in C2Y that this is ok (and in practice it always was). Incorrect use of sockaddr_storage may lead to UB. But again, most socket code I see is actually correct.
> Compiler devs never did anything "unreasonable", they didn't sit down and go "mwahahaha we can exploit the heck out of UB to break everything!!!!"
Many compiler devs are on record gleefully responding to bug reports with statements on the lines of "your code has undefined behaviour according to the standard, we can do what we like with it, if you don't like it write better code". Less so in recent years as they've realised this was a bad idea or at least a bad look, but in the '00s it was a normal part of the culture.
Can the compiler eliminate that nullptr comparison, in your opinion: yes or no? While this example looks stupid, after inlining it's quite plausible to end up with code in this type of a pattern. Dereferencing a nullptr is UB, and typically the "platform-specific" behavior is a crash, so... why should that if statement remain? And then, if it can't remain, why should an explicit `_Nonnull` assertion have different behavior than an explicit deref? What if the compiler can also independently prove that some_struct->blah() always evaluates to false, so it eliminates that entire branch - does the `if (bar == nullptr)` still need to remain in that specific case? If so, why? The code was the same in both cases; the compiler just got better at eliminating dead code.
There isn't a "find UB branches" pass that is seeking out this stuff.
Instead what happens is that you have something like a constant folding or value constraint pass that computes a set of possible values that a variable can hold at various program points by applying constraints of various options. Then you have a dead code elimination pass that identifies dead branches. This pass doesn't know why the "dest" variable can't hold the NULL value at the branch. It just knows that it can't, so it kills the branch.
Imagine the following code:
    int x = abs(get_int());
    if (x < 0) {
        // do stuff
    }
Can the compiler eliminate the branch? Of course. All that's happened here is that the constraint propagation feels "reasonable" to you in this case and "unreasonable" to you in the memcpy case.
Calling abs(INT_MIN) on a two's-complement machine is not allowed by the C standard. The behavior of abs() is undefined if the result would not fit in the return value.
Where does it say that? I thought this was a famous example from formal methods showing why something really simple could be wrong. It would be strange for the standard to say to ignore it. The behavior is also well defined in two’s complement. People just don’t like it.
> value constraint pass that computes a set of possible values that a variable can hold
Surely that value constraint pass must be using reasoning based on UB in order to remove NULL from the set of possible values?
Being able to disable all such reasoning, then comparing the generated code with and without it enabled would be an excellent way to find UB-related bugs.
There are many such constraints, and often ones that you want.
"These two pointers returned from subsequent calls to malloc cannot alias" is a value constraint that relies on UB. You are going to have a bad time if your compiler can't assume this to be true and comparing two compilations with and without this assumption won't be useful to you as a developer.
There are a handful of cases that people do seem to look at and say "this one smells funny to me", even if we cannot articulate some formal reason why it feels okay for the compiler to build logical conclusions from one assumption and not another. Eliminating null checks that are "dead" because they are dominated by some operation that is illegal if performed on null is the most widely expressed example. Eliminating signed integral bounds checks by assuming that arithmetic operations are non-overflowing is another. Some compilers support explicitly disabling some (but not all) optimizations derived from deductions from these assumptions.
But if you generalize this to all UB you probably won't end up with what you actually want.
Acceptable UB: do the exact same type of operation as for defined behavior, even if the result is determined by how the underlying hardware works.
NOT-acceptable UB: perform some operation other than what the valid code path would do - except for a failure to compile, or a warning message stating which code has been transformed into what other operation as a result of UB.
I don't understand, if the operation is not defined, what exactly the compiler should do?
If I tell you "open the door", that implies that the door is there. If the door is not there, how would you still open the door?
Concretely, what do you expect this to return:
    #include <cstddef>
    void sink(ptrdiff_t);
    ptrdiff_t source();
    int foo() {
        int x = 1;
        int y;
        sink(&y - &x);
        *(&y - source()) = 42;
        return x;
    }
assuming that source() returns the parameter passed to sink()?
Incidentally I had to launder the offset through sink/source, because GCC has a must-alias oracle to mitigate miscompiling some UB code, so in a way it already caters to you.
Offhand, regarding *sink(&y-&x);*: the compiler is not _required_ to lay out variables adjacently. So the computation of the pointers fed to sink does not have to be defined and might not be portable.
It would be permissible for the compiler to refuse to compile that ('line blah, op blah' does not conform the the standard's allowed range of behavior).
It would also be permissible to just allow that operation to happen. It's the difference of two pointer-sized units being passed. That's the operation the programmer wrote, that's the operation that will happen. Do not verify bounds or alter behavior because the compiler could calculate that the value happens to be PTRMAX-sizeof(int)+1 (it placed X and Y in reverse of how one might naively assume).
The = 42 line might write to any random address in memory. Again, just compile the code to perform the operation. If that happens to write 42 somewhere in the stack frame that leads to the program corrupting / a segfault that's fine. If the compiler says 'wait that's not a known memory location' or 'that's going to write onto the protected stack!' it can ALSO refuse to compile and say why that code is not valid.
I would expect valid results to be a return of: 42, 1 (possibly with a warning message about undefined operations and the affected lines), OR the program does not compile and there is an error message which says what's wrong.
&y-&x doesn't require the variables to be adjacent, just to exist in the same linear address space. It doesn't even imply any specific ordering.
> Again, just compile the code to perform the operation. If that happens to write 42 somewhere in the stack frame that leads to the program corrupting / a segfault that's fine. If the compiler says 'wait that's not a known memory location' or 'that's going to write onto the protected stack!
As far as the compiler is concerned, source() could return 0 and the line be perfectly defined, so there is no reason to produce an error. In fact, as far as the compiler is concerned, 0 is the only valid value that source() could return, so that line can only be writing to y. As that variable is a local that is going out of scope, the compiler omits the store. Or do you also believe that dead store elimination is wrong?
> possibly with a warning message about undefined operations and the affected lines
There is no definitely undefined operation in my example; there can be UB depending on the behaviour of externally compiled functions, but that's true of almost any C++ statement.
What most people in the "compiler must warn about UB" camp fail to realize is that 99.99% of the time the compiler has no way of realizing some code is likely to cause UB: from the compiler's point of view my example is perfectly standard compliant [1]; UB comes only from the behaviour of source and sink, which are not analysable by the compiler.
[1] technically to be fully conforming the code should cast the pointers to uintptr_t before doing the subtraction.
A charitable interpretation may be: back when the contract of this function was standardized, presumably in C89 some ~35 years ago, CPUs and C compilers alike were not as powerful, so wasting an extra couple of CPU cycles to check this condition was much more expensive than it is today. Because of that contract, as can be seen in the example in the comments below, the compiler is also free to eliminate the dead code, which also has the effect of shaving off some extra CPU cycles.
Probably because they did not think of this special case when writing the standard, or did not find it important enough to consider complicating the standard text for.
In C89, there's just a general provision for all standard library functions:
> Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow. If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer), the behavior is undefined. [...]
And then there isn't anything on `memcpy` that would explicitly state otherwise.
Later versions of the standard explicitly clarified that this requirement applies even to size 0, but at that point it was only a clarification of an existing requirement from the earlier standard.
People like to read a lot more intention into the standard than is reasonable. Lots of it is just historical accident, really.
Back when they wrote it they were trying to accommodate existing compilers, including those who did useful things to help people catch errors in their programs (e.g. making memcpy trap and send a signal if you called it with NULL). The current generation of compilers that use undefined behaviour as an excuse to do horrible things that screw over regular programmers but increase performance on microbenchmarks postdates the standard.
Because the benefit was probably seen as very little, and the cost significant.
When you're writing a compiler for an architecture where every byte counts you don't make it write extra code for little benefit.
Programmers were routinely counting bytes (both in code size and data) when writing Assembly code back then, and I mean that literally. Some of that carried into higher-level languages, and rightly so.
memcpy used to be a rep movsb on 8086 DOS compilers. I don't remember if rep movsb stops if cx=0 on entry, or decrements first and wraps around, copying 64K of data.
The specification does not explicitly say that, but the clear intention is that REP with CX=0 should be no-op (you get exactly that situation when REP gets interrupted during the last iteration, in that case CX is zero and IP points to the REP, not the following instruction).
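For the curious, a modern inline-asm sketch of the same idea (x86-64 GCC/Clang syntax, purely illustrative): with a count of zero, REP MOVSB simply does nothing.

    #include <stddef.h>
    static void rep_movsb_copy(void *dst, const void *src, size_t n) {
        __asm__ volatile("rep movsb"
                         : "+D"(dst), "+S"(src), "+c"(n)
                         :
                         : "memory");
    }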
I know at least MSVC's memcpy on x86_64 still results in a rep movsb if the cpuid flag that says rep movsb is fast is set, which it should be on all x86 chips from about 2011/2012 and onward ;)
Every time they leave something undefined, they do so to leave implementations free to use the underlying platform's default behavior, and to allow compilers to use it as an optimization point
> Every time they leave something undefined, they do so to leave implementations free to use the underlying platform's default behavior
That's implementation defined (more or less), i.e. the compiler can do whatever makes most sense for its implementation.
Undefined means (more or less) that the compiler can assume the behaviour never happens so can apply transforms without taking it into account.
> to allow compilers to use it as an optimization point
That's the main advantage of undefined behaviour ie if you can ignore the usage, you may be able to apply optimisations that you couldn't if you had to take it into account. In the article, for example, GCC eliminated what it considered dead code for a NULL check of a variable that couldn't be NULL according to the C spec.
That's also probably the most frustrating thing about optimisations based on undefined behaviour ie checks that prevent undefined behaviour are removed because the compiler thinks that the check can't ever succeed because, if it did, there must have been undefined behaviour. But the way the developer was ensuring defined behaviour was with the check!
AFAIK, something having undefined behavior in the spec does not prevent an implementation- (platform-)specific behavior being defined.
As to your point about checks being erased, that generally happens when the checks happen too late (according to the compiler), or in the wrong way. For example, checking that `src` is not NULL _after_ memcpy(dst, src, 0) is called. Or checking for overflow by doing `if(x+y<0) ...` when x and y are non-negative signed ints.
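Two sketches of checks that come "too late" or are done "in the wrong way" and may therefore be deleted (my examples):

    #include <stddef.h>
    #include <string.h>
    void f(char *dst, const char *src) {
        memcpy(dst, src, 0);          /* UB (today) if dst or src is NULL... */
        if (src == NULL)              /* ...so this check may be folded to false */
            return;
        /* ... */
    }
    int sum_ok(int x, int y) {        /* callers pass only non-negative x and y */
        if (x + y < 0)                /* signed overflow is UB, so "can't" go negative: */
            return -1;                /* the check may be removed */
        return x + y;
    }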
I mean, they might not have given thought to that particular corner case, they probably wrote something like
> memcpy(void* ptr1, void* ptr2, int n)
Copy n bytes from ptr1 to ptr2.
UNDEFINED if ptr1 is NULL or ptr2 is NULL
‐------
It might also have come from an "explicit better than implicit" opinion, as in "it is better to have developers explicitly handle cases where the null pointer is involved".
I think it's more a strategy. C was not created to be safe. It's pretty much a tiny wrapper around assembler. Every limitation requires extra cycles, compile time or runtime, both of which were scarce.
Of course, someone needs to check somewhere in the layers of abstraction: the user, programmer, compiler, CPU, architecture... They chose the programmer, who these days likes to call themselves an "engineer".
> C doesn't have any problems adding 4 to NULL nor subtracting NULL from NULL.
"Having problems" is not a fair description of what's at stake here. The C standard simply says that it doesn't guarantee that such operations give the results that you expect.
Also please note that the article and this whole thread is about the address zero, not about the number zero. If NULL is #defined as 0 in your implementation and you use it in an expression only involving integers, of course no UB is triggered.
I feel strongly that they should split undefined behavior into behavior that is not defined, and things that the compiler is allowed to assume. The former basically already exists as "implementation defined behavior". The latter should be written out explicitly in the documentation:
> memcpy(dest, src, count)
> Copies count bytes from src to dest. [...] Note this is not a plain function, but a special form that
applies the constraints dest != NULL and src != NULL to the surrounding scope. Equivalent to:
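To complete the sketch after "Equivalent to:" (my own illustration, using an ASSUME macro built on __builtin_unreachable(); the exact spelling would be up to the standard or the compiler):

    #include <stddef.h>
    #include <string.h>
    #define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
    static inline void *memcpy_documented(void *dest, const void *src, size_t count) {
        ASSUME(dest != NULL);   /* the constraints the documentation would spell out */
        ASSUME(src != NULL);
        return memcpy(dest, src, count);
    }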
The conflation of both concepts breaks the mental model of many programmers, especially ones who learned C/C++ in the 90s, when it was common to write very different code, with all kinds of now-illegal things like type punning and checking this != NULL.
I'd love to have a flag "-fno-surprizing-ub" or "-fhighlevel-assembler" combined with the above `assume` function or some other syntax to let me help the compiler, so that I can write C like in the 90s - close to metal but with less surprizes.
> I'd love to have a flag "-fno-surprizing-ub" or "-fhighlevel-assembler" combined with the above `assume` function or some other syntax to let me help the compiler, so that I can write C like in the 90s - close to metal but with less surprizes.
The problem, which you may realise with some more introspection is that "surprising" is actually a property of you, not of the compiler, so you're asking for mind-reading and that's not one of the options. You want not to experience surprise.
You can of course still get 1990s compilers and you're welcome to them. I cannot promise you won't still feel surprised despite your compiler nostalgia, but I can pretty much guarantee that the 1990s compiler results in slower and buggier software, so that's nice, remember only to charge 1990s rates for the work.
I get that for the library. But I'm a bit puzzled about the optimizations done by a compiler based on this behavior.
E.g., suppose we patch GCC to preserve any conditional containing the string 'NULL' in it. Would that have a measurable performance impact on Linux/Chromium/Firefox?
People will only rely on UB when it is well defined by a particular implementation, either explicitly or because of a long history of past use. E.g. using unions for type punning in gcc, or allowing methods to be called on null pointers in MSVC.
A trivial implementation wouldn't dereference dest or src in case the length is 0. That's how a student would write it with a for loop (byte-by-byte copy). A non-trivial implementation might do something with the pointers before entering the copy loop.
It does nothing, but is only defined when the pointers point into or one past the end of valid objects (live allocations), because that's how the standard defines the C VM, in terms of objects, not a flat byte array.
This is wrong. If you do p=malloc(256), p+256 is valid even though it does not point to a valid address (it might be in an unmapped page; check out ElectricFence). Rust's non-null, aligned but possibly dangling pointers are the same: memcpy can't assume it can dereference them if the size is zero. The standard text in the linked paper says the same.
also UB according to the spec, but LLVM is free to define it. e.g., clang often converts trivial C++ copy constructors to memcpy, which is UB for self-assignment, but I assume that's fine because the C++ front-end only targets LLVM, and LLVM presumably defines the behaviour to do what you'd expect.
Where I work, it is quite normal to link together C code compiled with GCC and Rust code compiled with LLVM, due to how the build system is set up.
As far as I know that disables LTO, but the build system is so complex, and the C code so large, that nobody bothers switching the C side to Clang/LLVM as well.
I have asked this question in the past and was told that memcpy() is allowed to preemptively read before it has determined it needs to write to make it faster on some CPUs. The presumption is that if you are going to be copying data, there is at least one cache line there already, so reading can start early.
Purely mechanically, yes, but in terms of the definition of the behaviour in the C abstract machine, no, because certain operations on null pointers are undefined, even if the obvious low-level compilation turns into nothing.
If you do this, your C code will run significantly slower than, say, Java, Go, or C#, because the compiler is unable to apply even the most basic optimizations (which it can do still in all those other languages).
So, at that point why even use C at all? Today, C is used where the overhead of a managed language is unacceptable. If you could just eat the performance cost, you'd probably already be using a managed language. There's not much desire for a variant of C with what would be at least a 10x slowdown in many workloads.
Or it could be made faster because certain manual optimizations become possible.
An example would be a table of interned strings that you want to match against (say you're writing a parser).
Since standard C says thou shall not compare pointers with < or > unless they both point into the same 'object', you are forbidden from writing the speed-of-light code:
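The forbidden-but-fast check would look something like this (my sketch, with hypothetical names):

    #include <stdbool.h>
    extern char intern_pool[1 << 16];      /* hypothetical arena holding every interned string */
    extern char *intern_pool_end;
    static inline bool is_interned(const char *s) {
        /* UB if s points into a different object: relational comparison of
           pointers to different objects is undefined in ISO C */
        return s >= intern_pool && s < intern_pool_end;
    }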
To elaborate, we treat pointers as more than just integers because it gives optimizers the latitude to reorder and eliminate pointer operations. In the example above we cannot do this, because we cannot prove at compile time that x doesn't live at the address returned by oracle.
However, given how low-level a language C++ is, we can actually break this assumption by setting i to y-x. Since &x[i] is the same as x+i, this means we are actually writing 23 to &y[0].
But that is undefined; you can't do x + (y - x), i.e. pointer arithmetic that ends outside the bounds of an array. Since it is undefined, shouldn't C++ assume that changing x[..] can't change y[0]?
edit: welp, if I had read a few more lines into the article I would have seen that it also says it is undefined
to be clear, in my example the result of oracle() cannot possibly alias with 'x' in C or C++ (and in fact gcc will optimize accordingly). In a different language where addresses are mere integers, things would be more complicated.
The numerical value returned by oracle might physically match the address of the stack slot for 'x', assuming that it exists, but it doesn't mean that, from a language point of view, it is a valid pointer.
If forging pointers had defined behaviour, it would be impossible to use the language sanely or perform any kind of optimization.
That’s the point. C allows this function to be optimized to always return 1. A “pointers are addresses, just emit reads and writes and stop trying to be so clever” version of C would require x to be spilled to the stack, then the write, then reload x and return whatever it contained.
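A sketch of the kind of function being talked about (reconstructed; x is a local whose address never escapes, and oracle() is an opaque external function):

    extern int *oracle(void);
    int f(void) {
        int x = 1;
        *oracle() = 23;   /* provenance rules say this store cannot legally hit x... */
        return x;         /* ...so the compiler may compile this as "return 1" */
    }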
Then use the register keyword or just reword the standard to assume the register behavior if a variable's address hasn't been taken.
The majority of useful optimizations can be kept in a "Sane C" with either code style changes (cache stuff in local vars to avoid aliasing for example) or with minor tweaks to the standard.
Register behavior is what you want essentially all of the time. So we’d just have to write `register` all over the place for no gain.
“Don’t optimize this, read and write it even if you think it’s not necessary” is a very rare case so it shouldn’t be the default. If you want it, use the volatile keyword.
There’s no need to reword the standard to assume the register behavior if the variable’s address hasn’t been taken. That’s already how it works. In this example, if you escape the value of `&x`, it’s not legal to optimize this function to always return 1.
When using C, this can return anything (or crash if the oracle function returns an invalid pointer, or rewrite its own code if the code section is writable). So if you get rid of the "abstract machine", nothing changes - the program can return anything or crash.
The point is that the C standard does guarantee that the function returns 1 if the program is a valid C program - which means there is no UB.
For example: If the oracle function returns an invalid pointer, then dereferencing that pointer is UB, and therefore the program isn't a valid C program.
the literal 1 is not an object in C or C++ hence it does not have an address. If you meant 'x', then also no, oracle() can't return the address of 'x' because of pointer provenance rules.
That would restrict C to memory models with a linear address space. That is usually the case nowadays for C implementations, but maybe we don’t want to set that in stone, because it would be virtually impossible to revert such a guarantee.
There’s also cases like memory address ranges that map to non-memory hardware (i.e. that don’t behave like “dumb” memory), and how would you have the C standard define behavior for those?
Lastly, CPU caches require some sort of abstract model as soon as you have multi-threading.
The value of an abstract machine is that it allows you to specify how a given program behaves without needing to point to a specific piece of hardware. Compilers then have this as a target when compiling a program for a specific piece of hardware so that they know when the compiler's output is correct.
The issue here is that the abstract machine is under- or badly specified.
That’s currently the case in C, in that you can convert pointers to and from uintptr_t. However, not every number representable in that type needs to be valid memory (that’s true on the assembly level as well), hence it’s only defined for valid pointers.
> I think a memory address is a number that CPU considers to be a memory address
I meant to say that, indeed, there must be some concept of CPU for a memory address to have a meaning, and for this concept of CPU to be as widely applicable as possible, surely defining it as abstract as possible is the way to go. Ergo, the idea of a C abstract machine.
Anyway, other people in this thread are discussing the matter more accurately and in more details than I could hope to do, so I'll leave it like that.
Undefined behaviour is undefined behaviour whatever optimisation level you use.
Some -f flags may extend the C standard and remove undefined behaviour in some cases (e.g. strict aliasing, signed integer overflow, writable string constants, etc.)
20 years ago, making a C compiler that provided sane behaviour and better guarantees (going beyond the minimum defined in the standard) to make code safer and programmers' lives easier, even at the cost of some performance, might have been a good idea. Today any programmer who thinks things like not having security bugs are more important than having bigger numbers on microbenchmarks has already moved on from C.
This is certainly not true. Many programmers also learned to use the tools available to write reasonably safe code in C. I do not personally find this problematic.
> Many programmers also learned to the use tools available to write reasonably safe code in C.
And then someone compiled their code with a new compiler and got a security bug. This happens consistently. Every C programmer thinks their code is reasonably safe until someone finds a security bug in it. Many still think so afterwards.
There are a couple of cases where compiler optimizations caused security issues, but that this happens all the time is a huge exaggeration. And many of the practically relevant cases can be avoided by using tools such as UBSan. The actual practical issue in C is people getting their pointer arithmetic wrong, which can also be avoided by having safe abstractions for buffer and string handling.
The other fallacy is that these issues would then suddenly disappear when using Rust, which is also not the case, because the programmer cutting corners in C or prioritizing performance over safety will also use Rust "unsafe" carelessly.
Rust has a clear advantage for temporal memory safety. But it is also possible to have a clear strategy about what data structure owns what other object in C.
> And many of the practically relevant cases can be avoided by using tools such as UBSan.
"can be", but aren't.
> The other fallacy is that these issues would then suddenly disappear when using Rust, which is also not the case, because the programmer cutting corners in C or prioritizing performance over safety will also use Rust "unsafe" carelessly.
The vast majority of these programmers aren't making a deliberate choice at all though. They pick C because they heard it's fast, they write it in the way that the language nudges them towards, or the way that they see done in libraries and examples, and they end up with unsafe code. Sure, someone can deliberately choose unsafe in Rust, but defaults matter.
> it is also possible to have a clear strategy about what data structure owns what other object in C.
Is it though? How can one distinguish a codebase that does from a codebase that doesn't? Other than the expensive static analysis tool mentioned elsewhere in the thread (at which point you're not really writing "C"), I've never seen a way that worked and was distinguishable from the ways that don't work.
> > And many of the practically relevant cases can be avoided by using tools such as UBSan.
> "can be", but aren't.
It is a possible option when one needs improved safety, and IMHO often a better option than using Rust.
> > The other fallacy is that these issues would then suddenly disappear when using Rust, which is also not the case, because the programmer cutting corners in C or prioritizing performance over safety will also use Rust "unsafe" carelessly.
> The vast majority of these programmers aren't making a deliberate choice at all though. They pick C because they heard it's fast, they write it in the way that the language nudges them towards, or the way that they see done in libraries and examples, and they end up with unsafe code. Sure, someone can deliberately choose unsafe in Rust, but defaults matter.
The choice of handcoding some low-level string manipulation is similar to the choice of using unsafe Rust. One can do it or not. There is certainly a better security culture in Rust at this time, but it is unclear to what extent this will be true in the long run. Also, C security culture is improving too, and Rust culture will certainly deteriorate when usage spreads from highly motivated early adopters to the masses.
> > it is also possible to have a clear strategy about what data structure owns what other object in C.
> Is it though? How can one distinguish a codebase that does from a codebase that doesn't?
This leads to the argument that it is trivial to see unsafe code in Rust because it is marked "unsafe" and is just a small amount of code, while in C you would need to look at everything. But this is largely a theoretical argument: in practice you need to do some quality control for all code anyway, because memory safety is just a small piece of the overall puzzle. (And even for memory safety, you also need to look at the code surrounding the unsafe code in Rust.) In practice, it is not hard to recognize the C code which is dangerous: it is the code where pointer arithmetic and string manipulation are not encapsulated in safe interfaces, and the code where ownership of pointers is not clear.
> Other than the expensive static analysis tool mentioned elsewhere in the thread (at which point you're not really writing "C"), I've never seen a way that worked and was distinguishable from the ways that don't work.
I see some very high quality C code with barely any memory safety problems. Expensive static analysis can be used when no mistakes are acceptable, but then you should also formally verify the unsafe code in Rust.
> The choice of handcoding some low-level string manipulation is similar to the choice of using unsafe rust. One can do it or not.
But most of the time programmers don't make a conscious choice at all. So opt-out unsafety versus opt-in unsafety is a huge difference.
> In practice you need to do some quality control for all code anyway, because memory safety is just a small piece of overall the puzzle.
Memory safety is literally more than half of real-world security issues.
> In practice, it is not hard to recognize the C code which is dangerous
> I see some very high quality C code with barely any memory safety problems
I hear a lot of C people saying this sort of thing, but they never make it concrete - there's no list of which popular open-source libraries are dangerous and which are not, it's only after a vulnerability is discovered that we hear "oh, that project always had poor quality code". If I pick a random library to maybe use in my project (even big-name ones e.g. libpq or libtiff), no-one can ever actually answer whether that's high quality C code or low quality C code, or give me a simple algorithm that I can actually apply without having to read a load of code and make a subjective judgement. Whereas I don't have to read or judge anything or even properly know rust to do "how much of this rust code is unsafe".
But I think even this is likely overstating it by looking at CVEs and not real world impact.
> > > In practice, it is not hard to recognize the C code which is dangerous
> > I see some very high quality C code with barely any memory safety problems
> I hear a lot of C people saying this sort of thing, but they never make it concrete - there's no list of which popular open-source libraries are dangerous and which are not, it's only after a vulnerability is discovered that we hear "oh, that project always had poor quality code". If I pick a random library to maybe use in my project (even big-name ones e.g. libpq or libtiff), no-one can ever actually answer whether that's high quality C code or low quality C code, or give me a simple algorithm that I can actually apply without having to read a load of code and make a subjective judgement. Whereas I don't have to read or judge anything or even properly know rust to do "how much of this rust code is unsafe".
So you look at all the 300 unmaintained dependencies a typical Rust project pulls in via cargo and look at all the "unsafe" blocks to screen it? Seriously, the issue is lack of open-source manpower, and this will hit Rust very hard once the ecosystem gets larger and goes even more beyond the highly motivated early adopters. I would be more tempted to buy this argument if Rust had no "unsafe" and I could pull in arbitrary code from anywhere and be safe. And this idea existed before with managed languages... safe Java in the browser and so on. That also sounded plausible, but was similarly highly exaggerated, just like the Rust story.
> A programmer being careless will be careless with Rust "unsafe" too.
Programmers will be careless, sure, but you can't really use unsafe without going out of your way to. Like, no-one is going to write "unsafe { *arr.get_unchecked(index) }" instead of "arr[index]" when they're not thinking about it.
> So you look at all the 300 unmaintained dependencies a typical Rust projects pulls in via cargo and look at all the "unsafe" blocks to screen it?
No, of course not, I run "cargo geiger" and let the computer do it.
I think unmaintained dependencies are less likely, and easier to check, in the Rust world. Ultimately what defines the attack surface is the number of lines of code, not how they're packaged, and C's approach tends to lead to linking in giant do-everything frameworks (e.g. people will link to GLib or APR when they just wanted some string manipulation functions or a hash table, which means you then have to audit the whole framework to audit that program's dependencies. And while the framework might look well-maintained, that doesn't mean that the part your program is using is), reimplementing or copy-pasting common functions because they're not worth adding a dependency for (which is higher risk, and means that well-known bugs can keep reappearing, because there's no central place to fix it once and for all), or both. And C's limited dependency management means that people often resort to vendoring, so even if your dependency is being maintained, those bugfixes may not be making their way into your program.
> And this idea existed before with managed languages... Safe Java in the browser and so on. It also sounded plausible, but was just as exaggerated as the Rust story is.
Java has quietly worked. It didn't succeed in the browser or on the open-source or consumer-facing desktop for reasons that had nothing to do with safety (in some cases they had to do with the perception of safety), but backend processing or corporate internal apps are a lot safer than they used to be, without really having to change much.
You're like a Japanese holdout in the 60s refusing to leave his bunker long after the war is over.
C lost. Memory safety is a huge boon for security. Human beings, even the best of them, cannot consistently write correct C code. (Look at OpenBSD.) You can keep fighting the war your side has already lost or you can move on.
I think the first one, stack overflow, is technically not a memory safety issue, just denial-of-service on resource exhaustion. Stack overflow is well defined as far as I know.
The other three are definitely memory safety issues.
I would consider a stack overflow to be a memory safety issue. The C++ language authors likely would too. C++ famously refused to support variable-length stack-allocated arrays because of memory safety concerns. Specifically, they were worried that code at runtime would make an array so big that it would jump the OS guard page, allowing access to unallocated memory, which of course is not noticed ahead of time during development. This is easy to do unintentionally if you have more stack variables after an enormous stack-allocated array and touch them before you touch the array (see the sketch below). The alternative is to force developers to use compiler extensions such as alloca(). That makes it easy to pass pointers outside of the stack frame where they are valid, which is a definite safety issue. The C++ nitpicking over variable-length arrays is silly, since it gives us a status quo where C++ developers use alloca() anyway, but it shows that stack overflows are considered a memory safety issue.
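For concreteness, a minimal hypothetical sketch of that guard-page concern (my own illustration; whether anything actually faults depends on the platform, the frame layout, and whether stack probing is enabled):

#include <stddef.h>
#include <string.h>

void handle(size_t n, const char *src) {
    char big[n];    /* if n is, say, 8 MiB, the stack pointer jumps far past
                       the single OS guard page in one step */
    big[0] = 0;     /* this write can land in whatever mapping happens to sit
                       below the guard page: silent corruption instead of a
                       clean SIGSEGV */
    memcpy(big, src, n);
}

Modern compilers can mitigate this by probing the stack page by page as the frame grows (GCC and Clang expose this as -fstack-clash-protection), so the guard page can't be skipped.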
In the general case, I think you might be right, although it's a bit mitigated by the fact that Rust does not have support for variable length arrays, alloca, or anything that uses them, in the standard library. As you said though, it's certainly possible.
I was more referring to that specific linked advisory, which is unlikely to use either VLAs or alloca. In that case, where stack overflow would be caused by recursion, a guard frame will always be enough to catch it, and will result in a safe abort [0].
This may be true, but the minimum amount of unsafe code still seems not that small. Maybe I just had bad luck, but one of the first things I looked at more closely was an implementation of a matrix transpose in Rust (as an example of something relevant to my problem domain), and it used unsafe Rust directly to be reasonably fast, and it already had a CVE. This was a revealing experience, because it was just the same type of bug you might have had in similar C code, but in a language where countless people insist that this "can not happen".
I agree that one shouldn't have been included. My favorite ones aren't included here anyway, e.g. how a Rust programmer managed to create a safety issue in a matrix transpose, or how they messed up str::repeat in their standard library.
And don't get me wrong, I think Rust is a safer language than C. Just the idea that C is completely unsafe and that it is impossible even for experts to write reasonably safe code, while it is completely impossible to create an issue in Rust, is nonsense. In reality, it is possible to screw up in both languages and people do, and safety in Rust is only somewhat better than C with good security practices. But this is not how it is presented. I also think the difference will become even smaller as C safety continues to improve, as it has in the last years due to better tooling, while Rust is picked up by average programmers under time pressure who will use "unsafe" just as carelessly as they hand-roll pointer arithmetic in C today.
Note that the key word here is sound. The more common static analyzers are unsound tools that will miss cases. Sound tools do not, but few people know of them, they are rare and they are typically proprietary and expensive.
Sure. I'm also a big fan of what Microsoft has done with SAL. And of course you have formally proven C, as used in seL4. I'd say that the contortions you have to go through to write code with these systems takes you out of the domain of "C" and into a domain of a different, safer language merely resembling C. Such a language might be a fine tool! But it's not arbitrary C.
Note that my original comment above was "reasonably safe" and not "perfectly memory safe". You can formally prove something with a lot of effort, but you can also come reasonably close for practical purposes with a lot less effort and more commonly available tools.
You are right that "arbitrary C" is not safe while safe Rust is safe, but this is mostly begging the question. The question is what you can do with the language given your requirements. If you need safe C, this is doable with enough effort; if you need reasonably safe C, this is practical in most projects; and all this should be compared to Rust as used in a similar situation, which may very well include use of unsafe Rust or of C libraries, which also limits the safety.
It is C. It is just written with the aid of formal methods. It would be nice if all software were written that way. That said, if you want another language “resembling C”, there is always WUFFS:
> It is C. It is just written with the aid of formal methods.
It is not C in the sense that many of the usual reasons to use C no longer apply. E.g. a common reason to use C is the availability of libraries, but most popular libraries will not pass that analyser so you can't use them if you're depending on that analyser. E.g. a common reason to use C is standard tooling for e.g. automated refactoring, but will those standard tools preserve analyser-passing? Probably not.
As I understand it, that doesn't imply it's not undefined to pass NULL pointers. While it's not what most users expect or want, it's possible that this is just a wrapper around a memcpy() which is only correct to call with valid destination and source pointers, even if the length is zero.
No, because ISO never said it must behave this way.
Yes, because every libc I've personally encountered acts this way. At a glance, glibc's x86 implementation[1, 2], musl, and picolibc all handle 0-length memcpy as you'd expect.
I'm sure other folks could dig up the code for Newlib, uclibc, and others, and they'd see the same thing.
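For intuition, a deliberately naive sketch (not any particular libc's actual code) shows why the zero-length case falls out naturally:

#include <stddef.h>

void *my_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {        /* with n == 0 the body never runs, so the pointers
                            are never dereferenced */
        *d++ = *s++;
    }
    return dst;
}

Real implementations are far more elaborate (vectorized, alignment-aware), but the ones mentioned above still dispatch on the length before touching memory.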
On a related note, ISO C has THREE different things that most people tend to lump together as "undefined behavior." They are:
Implementation-defined behavior: ISO doesn't require any particular behavior, but they do require implementations to consistently apply a particular behavior, and document that behavior.
Unspecified behavior: ISO allows two or more possible behaviors and imposes no further requirement on which one is chosen; the choice doesn't have to be documented, and doesn't even have to be made consistently (argument evaluation order is the classic example).
Undefined behavior: ISO doesn't require any particular behavior, and they don't require implementations to define any particular behavior either.
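A few stock illustrations (my own examples; the classifications follow the ISO C standard):

#include <limits.h>

int f(void);
int g(void);

int examples(int a) {
    int impl_def    = -1 >> 1;   /* implementation-defined: right shift of a
                                    negative value */
    int unspecified = f() + g(); /* unspecified: whether f() or g() is
                                    evaluated first */
    int undefined   = a + 1;     /* undefined behavior if a == INT_MAX
                                    (signed overflow) */
    return impl_def + unspecified + undefined;
}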
> However, the most vocal opposition came from a static analysis perspective: Making null pointers well-defined for zero length means that static analyzers can no longer unconditionally report NULL being passed to functions like memcpy—they also need to take the length into account now.
How does this make any sense? We don't want to remove a low hanging footgun because static analyzers can no longer detect it?
No, it means the static analyzers can't report on a different error because a subset of that class of errors is no longer an error, and the static analysis can't usually distinguish between that subset and the rest.
memcpy(NULL, NULL, 0); // Formerly bad, now ok.
memcpy(NULL, NULL, s); // Formerly bad, now unknown (unless it can be proven that s != 0).
and
memcpy(NULL, b, c); // Same issue.
(where "NULL" == "statically known to be NULL", not necessarily just a literal NULL. Not that that changes the difficulty here.)
Previously: warn if either address might be NULL.
Now: warn if either address might be NULL and the length might be nonzero, and prepare for your users to be annoyed and shut this warning off due to the false alarms.
Any useful static analysis tool does a careful balance between false positives and false negatives (aka false alarms and missed bugs). Too many false positives, and that warning will be disabled, or users will get used to ignoring it, or it will be routinely annotated away at call sites without anyone bothering to figure out whether it's valid or not. Soon the tool will cease to be useful and may be entirely abandoned. In actual practice, the sophistication of a static analysis tool is far less relevant than its precision. It's quite common to have an incredibly powerful static analysis tool that is used for only a small handful of blazingly obvious warnings, sometimes ones that the compiler already has implemented! (All the tool's fancy warnings got disabled one by one and nobody noticed.)
Yes, but that tradeoff exists for most things those tools do. If you can easily and perfectly detect an error, it should just go into the compiler (and perhaps language spec).
> If you can easily and perfectly detect an error, it should just go into the compiler (and perhaps language spec).
Nobody seems to care much about removing UB even when it's super easy. For example, a bunch of basic syntax errors like forgetting the closing quote on a string or not having a newline at the end of the file are UB.
Isn't it more sensible to just check that the params that are about to be sent to memcpy are reasonable?
That is why I tend to wrap my system calls with my own internal function (which can be inlined in certain PLs), where I can standardize such tests. Otherwise, the resulting code that performs the checks and does the requisite error handling is bloated.
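A minimal sketch of such a wrapper might look like this (hypothetical and simplified, not anyone's actual code):

#include <stddef.h>
#include <string.h>

/* Returns 0 on success, -1 if the arguments would be invalid to pass on. */
static inline int checked_memcpy(void *dst, const void *src, size_t n) {
    if (n == 0) {
        return 0;                 /* nothing to copy; tolerate NULL here */
    }
    if (dst == NULL || src == NULL) {
        return -1;                /* let the caller decide how to react */
    }
    memcpy(dst, src, n);
    return 0;
}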
Note that I am also loath to #define such code, because C is already rife with them and my perspective is that the fewer of them the better.
At the end of the day, quick and dirty fixes will prove the adage "short cuts make long delays", and OpenBSD's approach is the only really viable long-term solution, where you just have to rewrite your code if it has ill-advised constructs.
For designing libraries such as C's stdlib, I don't believe in 'undefined behavior': clearly define your semantics and say, "If you pass a NULL to memcpy, this is what will happen." Same for what happens when n == 0, or when src == dst.
And if, for some strange reason, fixing the semantics breaks calling code, then I can't imagine that their code wasn't f_cked in the first place.
As the article points out, all major memcpy implementations already do this check inside memcpy. Sure, the caller can also check, but given that it's both redundant in practice and makes some common patterns harder to use than they would otherwise be, there's no reason to not just standardize what's already happening anyway and make everyone's lives easier in the process.
Why? It's 2024. Make it not be? Sure, some older stuff already written might no longer compile and need to be updated. Put it behind a "newer" standard flag/version or whatever.
Or is it that it can't be caught at compile time and only run time... hmm...
Generally, undefined behavior removes the need for systematically checking for special cases, the most common being out of bounds access.
But it can go further than that. Dereferencing a NULL pointer is undefined behavior, so if a pointer is dereferenced, it can be assumed by the compiler not to be NULL and the code can be optimized. For example:
void foo(int *p) {
    (void)*p;    /* dereference p before the NULL check */
    if (p == NULL) {
        printf("val is NULL\n");
    } else {
        printf("val is %d\n", *p);
    }
}
can be optimized to:
void foo(int *p) {
    (void)*p;
    printf("val is %d\n", *p);
}
Note that static analyzers will most likely issue a warning here as such a trivial case is most likely a mistake. But the check for NULL may be part of an inline function that is used in many places, and thanks to the undefined behavior, the code that handles the NULL case will only be generated when relevant. The problem, of course, is that it assumes that the programmer knows what he is doing and doesn't make mistakes.
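For example (a hedged sketch with made-up names), a defensive check inside an inline helper can vanish after inlining into a caller that already dereferenced the pointer:

#include <stdio.h>

static inline void print_val(const int *p) {
    if (p == NULL) {              /* generic helper keeps a defensive check */
        puts("val is NULL");
    } else {
        printf("val is %d\n", *p);
    }
}

void use(int *p) {
    int first = *p;               /* p already dereferenced here... */
    printf("first is %d\n", first);
    print_val(p);                 /* ...so after inlining, the compiler may
                                     drop the p == NULL branch entirely */
}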
In the case of memcpy(NULL, NULL, 0), there probably isn't much to gain from making it undefined. It most likely doesn't help the memcpy implementation (len=0 is generally a no-op), and inference based on the fact that the arguments can't be NULL is more likely to screw the programmer up than to improve performance.
It all adds up. All those instructions you don't have to execute, especially memory access and cache misses from jumps, pipeline stalls from conditionals, not just from this optimization.
>All of these 'problems' have simple and straightforward workarounds, I'm not convinced these UB are needed at all.
He gave you a simple and straightforward example, but that example may not be representative of a real world program where complex analysis leads to better performing code.
As a programmer, it's far easier to just insert bounds checks everywhere and trust the system to remove them when possible. This is what Rust does, and it is safe. The problem isn't the compiler, the problem is the standard. More broadly, the standard wasn't written with optimizing compilers in mind.
If we're inlining the call, then we can hoist the NULL check out of the loop. Now it's 1 check per 20 million operations. There's no need to eliminate it or have UB at that point.
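A rough sketch of that in C (illustrative only): whether the programmer writes it or the compiler hoists it, the check runs once per call instead of once per element:

#include <stddef.h>

long sum(const int *data, size_t n) {
    long total = 0;
    if (data == NULL) {          /* one check, outside the hot loop */
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        total += data[i];        /* no per-iteration NULL check needed */
    }
    return total;
}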
The simplest example of a compiler optimization enabled by UB would be the following:
int my_function() {
    int x = 1;
    another_function();
    return x;
}
The compiler can optimize that to:
int my_function() {
    another_function();
    return 1;
}
Because it's UB for another_function() to use an out-of-bounds pointer to access the stack of my_function() and modify the value of x.
And the most important example of a compiler optimization enabled by UB is related to that: being UB to access local variables through out-of-bounds pointers allows the compiler to place them in registers, instead of being forced to go through the stack for every operation.
I don't find those to be compelling reasons; to the contrary, I think that kind of semantic circumvention is a symptom of a poorly developed industry.
How can we have properly functioning programs without clearly-defined, and sensible, semantics?
If the developer needs to use registers, then they should choose a dev env/PL that provides them, otherwise such kludges will crash and burn, IMO.
Are you saying that C compilers should change every local variable access to read and write to the stack just in case some function intentionally does weird pointer arithmetic to change their values without referring to them in the source code?
We stopped explicitly declaring locals with the 'register' keyword circa 40 years ago. Register allocation is a low hanging fruit and one of those things that is definitely best left to a compiler for most code.
And now they have to manage register pressure for it to keep being faster. And false dependencies. And some more. It doesn’t work like that. Developers can’t optimize like compilers do, not with modern CPUs. The compilers do the very heavy lifting in exchange for the complexity of a set of constraints they (and you as a consequence, must) rely on. The more relaxed these constraints are, the less performant code you get. Modern CPUs run modern interpreters as fast as dumbest-compiled C code basically, so if you want sensible semantics, then Typescript is one of the absolutely non-ironic answers.
What you describe there is UB. If you define this in the standard, you are defining a kind of runtime behavior that can never happen in a well formed program and the compiler does not have to make a program that encounters this behavior do anything in particular.
Does this still matter today? I mean, first registers are anyway saved on the stack when calling a function, and caches of modern processors are really nearly as fast (if not as fast!) as a register. Registers these days are merely labels, since internally the processor (at least for x86) executes the code in a sort of VM.
To me it seems that all these optimizations were really something useful back in the day, but nowadays we can as well just ignore them and let the processor figure it out without that much loss of performance.
Assuming that the program is "bug free" to me is a terrible idea, since even mitigations that the programmer puts in place to mitigate the effect of bugs (and no program is bug free) are skipped because the compiler can assume the program has no bug. To me security is more important than a 1% more boost in performance.
Register allocation is one of the most basic optimizations that a compiler can do. Some modern cpus can alias stack memory with internal registers, but it is still not as fast as not spilling at all.
You can enjoy -O0 today and the compiler will happily allocate stack slots for all your variables and keep them up to date (which is useful for debugging). But the difference between -O0 and -O3 is orders of magnitude on many programs.
> I mean, first registers are anyway saved on the stack when calling a function
No, they aren't. For registers defined in the calling convention as "callee-saved", they don't have to be saved on the stack before calling a function (and the called function only has to save them if it actually uses that register). And for registers defined as "caller-saved", they only have to be saved if their value needs to be kept. The compiler knows all that, and tends to use caller-saved registers as scratch space (which doesn't have to be preserved), and callee-saved registers for longer-lived values.
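A small illustration, assuming the x86-64 System V calling convention (rbx, rbp, r12-r15 callee-saved; rax, rcx, rdx, rsi, rdi, r8-r11 caller-saved); what a particular compiler actually emits will vary:

long helper(long x);

long caller(long a, long b) {
    long live = a * 7;    /* needed after the call: typically kept in a
                             callee-saved register (or spilled) across
                             helper() */
    long dead = b + 1;    /* only needed as the call argument: fine to leave
                             in a caller-saved scratch register */
    return helper(dead) + live;
}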
> and caches of modern processors are really nearly as fast (if not as fast!) as a register.
No, they aren't. For instance, a quick web search tells me that the L1D cache for a modern AMD CPU has at least 4 cycles of latency. Which means: even if the value you want to read is already in the L1 cache, the processor has to wait 4 cycles before it has that value.
> Registers these days are merely labels, since internally the processor (at least for x86) executes the code in a sort of VM.
No, they aren't. The register file still exists, even though register renaming means which physical register corresponds to a logical register can change. And there's no VM, most common instructions are decoded directly (without going through microcode) into a single µOp or pair of µOps which is executed directly.
> To me it seems that all these optimizations were really something useful back in the day, but nowadays we can as well just ignore them and let the processor figure it out without that much loss of performance.
It's the opposite: these optimizations are more important nowadays, since memory speeds have not kept up with processor speeds, and power consumption became more relevant.
> To me security is more important than a 1% more boost in performance.
Newer programming languages agree with you, and do things like checking array bounds on every access; they rely on compiler optimizations so that the loss of performance is only that "1%".
Many calling conventions use registers. And no, loads and stores are extremely complex and not free at all: fewer of them can issue each cycle, and a lot of expensive hardware goes into maintaining their ordering during execution.
In a real world program, removing all UB is in some cases impossible without adding new breaking features to the C language. But taking a real world program and removing all UB which IS possible to remove will introduce an overhead. In some programs this overhead is irrelevant. In others, it is probably the reason why C was picked.
If you want speed without overhead, you need to have more statically checked guarantees. This is what languages such as Rust attempt to achieve (quite successfully).
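To make the overhead concrete, here is a hedged sketch (my example) of the kind of check that defining away out-of-bounds UB implies on every access the compiler can't prove safe:

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

int checked_get(const int *arr, size_t len, size_t i) {
    if (i >= len) {              /* the overhead: a branch on every access */
        fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, len);
        abort();                 /* defined failure instead of UB */
    }
    return arr[i];
}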
What Rust attempts to achieve is to rule out accidentally introducing UB, by designing the language in a way that makes it impossible to have UB when sticking to the safe subset.
It is also possible to ensure that C programs have no UB, and this does not require any breaking changes to C. It usually requires some refactoring of the program.
A bold claim, I've written a whole lot of software in C, and most of it I'd be astonished if it truly has no UB. Even some of the relatively small, carefully written programs probably have edge case UB I never worried about when writing them.
It is certainly true that many C programs have edge cases which trigger UB. I also have written many such programs where I did not care. This does not contradict my statement though. There are programmers who meticulously care (and/or have to care) about getting the edge cases right and this is entirely possible.
I think I worded it poorly. In a real world program, a lot of optimizations rely on assumptions of not triggering UB.
Rephrased:
In a real world program removing all opportunities for UB is in some cases impossible without adding new breaking features to the C language.
This has nothing to do with whether you can or can't write a program without invoking UB. I am talking about a hypothetical large program which does not exhibit undefined behaviour but where if you modified it then you could trigger UB in many ways. The idea I am positing is that to make it such that you could not modify such a program in any way which could trigger UB, would be impossible without adding new breaking features to the C language (e.g. you would need to figure out some way of preventing pointers from being used outside of the lifetime of the object they point to).
But this does not need breaking features, it only needs 1) an opt-in safe mode, and 2) annotations to express additional invariants such as lifetimes. This would not break anything.
It doesn't break existing code, unless you want to statically guarantee that it does not trigger UB, in which case it does. The point is that if you need an opt-in safe mode or annotations to express additional invariants then you can't magically make existing code safe.
A lot of existing code is already safe. You can't prove all (or even most) existing code safe automatically. This is also true for Rust if you do not narrowly define safe as memory safe. You could transform a lot of C code to be memory safe by adding annotations and do some light refactoring and maybe pushing some residual pieces to "unsafe" blocks. This would be very similar to Rust.
Again, I am not trying to argue either way. The point I was making was about how you can't define away all UB in the C standard without needing to modify the language in a breaking way.
> You can't prove all (or even most) existing code safe automatically.
No but rust provides a proper type system which goes a long way to being able to prove and enforce a lot more about program behavior at compile time.
> You could transform a lot of C code to be memory safe by adding annotations and do some light refactoring and maybe pushing some residual pieces to "unsafe" blocks. This would be very similar to Rust.
It would only be somewhat similar to super basic entry level rust which ignores all the opportunities for type checking.
> Again, I am not trying to argue either way. The point I was making was about how you can't define away all UB in the C standard without needing to modify the language in a breaking way.
This depends on how you define "breaking". I think one can add annotations and transform a lot of C code to memory-safe C with slight refactoring, without introducing changes into the language that would break any existing code. You can not simply switch on a flag and make existing code safe ... except you can do this too ... it just then comes with a high run-time cost for checking.
> > > No but rust provides a proper type system which goes a long way to being able to prove and enforce a lot more about program behavior at compile time.
> > You could transform a lot of C code to be memory safe by adding annotations and do some light refactoring and maybe pushing some residual pieces to "unsafe" blocks. This would be very similar to Rust.
> It would only be somewhat similar to super basic entry level rust which ignores all the opportunities for type checking.
I do not believe you can solve a lot more issues with strong typing than you can already solve in C simply by building good abstractions.
> You can not simply switch on a flag and make existing code safe ... except you can do this too ... it just then comes with a high run-time cost for checking.
I don't think you can reasonably implement this even at a high runtime cost without breaking programs. Either way, you've managed to re-state the crux of my argument.
> I do not believe you can solve a lot more issues with strong typing than you can already solve in C simply by building good abstractions.
Then I don't think you have much familiarity with strong typing, or you are underestimating the performance impact of equivalently "safe" (in a broader sense than what rust uses the term for) abstractions in C.
The only way to get equivalent performance while maintaining the same level of guarantees in C is to generate C code, at which point you're definitely better off using another programming language.
https://godbolt.org/z/aPcr1bfPe