Commenters here as well as the author seem content with asserting that under C semantics, the code after the second optimization exhibits UB when it dereferences (p+1). Is it really UB?
Here's the snippet for reference:
char p[1], q[1] = {0};
uintptr_t ip = (uintptr_t)(p+1);
uintptr_t iq = (uintptr_t)q;
if (iq == ip) {
  *(p+1) = 10; // <- This line changed
  print(q[0]);
}
From the article: "LLVM IR (just like C) does not permit memory accesses through one-past-the-end pointers." I think this is a mental shortcut we make that can lead us astray here. Of course, one-past-the-end pointers can't usually be dereferenced legally. But I think it's legal here because when (iq == ip), (p+1) points to a valid location. I don't see (at least in the C99 standard) why this code would be illegal.
I realize that the code is not meant to represent C but rather LLVM IR, but surely, if this snippet is legal C code, then LLVM can't treat it as if it were UB? So that would mean the third optimization is incorrect here.
> if this snippet is legal C code, then LLVM can't treat it as if it were UB?
It can in principle for the purpose of this example, since this is not the C code that the programmer originally wrote. This just means that if the snippet is legal C code, LLVM needs to do something extra during its translation to make this legal LLVM IR.
But I am also happy to consider a different example, where this is the original C program. Is this legal C code? To my knowledge, basically everyone agrees that the answer is "no, this is UB". That includes both compilers (that will happily "miscompile" this code) and all formal semantics for C that I have seen so far. So I think you are in the minority with your interpretation of the standard. It's a shame that the standard is not precise enough to properly settle this question...
If the standard were amended to explicitly allow this code in C, I think C compiler devs would revolt, as all major C compilers have some pretty crucial analyses that fundamentally rely on this not being legal C code.
I see, thanks for the explanation. Just read your previous blog post that goes into more detail about this. [1] I don't know much about compilers, but I guess Defect Report #260 supports the interpretation that dereferencing q isn't the same as dereferencing (p+1), even inside an if block where p+1 == q: "[Implementations] may also treat pointers based on different origins as distinct even though they are bitwise identical." [2]
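To make the DR #260 point concrete for myself, here is a minimal C sketch. It is my own illustration, not code from the article, and the helper names (same_address, store_within_one_object, one_past_end_matches) are hypothetical. It contrasts a store through p+1 where both pointers come from the same array (same address and same origin) with the two-array case from the snippet, where I only compare integer addresses and never write through p+1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compares two pointers by integer address only.
   It says nothing about which object each pointer was derived from. */
bool same_address(const char *a, const char *b) {
    return (uintptr_t)a == (uintptr_t)b;
}

/* Same-origin case: p + 1 and q are both derived from buf, so they share
   an address AND an origin. The store through p + 1 is well defined. */
int store_within_one_object(void) {
    char buf[2] = {0, 0};
    char *p = buf;
    char *q = &buf[1];
    assert(same_address(p + 1, q));
    *(p + 1) = 10;   /* in bounds of buf */
    return q[0];     /* reads the byte just written */
}

/* Different-origin case, as in the snippet: p + 1 is one past the end of p,
   and q is a separate object. Forming p + 1 and converting it to uintptr_t
   is allowed; this function deliberately never dereferences p + 1.
   Whether the addresses coincide depends on how the compiler lays out
   the two arrays, so the result is not predictable. */
bool one_past_end_matches(void) {
    char p[1] = {0}, q[1] = {0};
    return same_address(p + 1, q);
}
```

As I understand it, the first function is fine under any reading of the standard, while the second one is where the disagreement starts: even if it returns true, the compilers and formal semantics mentioned above would still treat a write through p+1 as UB, because p+1 was derived from p and not from q.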