
Commenters here as well as the author seem content with asserting that under C semantics, the code after the second optimization exhibits UB when it dereferences (p+1). Is it really UB?

Here's the snippet for reference:

    char p[1], q[1] = {0};
    uintptr_t ip = (uintptr_t)(p+1);
    uintptr_t iq = (uintptr_t)q;
    if (iq == ip) {
      *(p+1) = 10; // <- This line changed
      print(q[0]);
    }

From the article: "LLVM IR (just like C) does not permit memory accesses through one-past-the-end pointers." I think this is a mental shortcut we make that can lead us astray here. Of course, one-past-the-end pointers can't usually be dereferenced legally. But I think it's legal here because when (iq == ip), (p+1) points to a valid location. I don't see (at least in the C99 standard) why this code would be illegal.

I realize that the code is not meant to represent C but rather LLVM IR, but surely, if this snippet is legal C code, then LLVM can't treat it as if it were UB? So that would mean the third optimization is incorrect here.



> if this snippet is legal C code, then LLVM can't treat it as if it were UB?

It can in principle for the purpose of this example, since this is not the C code that the programmer originally wrote. This just means that if the snippet is legal C code, LLVM needs to do something extra during its translation to make this legal LLVM IR.

But I am also happy to consider a different example, where this is the original C program. Is this legal C code? To my knowledge, basically everyone agrees that the answer is "no, this is UB". That includes both compilers (that will happily "miscompile" this code) and all formal semantics for C that I have seen so far. So I think you are in the minority with your interpretation of the standard. It's a shame that the standard is not precise enough to properly settle this question...

If the standard were amended to explicitly allow this code in C, I think C compiler devs would revolt, as all major C compilers have some pretty crucial analyses that fundamentally rely on this not being legal C code.


I see, thanks for the explanation. Just read your previous blog post that goes into more detail about this. [1] I don't know much about compilers, but I guess Defect Report #260 supports the interpretation that dereferencing q isn't the same as dereferencing (p+1), even inside an if block where p+1 == q: "[Implementations] may also treat pointers based on different origins as distinct even though they are bitwise identical." [2]

[1] https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html

[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm
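
To make the distinction concrete, here is a minimal C sketch of how I read DR #260. It is not taken from the article. The names `write_via_q_if_adjacent` and `struct result` are made up for illustration, and unlike the snippet above, p is zero-initialized. The integer comparison is the same, but the store goes through q (a pointer whose origin is the q object) rather than through (p+1), so it stays well-defined whether or not the two arrays happen to be adjacent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: records whether the addresses compared equal
 * and what q[0] held afterwards. */
struct result {
    int adjacent;
    char q0;
};

/* Hypothetical helper: same address comparison as in the thread's snippet,
 * but the write goes through q instead of through the one-past-the-end
 * pointer p + 1. */
struct result write_via_q_if_adjacent(void) {
    char p[1] = {0}, q[1] = {0};

    /* Forming p + 1 and converting it to an integer is allowed;
     * the dispute is only about dereferencing it. */
    uintptr_t ip = (uintptr_t)(p + 1);
    uintptr_t iq = (uintptr_t)q;

    struct result r = {0, 0};
    if (iq == ip) {
        q[0] = 10;      /* store through a pointer derived from q itself */
        r.adjacent = 1;
    }
    r.q0 = q[0];

    /* Holds on either branch: the store and the flag are set together. */
    assert(r.q0 == (r.adjacent ? 10 : 0));
    return r;
}
```

Whether `adjacent` ends up 0 or 1 depends on how the compiler lays out the stack, so I would not rely on either outcome. The point is only that q[0] is 10 exactly when the branch was taken, which is the guarantee the (p+1) version loses if the compiler may treat the two pointers as distinct.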



