
There are a few parts of this I still struggle to understand. I don’t get why ptr2int casts are a problem if you never try to cast the integer back to a pointer. It seems like int2ptr is the real issue.

Also it’s said that casts to int are better than transmutes because casts have side effects. But ptr2int casts don’t actually have side effects; they are a no-op on the hardware.



They are a problem in the sense that the address of the pointer gets exposed. Once you lose track of who has it and who might do what with it, you can't track the aliasing information of the pointer, so you have to suppress some optimizations. But you are correct in the sense that if int2ptr never happens, it's all good.

About side effects: we are not talking about side effects at the hardware level, we are talking about side effects from the compiler's viewpoint. Again, the compiler might track aliasing information for optimizations, and casting has the side effect of "exposing" the pointer.
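To make the compiler's viewpoint concrete, here is a minimal Rust sketch (the function names are mine, purely illustrative): both functions compute the same value, but only the second performs a ptr2int cast, and that cast is the "exposure" side effect even though its result is never used.

```rust
fn no_expose() -> i32 {
    let mut x = 1;
    x += 1;
    // x's address was never observed: the compiler is free to keep x
    // in a register, or optimize it away entirely.
    x
}

fn expose() -> i32 {
    let mut x = 1;
    // The cast "exposes" x's address, even though the result is unused.
    // A conservative compiler must now assume some unknown int2ptr
    // elsewhere could reconstruct a pointer to x.
    let _addr = &mut x as *mut i32 as usize;
    x += 1;
    x
}

fn main() {
    assert_eq!(no_expose(), 2);
    assert_eq!(expose(), 2);
    println!("both return 2");
}
```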


> I don’t get why ptr2int casts are a problem if you never try and cast the integer back to a ptr.

AFAIK, you do understand. ptr2int casts are totally fine and defined behavior, as long as the program contains no int2ptr casts. Is there a passage from the OP that contradicts this?


From the section "Casts have a side-effect":

> But in this case, the operation in question is (uintptr_t)x, which has no side-effect – right? Wrong. This is exactly the key lesson that this example teaches us: casting a pointer to an integer has a side-effect, and that side-effect has to be preserved even if we don’t care about the result of the cast. ... We have to lose some optimization, as the example shows. However, the crucial difference to the previous section is that only code which casts pointers to integers is affected.

So even if we never even use the result, casting a pointer to an integer is problematic.

But in the explanation he only talks about the problems of the int2ptr cast, which I do understand.


The problem is that, if we assume that integers don’t have provenance, some far distant part of the code could guess the integer and do an int2ptr. If you can prove that nothing in the entire program could possibly do this for the entire lifetime of the original object, then sure, you could remove the ptr2int. But compiler optimizations usually work one function at a time. In some cases it might be feasible to prove this anyway, like if (a) you have a function that doesn’t call any other functions and (b) the object in question is a local variable that will go out of scope at the end of the function, making any further accesses UB regardless. But in most cases it’s not feasible.
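A sketch of that narrow (a)+(b) case (the function name is mine): the function calls nothing and the object is a local whose lifetime ends at the return, so any access through a guessed copy of the address after the function returns is UB regardless, and in principle a compiler could delete the ptr2int.

```rust
// No calls to other functions, and `x` is a local that dies at the end
// of the function, so no other code can legally use its address while
// it is live. The ptr2int below is removable in principle.
fn provably_unobservable() -> i32 {
    let x = 7;
    let _addr = &x as *const i32 as usize; // result unused, never escapes
    x
}

fn main() {
    assert_eq!(provably_unobservable(), 7);
    println!("{}", provably_unobservable());
}
```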


Indeed int2ptr is the "evil" operation. If we banned it, we could get rid of all this "exposed" stuff and ptr2int would be fine. However, in order to make int2ptr work, we have to also make ptr2int a bit more complicated. That's what the example shows: removing a ptr2int introduced UB into the program.

Rust now (experimentally) has a `ptr.addr()` operation that is like ptr2int without the "expose" part, i.e., the resulting integer cannot be cast back but can still be used for other purposes.
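A minimal sketch of the difference (assuming a toolchain where the strict-provenance `addr` method is available): both operations yield the same number, but only the `as usize` cast exposes the pointer.

```rust
fn main() {
    let x = 42u32;
    let p = &x as *const u32;

    // Plain ptr2int cast: exposes the pointer's provenance, so some
    // int2ptr elsewhere may legally reconstruct a pointer from it.
    let exposed = p as usize;

    // Strict-provenance alternative: same numeric address, but the
    // result cannot be cast back into a usable pointer. Fine for
    // hashing, logging, alignment math, etc.
    let addr = p.addr();

    assert_eq!(exposed, addr);
    println!("addresses match: {}", exposed == addr);
}
```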


I'd assume it has something to do with the idea that an optimizing compiler can no longer delete the address entirely once it's being used for "something", for example by turning the variable into a register-only variable.


> they are a No-op on the hardware.

That is not guaranteed. The only guarantee is that you can round-trip conversions via (u)intptr_t. The integer representation of a converted pointer can be completely different, to accommodate hardware like the Symbolics Lisp machine.


Are we losing optimizations on x86/ARM due to the mere existence of other hardware (like Symbolics or CHERI) that handles things differently?


You don't lose the optimizations, because UB and the aliasing rules let them stay in. But the people who want to make C safer by simply defining all UB would lose you all these optimizations.

ARM already includes a small part of CHERI (pointer signing) and the rest is coming.


I mean you are correct, but why else would a pointer be converted to an integer if not to cast it back at some point? I guess you can print it for debugging, but most other uses mean it will be used as a pointer at some point.


Another reason for converting pointers to integers without ever doing the reverse operation is to hash them.
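For example (a sketch, names are mine; assumes the strict-provenance `addr` method): using the address as a hash-map key is a ptr2int whose result only ever gets hashed and compared, never turned back into a pointer.

```rust
use std::collections::HashMap;

fn main() {
    let a = 1i32;
    let b = 2i32;

    // Keyed by address via the non-exposing `addr()`; the integers are
    // only hashed/compared, never cast back to pointers.
    let mut by_addr: HashMap<usize, &'static str> = HashMap::new();
    by_addr.insert((&a as *const i32).addr(), "a");
    by_addr.insert((&b as *const i32).addr(), "b");

    // Two distinct live objects have distinct addresses.
    assert_eq!(by_addr.len(), 2);
    println!("{}", by_addr.len());
}
```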


Some hand-vectorized code will do this to compute the number of non-vectorized elements that may exist before and after a suitably aligned region.
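A sketch of that pattern (the alignment constant and names are mine): computing the length of the scalar prologue before a 16-byte-aligned region needs only integer math on the address, never an int2ptr. (Rust's `align_offset` exists for exactly this purpose.)

```rust
fn main() {
    let data = [0u8; 64];
    let p = data.as_ptr();
    const ALIGN: usize = 16;

    // Number of scalar byte elements to handle before the first
    // 16-byte-aligned address: (-addr) mod ALIGN. Pure integer math;
    // the integer is never turned back into a pointer.
    let prefix = p.addr().wrapping_neg() % ALIGN;

    assert!(prefix < ALIGN);
    // The address just past the prologue is aligned.
    assert_eq!((p.addr() + prefix) % ALIGN, 0);
    println!("prefix = {prefix}");
}
```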




