The fact that some subproblems people want to solve are NP complete is mostly ir...

nkurz · on July 24, 2013

remember that people build architectures knowing what compilers can and can't do, and so processors do a lot at runtime

It sounds like you may be referring to MOV elimination by register renaming, which should make the extra moves no more costly than NOPs. I read that Intel post-Ivy Bridge does this, but haven't been able to find any real documentation. Do you know if this is something one can now rely on, and what its limits are (number per cycle, size differences, latency)?

nkurz · on July 28, 2013

Answering myself: Yes, this is documented and can be depended on for Ivy Bridge onward.

3.5.1.13 Zero-Latency MOV Instructions

In processors based on Intel microarchitecture code named Ivy Bridge, a subset of register-to-register move operations are executed in the front end (similar to zero-idioms, see Section 3.5.1.8). This conserves scheduling/execution resources in the out-of-order engine. Most forms of register-to-register MOV instructions can benefit from zero-latency MOV. Example 3-23 list the details of those forms that qualify and a small set that do not.

Example 3-23. Zero-Latency MOV Instructions

MOV instructions latency that can be eliminated

  MOV reg32, reg32
  MOV reg64, reg64
  MOVUPD/MOVAPD xmm, xmm
  MOVUPD/MOVAPD ymm, ymm
  MOVUPS/MOVAPS xmm, xmm
  MOVUPS/MOVAPS ymm, ymm
  MOVDQA/MOVDQU xmm, xmm
  MOVDQA/MOVDQU ymm, ymm
  MOVZX reg32, reg8 (if not AH/BH/CH/DH)
  MOVZX reg64, reg8 (if not AH/BH/CH/DH)

MOV instructions latency that cannot be eliminated

  MOV reg8, reg8
  MOV reg16, reg16
  MOVZX reg32, reg8 (if AH/BH/CH/DH)
  MOVZX reg64, reg8 (if AH/BH/CH/DH)
  MOVSX

http://www.intel.com/content/dam/doc/manual/64-ia-32-archite...
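
For anyone who wants to check a particular form against the table quoted above, here is a minimal sketch that encodes it as a lookup. The function name `is_zero_latency_mov` and the operand-class strings ("reg32", "xmm", and so on) are made up for illustration and are not part of any Intel tool. Forms the table does not mention at all (for example MOVZX from a 16-bit source) are reported as not listed, which is an assumption of this sketch rather than a statement from the manual.

```python
# Hypothetical helper: encodes Example 3-23 as quoted in this thread.
# Names and operand-class strings are illustrative assumptions.

HIGH_BYTE_REGS = {"AH", "BH", "CH", "DH"}
VECTOR_MOVES = {"MOVUPD", "MOVAPD", "MOVUPS", "MOVAPS", "MOVDQA", "MOVDQU"}


def is_zero_latency_mov(mnemonic, dst_kind, src_kind, src_reg=""):
    """Return True if the quoted table lists this reg-to-reg form as eliminable.

    dst_kind / src_kind are operand classes such as "reg32", "xmm", "ymm".
    src_reg is the concrete source register name; it only matters for MOVZX.
    Forms absent from the table return False (assumption: "not listed").
    """
    m = mnemonic.upper()
    if m == "MOV":
        # Only same-size 32-bit and 64-bit moves qualify; 8/16-bit do not.
        return dst_kind == src_kind and dst_kind in ("reg32", "reg64")
    if m in VECTOR_MOVES:
        return dst_kind == src_kind and dst_kind in ("xmm", "ymm")
    if m == "MOVZX":
        if dst_kind in ("reg32", "reg64") and src_kind == "reg8":
            # High-byte sources (AH/BH/CH/DH) are listed as not eliminable.
            return src_reg.upper() not in HIGH_BYTE_REGS
        return False
    # MOVSX and anything else: not listed as eliminable.
    return False


if __name__ == "__main__":
    print(is_zero_latency_mov("mov", "reg64", "reg64"))
    print(is_zero_latency_mov("movzx", "reg32", "reg8", "ah"))
```

This only mirrors the list of qualifying forms; it says nothing about how many moves per cycle can be eliminated, which the quoted section does not cover.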