I have to support this advice.
A poor man's sleep is not really sleep, since the core is still executing NOPs. The on-chip sleep mechanism switches various parts of the silicon off. The purpose of sleep is to reduce current consumption while the hardware is idle.
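To make the distinction concrete, here is a minimal sketch of what a "poor man's sleep" amounts to in C. The function name `busy_wait` is made up for illustration, and the loop is written portably so it also compiles on a desktop compiler. The core stays fully powered and simply counts.

```c
#include <stdint.h>

/* Hypothetical helper: burn time by counting. Nothing is powered down;
 * the CPU fetches and executes instructions the whole time. */
uint32_t busy_wait(uint32_t iterations)
{
    /* volatile keeps the compiler from optimizing the empty loop away */
    volatile uint32_t i;

    for (i = 0; i < iterations; i++) {
        /* empty body; on a real target this is where a NOP would sit */
    }
    return i; /* number of iterations actually executed */
}
```

A real sleep on an AVR-based Arduino would instead go through the hardware sleep modes, for example via the `set_sleep_mode()` and `sleep_mode()` macros in avr-libc's `<avr/sleep.h>`. That path is left out of the sketch because it needs the target toolchain and a wake-up source.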
Sleep is a generic term. In the context of an Arduino, the term is perfectly applicable. On a bigger processor with pipelining, a memory controller, and TLBs, I'd agree with you.
The delay is just 10 ms, and this is bootloader code that runs only once at boot. For an MCU bootloader, the less error-prone approach with the smaller code size is preferable.
I can agree that a justification for using NOP loops somewhere along the lines of "The core starts executing code when, judging by real-world testing, the hardware peripherals are not yet guaranteed to be in a stable state. Thorough testing suggests that an 8 ms delay plus a safety margin mitigates issues related to hardware readiness. A 2500-cycle NOP loop guarantees the required 10 ms delay at the fastest clock speed and a 50 ms delay at the slowest core speed without causing observable issues. 50 ms is hereby deemed acceptable." would be acceptable in most commercial design documents.
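For what it's worth, the arithmetic behind that kind of statement is easy to check. Below is a rough sketch in C, with assumptions stated up front: one CPU cycle per NOP, loop overhead ignored, and hypothetical function names `nop_delay_us` and `cycles_for_delay`. The clock figures in the comments (250 kHz and 50 kHz) are only back-computed from the illustrative 2500-cycle / 10 ms / 50 ms numbers in the quote. They are not real Arduino clock speeds.

```c
#include <stdint.h>

/* Delay in microseconds produced by `cycles` CPU cycles at `clock_hz`.
 * Assumes one cycle per NOP and no loop overhead; rounds down. */
uint64_t nop_delay_us(uint64_t cycles, uint64_t clock_hz)
{
    return (cycles * 1000000ULL) / clock_hz;
}

/* Cycles needed to wait at least `min_delay_us` at `clock_hz`; rounds up
 * so the delay is never shorter than requested. */
uint64_t cycles_for_delay(uint64_t min_delay_us, uint64_t clock_hz)
{
    return (min_delay_us * clock_hz + 999999ULL) / 1000000ULL;
}

/* With the quote's numbers:
 *   nop_delay_us(2500, 250000)      -> 10000 us (10 ms, fastest clock)
 *   nop_delay_us(2500, 50000)       -> 50000 us (50 ms, slowest clock)
 *   cycles_for_delay(10000, 250000) -> 2500 cycles
 */
```

The point is that a fixed cycle count has to be sized for the fastest clock, and the slowest clock then determines the worst-case delay you have to sign off on.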