"Manually copy data into a memory buffer" is pretty vague… try "writing a DSP function that does qpel (quarter-pixel) motion compensation without having to calculate and bounds-check each source memory access from the start of the image, because you're on x86-32 and you only have like six GPRs".
Though that one's for video; images are simpler but you also have to deploy the code to a lot more platforms.
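To make the trade-off concrete, here's a rough sketch in C. It is not taken from ffmpeg or libwebp; the function names are hypothetical, and it uses a 2-tap horizontal half-pel average as a stand-in (real qpel filters use more taps). The checked version recomputes and clamps every source access from the image origin; the unchecked version just walks a pointer and assumes the caller has already padded or validated the frame.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: fetch a pixel by absolute coordinates, clamped to the image. */
static uint8_t px_checked(const uint8_t *img, int w, int h, ptrdiff_t stride,
                          int x, int y)
{
    if (x < 0) x = 0; else if (x >= w) x = w - 1;
    if (y < 0) y = 0; else if (y >= h) y = h - 1;
    return img[y * stride + x];
}

/* Safe version: every access is computed from the image start and clamped.
 * Needs img, w, h, stride, x0, y0 live in the inner loop, which hurts on a
 * register-starved target. */
static void hpel_h_checked(uint8_t *dst, ptrdiff_t dst_stride,
                           const uint8_t *img, int w, int h, ptrdiff_t stride,
                           int x0, int y0, int bw, int bh)
{
    for (int y = 0; y < bh; y++)
        for (int x = 0; x < bw; x++) {
            int a = px_checked(img, w, h, stride, x0 + x,     y0 + y);
            int b = px_checked(img, w, h, stride, x0 + x + 1, y0 + y);
            dst[y * dst_stride + x] = (uint8_t)((a + b + 1) >> 1);
        }
}

/* Fast version: raw pointer walk, no checks. The caller must guarantee that
 * src[0..bw] is readable on each of the bh rows (assumption, not enforced). */
static void hpel_h_unchecked(uint8_t *dst, ptrdiff_t dst_stride,
                             const uint8_t *src, ptrdiff_t src_stride,
                             int bw, int bh)
{
    for (int y = 0; y < bh; y++) {
        for (int x = 0; x < bw; x++)
            dst[x] = (uint8_t)((src[x] + src[x + 1] + 1) >> 1);
        dst += dst_stride;
        src += src_stride;
    }
}
```

For a block that lies fully inside the image the two produce the same output; the unchecked one is only safe if whoever calls it got the bounds right, which is exactly where the bugs come from.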
I don't dispute that these optimizations may have been necessary on older hardware, but I think the current generation of Apple CPUs should have enough power not to need these micro-optimizations (and the hardware video decoder would take care of this anyway).
> Why would an iPhone be running x86-specific code?
The same codebase has to support x86 too (since there are Intel Macs and the Intel iOS Simulator), and in this case Apple didn't write the decoder (it's Google's libwebp). The x86-32 example I had in mind was from ffmpeg.
> and the hardware video decoder would take care of this anyway
…actually, considering that a hardware decoder has to do all the same memory accesses and is written in a combination of C and Verilog, I'm not at all sure it's more secure.