
Take a look at, say, the Raspberry Pi: 24 GFLOPS for 1/2 W. You won't get that from any number of mobile CPU cores.



Mobile SoCs' claimed numbers are hard to take at face value. For one, I'm 98% sure those are FP16 flops. For another, basically all SoCs in shipping devices throttle under load, so efficiency is hard to determine from unrelated peak performance and max power draw numbers.

Anyway, Cortex-A15 is capable of 8 flops per cycle per core, which puts it in pretty good shape for theoretical efficiency given its likely power draw at current clocks.


No, these are fair 32-bit GFLOPS. No, VC4 does not throttle; the power figures are given for the real max load.

And I never managed to get close to 8 ins per cycle on A15, but, for example, an FFT implementation on VC4 gets pretty close to the theoretical performance limit. And a fully loaded 4-core A15 will draw far more than 500mW anyway.


It's one instruction per cycle that gets 8 flops. And what are you even arguing? Assuming it's unthrottled FP32, a quad-core A15 at 2GHz would have to draw 7 watts to be over 5x less efficient.
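
For anyone who wants to check the arithmetic in this exchange, here is a rough sketch. It only plugs in the figures claimed upthread (24 GFLOPS at 0.5 W for VC4; 8 flops per cycle per core for a quad-core A15 at 2GHz); none of them are measurements, and the 7 W draw is the hypothetical value from the comment above. The function and constant names are made up for illustration.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: cores x clock (GHz) x flops per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Efficiency as throughput divided by power draw."""
    return gflops / watts


# Figures quoted in the thread (claims, not measurements).
VC4_GFLOPS = 24.0
VC4_WATTS = 0.5
A15_CORES = 4
A15_CLOCK_GHZ = 2.0
A15_FLOPS_PER_CYCLE = 8
A15_WATTS = 7.0  # hypothetical draw used in the comparison above

a15_peak = peak_gflops(A15_CORES, A15_CLOCK_GHZ, A15_FLOPS_PER_CYCLE)
vc4_eff = gflops_per_watt(VC4_GFLOPS, VC4_WATTS)
a15_eff = gflops_per_watt(a15_peak, A15_WATTS)
ratio = vc4_eff / a15_eff  # how many times more efficient VC4 would be

print(f"A15 peak: {a15_peak:.1f} GFLOPS")
print(f"VC4: {vc4_eff:.1f} GFLOPS/W, A15: {a15_eff:.2f} GFLOPS/W")
print(f"VC4 advantage at {A15_WATTS:.0f} W: {ratio:.2f}x")
```

Change A15_WATTS to see how the ratio moves; the comparison is only as good as the peak and power numbers fed into it.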


I'm arguing that if all you have is 1W, you've got no other option but GPU.


But that's GLES 2.0, which is significantly less flexible than the kinds of GPUs we're discussing here and is not even in the same ballpark as a CPU (and is almost certainly significantly less strict about floating point precision than a GLES 3 device).


https://github.com/raspberrypi/userland/blob/master/host_app... is part of the Raspberry Pi GPU FFT example code. That is not GLES 2.0 or even GL of any kind. It's VideoCore QPU assembly language, meant to be assembled with qasm. I haven't tried writing anything for it, but it certainly looks like it's "the kinds of GPUs we're discussing here" and "in the same ballpark as a CPU".


Still, it's quite sufficient for things like FFT.



