I tried to do this once with Theano, and found that the latency of the round trip to the GPU and back made it not worthwhile for a single image. A batch of images at once might make it worthwhile. Admittedly, this isn't what Theano is intended for; custom CUDA might do a better job.
I got curious about the numbers so I did a napkin calculation:
In a 2012-vintage Nvidia article[1] they get 5-6 GB/s in both directions (4 MB array size), which works out to around 1500 Mpix/s with 8-bit RGBA pixels.
For a 15 Mpix image: transfers both ways would take 20 ms, and assuming the GPU kernel runs at ~5x CPU speed (CPU 30 Mpix/s, GPU 150 Mpix/s), the computation itself takes 100 ms. So 120 ms on the GPU vs 500 ms on the CPU.
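For anyone who wants to poke at the numbers, here's the arithmetic spelled out (the bandwidth and throughput figures are the assumptions from the comment, not measurements):

```python
# Napkin math for resizing a 15 Mpix 8-bit RGBA image on the GPU.
# All figures below are assumptions from the comment, not benchmarks.
PCIE_GBPS = 6.0          # ~5-6 GB/s each way, per the 2012 Nvidia article
BYTES_PER_PIXEL = 4      # 8-bit RGBA
MPIX = 15.0

# Upload + download of the full image over PCIe.
transfer_ms_one_way = MPIX * 1e6 * BYTES_PER_PIXEL / (PCIE_GBPS * 1e9) * 1e3
transfer_ms_both = 2 * transfer_ms_one_way

# Kernel time, assuming GPU runs ~5x CPU speed.
GPU_MPIX_PER_S = 150.0
CPU_MPIX_PER_S = 30.0
gpu_compute_ms = MPIX / GPU_MPIX_PER_S * 1e3
cpu_ms = MPIX / CPU_MPIX_PER_S * 1e3

gpu_total_ms = transfer_ms_both + gpu_compute_ms
print(f"GPU: {gpu_total_ms:.0f} ms, CPU: {cpu_ms:.0f} ms")  # GPU: 120 ms, CPU: 500 ms
```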
Interesting, thanks. So it seems like it'd still be a pretty heavy win for the GPU.
Also, a common use-case on the web today is to have one input image and then a large number of output images (usually smaller) for different screen resolutions & thumbnails. Seems like you could save a lot of time by uploading the input image once, running a bunch of resize convolutions for different output sizes while it's still in GPU memory, and then downloading the outputs as a batch.