
I tried to do this once with Theano, and found that the latency of the roundtrip to the GPU and back made it not worthwhile for a single image. Maybe a batch of images at once would make it worthwhile. And this isn't what Theano is intended for, admittedly - custom CUDA might do a better job.



I got curious about the numbers so I did a napkin calculation:

In a 2012-vintage Nvidia article[1] they get 5-6 GB/s in both directions (4 MB array), which works out to around 1500 Mpix/s with 8-bit RGBA pixels.

15 Mpix image: transfers both ways would take 20 ms, and assuming the GPU kernel runs at ~5x the CPU speed (CPU 30, GPU 150 Mpix/s), you'd spend 100 ms on the computation. So 120 ms on the GPU vs 500 ms on the CPU.
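For the curious, the napkin math above as a script. The bandwidth and throughput figures are the assumptions stated in the comment (6 GB/s PCIe, CPU 30 Mpix/s, GPU 150 Mpix/s), not measurements:

```python
BYTES_PER_PIXEL = 4     # 8-bit RGBA
PCIE_BPS = 6e9          # ~6 GB/s each way (2012-era figure from [1])
CPU_PIX_S = 30e6        # assumed CPU throughput, 30 Mpix/s
GPU_PIX_S = 150e6       # assumed GPU throughput, ~5x CPU

pixels = 15e6  # 15 Mpix image

# Upload + download over PCIe, then the kernel itself.
transfer_s = 2 * (pixels * BYTES_PER_PIXEL) / PCIE_BPS
gpu_kernel_s = pixels / GPU_PIX_S
cpu_s = pixels / CPU_PIX_S

print(f"transfers:  {transfer_s * 1e3:.0f} ms")                  # 20 ms
print(f"GPU total:  {(transfer_s + gpu_kernel_s) * 1e3:.0f} ms") # 120 ms
print(f"CPU total:  {cpu_s * 1e3:.0f} ms")                       # 500 ms
```

Note the bandwidth check too: 6 GB/s divided by 4 bytes/pixel is 1500 Mpix/s, matching the figure above.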

[1] https://devblogs.nvidia.com/parallelforall/how-optimize-data...

edit: so I have no idea about the real GPU speedup, but this shows that the transfers shouldn't hurt too much unless the speedup vs CPU is very small.


Interesting, thanks. So it seems like it'd still be a pretty heavy win for the GPU.

Also, a common use-case on the web today is to have one input image and then a large number of output images (usually smaller) for different screen resolutions & thumbnails. Seems like you could save a lot of time by uploading the input image once and then running a bunch of resize convolutions for different output sizes while it's still in the GPU memory, then download the output files as a batch.
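A rough timing model of that idea, reusing the napkin figures from the sibling comment (6 GB/s transfers, 150 Mpix/s kernel - assumptions, not measurements). The output sizes are hypothetical; each resize is charged the cost of reading the full source image once:

```python
def transfer_s(pixels, bps=6e9, bytes_per_pixel=4):
    """One-way PCIe transfer time for an RGBA image."""
    return pixels * bytes_per_pixel / bps

def kernel_s(src_pixels, pix_s=150e6):
    """Resize kernel time, dominated by reading the source."""
    return src_pixels / pix_s

src = 15e6
outputs = [2e6, 0.5e6, 0.1e6]  # e.g. desktop, tablet, thumbnail (made up)

# Upload the source once, run every resize, download each result.
batched = (transfer_s(src)
           + sum(kernel_s(src) + transfer_s(p) for p in outputs))

# Naive: a full roundtrip (re-upload the source) per output.
naive = sum(transfer_s(src) + kernel_s(src) + transfer_s(p)
            for p in outputs)

print(f"batched: {batched * 1e3:.0f} ms, naive: {naive * 1e3:.0f} ms")
```

Under these numbers the saving is exactly the (N-1) redundant uploads, so it matters most when the source is large relative to the kernel cost, or when there are many output sizes.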




