
FYI regarding highly tuned code: an ex-ATI/AMD GPU core designer told me that the price you pay for writing optimized code in OpenCL rather than in the device-specific assembler is roughly 3x in performance. Something to keep in mind if you're targeting OpenCL on a large enough system and you find hot spots that can't be pushed any faster.



Unlike previous versions, OpenCL 2.0 has been shown to be only about 30%[1] slower than CUDA, and it can approach CUDA's performance given enough optimisation.

Since I'm working on generating kernel code to perform dynamic tasks, I can't afford to write at the lowest level available. (I'm accelerating Python/Ruby routines, though, so OpenCL gives a significant speedup without much pain at all.)
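For readers unfamiliar with the approach, here is a minimal sketch of what generating kernel code from a high-level language can look like. It is not the commenter's code: `make_elementwise_kernel` and its parameters are hypothetical, and it assumes the dynamic task is a simple elementwise operation over float buffers. It only builds the OpenCL C source string; with PyOpenCL that string would then be compiled with something like `pyopencl.Program(ctx, src).build()`.

```python
def make_elementwise_kernel(name: str, expr: str, inputs: list[str], dtype: str = "float") -> str:
    """Return OpenCL C source for a kernel that computes out[i] = expr.

    Hypothetical helper for illustration only: `expr` is pasted into the
    kernel verbatim, so it must be a valid OpenCL C expression written in
    terms of the input buffers indexed by `i`.
    """
    # Reject names that would not be valid C identifiers.
    for ident in [name, *inputs]:
        if not ident.isidentifier():
            raise ValueError(f"invalid identifier: {ident!r}")
    if "out" in inputs:
        raise ValueError("'out' is reserved for the output buffer")

    # One read-only global buffer per input, plus a single output buffer.
    params = [f"__global const {dtype} *{arg}" for arg in inputs]
    params.append(f"__global {dtype} *out")
    signature = ", ".join(params)

    return (
        f"__kernel void {name}({signature})\n"
        "{\n"
        "    size_t i = get_global_id(0);  /* one work-item per element */\n"
        f"    out[i] = {expr};\n"
        "}\n"
    )


if __name__ == "__main__":
    # Example: generate an "a*x + y" style kernel for a = 2.
    src = make_elementwise_kernel("axpy", "2.0f * x[i] + y[i]", ["x", "y"])
    print(src)
```

The host side stays in Python; the trade-off is one OpenCL compile per distinct generated kernel, so caching built programs keyed on the generated source is probably worthwhile.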

[1] http://dl.acm.org/citation.cfm?id=2066955 (Sorry about the paywall; I access it through my university VPN.)



