foobar2020 on Sept 11, 2015 | on: Computing 10,000x more efficiently (2010) [pdf]

The bottleneck is often RAM. This is especially clear when writing performance-oriented code in CUDA, where the number of cores (threads) sharing a single memory controller is on the order of thousands.
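One common way to see why memory, not arithmetic, is the limit is a roofline-style estimate. The following is a rough sketch under stated assumptions, not the commenter's own analysis. The roofline framing is my choice. SAXPY (y = a*x + y) is used as an example kernel. The `Device` figures (10 TFLOP/s compute, 500 GB/s DRAM bandwidth) are hypothetical placeholders, not measurements of any real GPU. The sketch also assumes every operand comes from DRAM, with no cache reuse.

```python
# Roofline-style estimate: is a kernel limited by compute or by memory bandwidth?
# ASSUMPTION: the hardware numbers used below are hypothetical placeholders,
# and every byte is assumed to travel to/from DRAM (no cache reuse).
from dataclasses import dataclass


@dataclass
class Device:
    peak_flops: float     # peak arithmetic throughput in FLOP/s (assumed)
    mem_bandwidth: float  # peak DRAM bandwidth in bytes/s (assumed)


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved between DRAM and the chip."""
    return flops / bytes_moved


def attainable_flops(dev: Device, intensity: float) -> float:
    """Roofline bound: the lower of the compute peak and intensity * bandwidth."""
    return min(dev.peak_flops, intensity * dev.mem_bandwidth)


def saxpy_intensity(dtype_bytes: int = 4) -> float:
    # y[i] = a * x[i] + y[i]: 2 FLOPs per element (one multiply, one add).
    # Memory traffic per element: read x[i], read y[i], write y[i] -> 3 accesses.
    return arithmetic_intensity(2, 3 * dtype_bytes)


if __name__ == "__main__":
    dev = Device(peak_flops=10e12, mem_bandwidth=500e9)  # hypothetical GPU
    ai = saxpy_intensity()
    bound = attainable_flops(dev, ai)
    # Intensity a kernel needs before compute, rather than memory, becomes the limit.
    ridge = dev.peak_flops / dev.mem_bandwidth
    print(f"SAXPY intensity: {ai:.3f} FLOP/byte")
    print(f"Attainable: {bound / 1e9:.1f} GFLOP/s of {dev.peak_flops / 1e9:.0f} peak")
    print(f"Ridge point: {ridge:.1f} FLOP/byte")
```

With these assumed numbers, SAXPY does about 0.17 FLOP per byte. A kernel would need roughly 20 FLOP per byte to become compute-bound, so SAXPY is capped well under 1% of the arithmetic peak. Launching more threads does not help once the memory controllers are saturated. The gains come from moving fewer bytes, for example through data reuse in shared memory or registers.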