
There is always a minimum cost to moving data from one place to another. If you're computing on the GPU, the data has to get there, and PCIe bandwidth is often the bottleneck. If you can upload compressed data, you essentially get a free bandwidth multiplier equal to the compression ratio: as long as the transfer plus GPU decompression takes less time than sending the uncompressed data would have, you win.
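
To make the win condition concrete, here's a back-of-the-envelope model in Python; the bandwidth, ratio, and throughput numbers below are illustrative assumptions, not measurements:

    # Is uploading compressed data to the GPU a win? (all numbers assumed)
    pcie_gb_per_s = 32.0        # assumed PCIe 4.0 x16 peak
    compression_ratio = 3.0     # assumed: compressed size is 1/3 of original
    decompress_gb_per_s = 100.0 # assumed GPU decompression throughput
    data_gb = 10.0              # dataset size

    # Option A: send the raw data over PCIe.
    t_raw = data_gb / pcie_gb_per_s

    # Option B: send compressed data, then decompress on the GPU
    # (modeled serially; in practice the two usually overlap).
    t_compressed = (data_gb / compression_ratio) / pcie_gb_per_s \
                 + data_gb / decompress_gb_per_s

    print(f"raw upload:        {t_raw:.3f} s")         # ~0.313 s
    print(f"compressed upload: {t_compressed:.3f} s")  # ~0.204 s

With these made-up numbers the compressed path wins, and since transfer and decompression can usually be pipelined, the serial sum above is a conservative estimate.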

But yeah, direct I/O to the GPU would be great; that's just not feasible right now.



>If you can upload compressed data, you essentially get a free bandwidth multiplier equal to the compression ratio.

Agreed! The history of computing is sort of like: at any given point in time, there's always a bottleneck somewhere...

It's either the speed of the CPU of the day running a particular algorithm, the RAM of the era, the storage subsystem, or some bus or I/O device... and once one bottleneck is fixed by whatever novel method or upgrade, we invariably run into another! <g>

>But yeah, direct I/O to the GPU would be great; that's just not feasible right now.

Agreed! For consumers, a fully dedicated CPU-to-GPU I/O path (as opposed to using PCIe as the intermediary) isn't, to the best of my knowledge, generally available at this point in time...

If we look toward the future, and/or at the super-high-end business/workstation market, we might consider Nvidia's Grace (Hopper) CPU architecture: https://www.nvidia.com/en-us/data-center/grace-cpu/

>"The fourth-generation NVIDIA NVLink-C2C delivers 900 gigabytes per second (GB/s) of bidirectional bandwidth between the NVIDIA Grace CPU and NVIDIA GPUs."

Or, we could check out the Cerebras WSE-2:

https://www.cerebras.net/product-chip/

>"Unlike traditional devices, in which the working cache memory is tiny, the WSE-2 takes 40GB of super-fast on-chip SRAM and spreads it evenly across the entire surface of the chip. This gives every core single-clock-cycle access to fast memory at extremely high bandwidth – 20 PB/s. This is 1,000x more capacity and 9,800x greater bandwidth than the leading GPU."

Unfortunately, it's (again, to the best of my limited knowledge!) not available to the consumer market at this point in time! (Boy, wouldn't that be great as a $200 plug-in card for consumer PCs? -- but I'm guessing it might take 10 years (or more!) for that to happen!)

I'm guessing that in 20+ years we'll have unlimited-bandwidth, infinitely-low-latency optical fiber interconnects everywhere... we can only dream, right? <g>



