> But how tightly can you really connect 27000 GPUs?
Not all that well currently. NVIDIA and others are working on GPU-specific interconnects[0], but they don't have anywhere near the scale of traditional interconnects, which were already supporting hundreds of thousands of nodes by the late '90s. One of the big challenges in modern supercomputer programming is in fact keeping the GPUs hot, which can often mean offloading work that needs a lot of memory to the CPUs.
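To make the "keeping the GPUs hot" point a bit more concrete, here's a minimal CUDA sketch (my illustration, not anything from real HPC codes) of the usual single-GPU mitigation: chunk the data and use streams so host-to-device copies overlap with kernel execution instead of leaving the GPU idle. The saxpy kernel and chunk sizes are just placeholders.

    // Sketch: overlap transfers and compute with CUDA streams to keep the GPU busy.
    #include <cuda_runtime.h>

    __global__ void saxpy_kernel(float a, const float *x, float *y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n_chunks = 4;
        const int chunk = 1 << 20;                 // elements per chunk (placeholder size)
        float *h_x, *h_y, *d_x, *d_y;
        cudaMallocHost(&h_x, n_chunks * chunk * sizeof(float));  // pinned host memory,
        cudaMallocHost(&h_y, n_chunks * chunk * sizeof(float));  // required for async copies
        cudaMalloc(&d_x, n_chunks * chunk * sizeof(float));
        cudaMalloc(&d_y, n_chunks * chunk * sizeof(float));

        cudaStream_t streams[n_chunks];
        for (int s = 0; s < n_chunks; ++s) cudaStreamCreate(&streams[s]);

        for (int s = 0; s < n_chunks; ++s) {
            size_t off = (size_t)s * chunk;
            // Work in different streams can overlap: the GPU computes on one
            // chunk while the next chunk's copy is still in flight.
            cudaMemcpyAsync(d_x + off, h_x + off, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, streams[s]);
            cudaMemcpyAsync(d_y + off, h_y + off, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, streams[s]);
            saxpy_kernel<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(
                2.0f, d_x + off, d_y + off, chunk);
            cudaMemcpyAsync(h_y + off, d_y + off, chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, streams[s]);
        }
        cudaDeviceSynchronize();
        return 0;
    }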
Unfortunately my knowledge here is a little dated: I interned at Los Alamos National Lab from 2008 to 2012, when they were doing a lot of rearchitecting of old codes for Roadrunner, the first petascale computer. It used Cell chips in accelerator cards and prefigured a lot of the challenges in GPU programming, but didn't fully expose them. For instance, we didn't have CUDA!
If I had to guess, the first exascale computer will be the one that solves the GPU interconnect problem at scale.
Roadrunner was... special... in that regard, requiring even more effort and bizarreness than the typical HPC GPU setup. I remember that the pre-install plan for the Roadrunner Linpack run was a 50-page document. It's also worth noting that GPU HPC computing was already in full swing around the same time: CUDA was first released in June 2007, around when Roadrunner made its first Top500 appearance.
Exactly. First there was NVLink, which dramatically increased bandwidth over PCIe, but that didn't scale to a large number of GPUs. Then there was NVSwitch, which solved the scaling problem inside a node. I wouldn't be surprised if the next leap is something like NVLink cables between nodes that don't need traditional routing capabilities.
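For context on how that fabric shows up to the programmer today: inside a node, NVLink/NVSwitch is mostly exposed through CUDA peer-to-peer access. Here's a minimal sketch assuming two GPUs (devices 0 and 1) with a direct NVLink/NVSwitch path between them; whether the copy actually rides the GPU fabric or falls back through PCIe and host memory depends on the system topology.

    // Sketch: peer-to-peer GPU-to-GPU copy within a node.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        // Check whether the two GPUs can address each other directly.
        int can01 = 0, can10 = 0;
        cudaDeviceCanAccessPeer(&can01, 0, 1);
        cudaDeviceCanAccessPeer(&can10, 1, 0);
        if (!can01 || !can10) {
            printf("No peer access between GPU 0 and GPU 1\n");
            return 1;
        }

        const size_t bytes = (size_t)256 << 20;  // 256 MiB test buffer
        float *buf0 = nullptr, *buf1 = nullptr;

        cudaSetDevice(0);
        cudaMalloc(&buf0, bytes);
        cudaDeviceEnablePeerAccess(1, 0);  // let GPU 0 map GPU 1's memory

        cudaSetDevice(1);
        cudaMalloc(&buf1, bytes);
        cudaDeviceEnablePeerAccess(0, 0);

        // Direct GPU0 -> GPU1 copy; on NVLink/NVSwitch systems this should go
        // over the GPU fabric rather than bouncing through host memory.
        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
        cudaDeviceSynchronize();

        cudaFree(buf1);
        cudaSetDevice(0);
        cudaFree(buf0);
        return 0;
    }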
0: https://www.nvidia.com/en-us/data-center/nvlink/