> But how tightly can you really connect 27000 GPUs?
Not all that well currently. NVIDIA and others are working on GPU-specific interconnects[0], but they don't have anywhere near the scale of traditional interconnects, which were already supporting hundreds of thousands of nodes by the late '90s. One of the big challenges in modern supercomputer programming is in fact keeping the GPUs hot, which can often mean offloading work that needs a lot of memory to the CPUs.
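To make the "keeping the GPUs hot" point a bit more concrete, here's a minimal CUDA sketch (my illustration, not anything from real HPC codes) of the usual single-GPU mitigation: chunk the data and use streams so host-to-device copies overlap with kernel execution instead of leaving the GPU idle. The saxpy kernel and chunk sizes are just placeholders.

    // Sketch: overlap transfers and compute with CUDA streams to keep the GPU busy.
    #include <cuda_runtime.h>

    __global__ void saxpy_kernel(float a, const float *x, float *y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n_chunks = 4;
        const int chunk = 1 << 20;                 // elements per chunk (placeholder size)
        float *h_x, *h_y, *d_x, *d_y;
        cudaMallocHost(&h_x, n_chunks * chunk * sizeof(float));  // pinned host memory,
        cudaMallocHost(&h_y, n_chunks * chunk * sizeof(float));  // required for async copies
        cudaMalloc(&d_x, n_chunks * chunk * sizeof(float));
        cudaMalloc(&d_y, n_chunks * chunk * sizeof(float));

        cudaStream_t streams[n_chunks];
        for (int s = 0; s < n_chunks; ++s) cudaStreamCreate(&streams[s]);

        for (int s = 0; s < n_chunks; ++s) {
            size_t off = (size_t)s * chunk;
            // Work in different streams can overlap: the GPU computes on one
            // chunk while the next chunk's copy is still in flight.
            cudaMemcpyAsync(d_x + off, h_x + off, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, streams[s]);
            cudaMemcpyAsync(d_y + off, h_y + off, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, streams[s]);
            saxpy_kernel<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(
                2.0f, d_x + off, d_y + off, chunk);
            cudaMemcpyAsync(h_y + off, d_y + off, chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, streams[s]);
        }
        cudaDeviceSynchronize();
        return 0;
    }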
Unfortunately my knowledge here is a little dated: I interned at Los Alamos National Lab from 2008 to 2012, when they were doing a lot of rearchitecting of old codes for Roadrunner, the first petascale computer. It used Cell chips in accelerator cards and prefigured a lot of the challenges in GPU programming, but didn't fully expose them. For instance, we didn't have CUDA!
If I had to guess, the first exascale computer will be the one that solves the GPU interconnect problem at scale.
Roadrunner was... special... in that regard, requiring even more effort and bizarreness than the typical HPC GPU setup. I remember that the pre-install plan for the Roadrunner Linpack run was a 50-page document. It's also worth noting that GPU HPC computing was already in full swing around the same time: CUDA was first released in June 2007, around when Roadrunner made its first Top500 appearance.
Exactly. First there was NVLink, which dramatically increased bandwidth over PCIe, but that didn't scale to a large number of GPUs. Then there was NVSwitch, which solved the scaling problem inside a node. I wouldn't be surprised if the next leap is something like NVLink cables between nodes that don't need traditional routing capabilities.
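For context on how that fabric shows up to the programmer today: inside a node, NVLink/NVSwitch is mostly exposed through CUDA peer-to-peer access. Here's a minimal sketch assuming two GPUs (devices 0 and 1) with a direct NVLink/NVSwitch path between them; whether the copy actually rides the GPU fabric or falls back through PCIe and host memory depends on the system topology.

    // Sketch: peer-to-peer GPU-to-GPU copy within a node.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        // Check whether the two GPUs can address each other directly.
        int can01 = 0, can10 = 0;
        cudaDeviceCanAccessPeer(&can01, 0, 1);
        cudaDeviceCanAccessPeer(&can10, 1, 0);
        if (!can01 || !can10) {
            printf("No peer access between GPU 0 and GPU 1\n");
            return 1;
        }

        const size_t bytes = (size_t)256 << 20;  // 256 MiB test buffer
        float *buf0 = nullptr, *buf1 = nullptr;

        cudaSetDevice(0);
        cudaMalloc(&buf0, bytes);
        cudaDeviceEnablePeerAccess(1, 0);  // let GPU 0 map GPU 1's memory

        cudaSetDevice(1);
        cudaMalloc(&buf1, bytes);
        cudaDeviceEnablePeerAccess(0, 0);

        // Direct GPU0 -> GPU1 copy; on NVLink/NVSwitch systems this should go
        // over the GPU fabric rather than bouncing through host memory.
        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
        cudaDeviceSynchronize();

        cudaFree(buf1);
        cudaSetDevice(0);
        cudaFree(buf0);
        return 0;
    }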
0: https://www.nvidia.com/en-us/data-center/nvlink/