> we have successfully used both RoCE and InfiniBand clusters for large, GenAI w...

loeg · on March 12, 2024

Yeah, and RoCE isn't single vendor. I'm not sure IB scales to the relevant cluster sizes, either.

anonymousDan · on March 12, 2024

Is NVLink just not scalable enough here?

loeg · on March 12, 2024

I don't know. I haven't actually worked with IB in this specific space (or at all since before Nvidia acquired MLNX). My experience with RoCE/IB was for a storage cluster backend in the late 2010s.