>Is it weird how the comments here are blaming AMD and not Nvidia?
It's not. Even as things stand, I don't trust HIP or ROCm to be a viable alternative to CUDA. George Hotz did plenty of work trying to port various ML architectures to AMD and was met with countless driver bugs. The problem isn't that Nvidia won't build an open platform - the problem is that AMD won't invest in a competitive one. 99% of ML engineers don't write CUDA. For the vast majority of workloads, there are maybe 20 engineers at Meta writing the CUDA backend for PyTorch that every other engineer uses. Meta could hire another 20 engineers to support whatever AMD has (they did, and it's still not as robust as CUDA).
Even if CUDA were open - do you expect Nvidia to also write drivers for AMD? I don't believe third parties will get anywhere writing "compatibility layers," because AMD's own GPUs aren't optimized or tested for CUDA-like workloads.