AMD has hipify for this, which converts CUDA code to HIP. https://github.com/ROC...

3abiton · on July 16, 2024

There is a more glaring issue: ROCm doesn't even work well on most AMD devices nowadays, and HIP's performance deteriorates on the same hardware compared to ROCm.

boroboro4 · on July 16, 2024

It supports all current datacenter GPUs.

If you want to write a very efficient CUDA kernel for a modern datacenter NVIDIA GPU (read: H100), you need to write it with the hardware in mind (and preferably in hand; the H100 and RTX 4090 behave very differently in practice). So I don't think the difference between AMD and NVIDIA is as big as everyone perceives.