Writing and porting kernels between different GPU paradigms is relatively trivial; that's not the issue (although I find the code much clunkier in everything other than CUDA). The real problem is the compiler toolchains and the GPU-accelerated libraries for FFT, BLAS, DNN, etc. that come bundled with CUDA (cuFFT, cuBLAS, cuDNN): the equivalents on every other platform are either pretty terrible or non-existent, and the competitors are nowhere near having a good answer to this. Intel have perhaps come closest with oneAPI, but that can't target anything other than Intel's own hardware anyway, so it's a moot point.