
CUDA is a programming language; you implement it like any other. The docs are a bit sparse but not awful. Targeting amdgpu is probably about as difficult as targeting x64: it's mostly a matter of changing the compiler runtime.

The online PTX implementation is notable for being even more annoying to deal with than the CUDA one, but it's just bytes in / different bytes out. No magic.



[I work on SCALE]

CUDA has a couple of extra problems beyond those of any other programming language:

- CUDA is more than a language: it's a giant library (for both CPU and GPU) for interacting with the GPU and for writing the GPU code. This needed reimplementing. At least for the device-side stuff we can implement it in CUDA, so when we add support for other GPU vendors the code can (mostly) just be recompiled and work there :D

- CUDA (the language) is not actually specified. It is, informally, "whatever nvcc does". This differs significantly from what Clang's CUDA support does (which is ultimately what the HIP compiler is derived from).
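
For a rough sense of what "a giant library for interacting with the GPU" means in practice, here's a minimal sketch (not SCALE's code, just the stock CUDA runtime API with an illustrative kernel and sizes) of the kind of host-side surface that all has to have a working implementation behind it:

    #include <cuda_runtime.h>

    __global__ void scale_by(float *x, float k, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= k;
    }

    int main() {
        const int n = 1 << 20;
        float *d = nullptr;
        cudaMalloc(&d, n * sizeof(float));              // allocator
        cudaMemset(d, 0, n * sizeof(float));            // memory ops
        scale_by<<<(n + 255) / 256, 256>>>(d, 2.0f, n); // launch machinery
        cudaDeviceSynchronize();                        // stream / sync model
        cudaFree(d);
        return 0;
    }

Every call there, plus the <<<>>> launch syntax itself, is part of the surface a reimplementation has to cover.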

PTX is indeed vastly annoying.


The OpenMP device runtime library was originally written in CUDA. I ported that to HIP for amdgpu, discovered the upstream HIP compiler wasn't quite as solid as advertised, then ported it to OpenMP with some compiler intrinsics. The languages are all essentially C++ syntax with some spurious noise obfuscating LLVM IR. The libc effort has gone with freestanding C++ based on that experience, and we've now mostly fixed the ways that goes wrong.
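
To give a flavour of that "C++ syntax with spurious noise" point, here's a rough sketch of the same value obtained in the CUDA dialect versus freestanding C++ built directly for an amdgcn target (the function names are made up and the two aren't meant to live in one translation unit; __builtin_amdgcn_workitem_id_x is a real clang builtin):

    // CUDA/HIP dialect: an execution-space attribute plus a magic variable,
    // which the compiler lowers to a target intrinsic anyway.
    __device__ unsigned workitem_id_cuda() { return threadIdx.x; }

    // Freestanding C++ compiled straight for amdgcn: the same value via the
    // clang builtin, no CUDA/HIP language extensions involved.
    unsigned workitem_id_cpp() { return __builtin_amdgcn_workitem_id_x(); }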

You might also find raw C++ for device libraries saner to deal with than CUDA. In particular you don't need to jury-rig the thing to not spuriously embed the GPU code in x64 ELF objects and/or pull the binaries apart. Though if you're feeding the same device libraries to nvcc with #ifdef around the divergence, your hands are tied.
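
That "#ifdef around the divergence" pattern typically looks something like this (a sketch; __HIP__ and __CUDACC__ are the compilers' predefined macros, the function itself is made up):

    // Shared device library source fed to both toolchains, with the
    // vendor-specific bits macro-guarded.
    #if defined(__HIP__)                 // clang compiling HIP for amdgpu
      __device__ inline unsigned wave_size() {
        return __builtin_amdgcn_wavefrontsize();
      }
    #elif defined(__CUDACC__)            // nvcc (or clang in CUDA mode)
      __device__ inline unsigned wave_size() {
        return 32;                       // warpSize on current NVIDIA parts
      }
    #else                                // plain host C++, e.g. for unit tests
      inline unsigned wave_size() { return 1; }
    #endif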


> You might also find raw c++ for device libraries saner to deal with than cuda.

Actually, we just compile all the device libraries to LLVM bitcode and are done with it. Then we can write them using all the clang-dialect, not-nvcc-emulating C++23 we feel like, and it'll still work when someone imports them into their C++98 CUDA project from hell. :D
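
A rough sketch of that workflow (file, function, and GPU arch are made up; the clang flags are standard options): the device library is compiled once to bitcode, and the consuming project only ever links the .bc, so its own language level never sees this source.

    // Compile once to bitcode, roughly:
    //   clang++ -x cuda --cuda-device-only --cuda-gpu-arch=sm_80 \
    //           -std=c++23 -emit-llvm -c mylib.cu -o mylib.bc
    // Consumers link mylib.bc; they never parse this file.
    __device__ float clamp01_add(float a, float b) {
        float s = a + b;
        return s > 1.0f ? 1.0f : (s < 0.0f ? 0.0f : s);
    }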



