AMD seemingly has better hardware - but not the production capacity to compete with Nvidia yet. Will be interesting to see margins compress when real competition catches up.
Everybody thinks it’s CUDA that makes Nvidia the dominant player. It’s not - almost 40% of their revenue this year comes from mega corporations that use their own custom stack to interact with GPUs. It’s only a matter of time before competition catches up and gives us cheaper GPUs.
are you conflating CUDA the platform with the C/C++ like language that people write into files that end with .cu? because while some people are indeed not writing .cu files, absolutely no one is skipping the rest of the "stack" (nvcc/ptx/sass/runtime/driver/etc).
some people emit llvm ir (maaaaybe ptx) directly instead of using the C/C++ frontend to CUDA. that's absolutely the only optional part of the stack and also basically the most trivial (i.e., it's not the frontend that's hard but the target codegen).
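to make that concrete, here's a rough sketch (hypothetical, not from any real codebase, error handling stripped) of what "skipping the frontend" looks like: you hand the driver a PTX string you generated yourself, and you still need ptxas to JIT it to SASS, plus the driver/runtime to load and launch it - i.e. the rest of the stack is very much not optional:

```cuda
// build as a plain C++ file linked against the driver API: g++ file.cpp -lcuda
#include <cuda.h>
#include <cstdio>

// a do-nothing kernel, written as raw PTX - the kind of thing you'd get by
// emitting llvm ir and lowering it yourself instead of writing a .cu file
const char* ptx = R"(
.version 6.0
.target sm_50
.address_size 64

.visible .entry noop()
{
    ret;
}
)";

int main() {
    cuInit(0);
    CUdevice dev;  cuDeviceGet(&dev, 0);
    CUcontext ctx; cuCtxCreate(&ctx, 0, dev);

    CUmodule mod;
    cuModuleLoadData(&mod, ptx);        // ptxas inside the driver JITs PTX -> SASS here
    CUfunction fn;
    cuModuleGetFunction(&fn, mod, "noop");

    // the driver still owns the launch, scheduling, context, etc.
    cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, 0, nullptr, nullptr);
    cuCtxSynchronize();
    printf("launched a kernel without ever writing a .cu file\n");

    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```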
LLVM IR to machine code is not the part that AMD has traditionally struggled with - what you call "trivial" is. If everyone started emitting IR and didn't rely on NVidia-owned libs, the space would become unrecognizable. The codegen is something AMD has always been decent at, hence their beating NVidia in compute benchmarks for most of the past 20 years.
> LLVM IR to machine code is not the part that AMD has traditionally struggled with.
alright fine it's the codegen and the runtime and the driver and the library ecosystem...
> If everyone started emitting IR and didn't rely on NVidia-owned libs then the space would become unrecognizable.
I have no clue what this means - which libs are you talking about here? the libs that contain the implementations of their runtime? or the libs that contain the user space components of their driver? or the libs that contain their driver and firmware code? And exactly which of these will "everyone emitting IR" save us from?
I am talking about user-level libraries, so everything from PyTorch down to cuBLAS. The rest is currently serviceable and at times was even slightly better than NVidia's. If people start shipping code that targets, say, LLVM IR (that then gets lowered to PTX or whatever), like one would do using SYCL, we only have to rely on the bare minimum.
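To illustrate which libraries I mean, here is a rough sketch (hypothetical, error handling stripped): the GEMM below is mathematically trivial and easy to express portably, but the single cuBLAS call ties the binary to NVidia's closed, per-architecture-tuned library - and that is the dependency that emitting IR alone doesn't remove.

```cuda
// build with: nvcc gemm.cu -lcublas
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    const int n = 2;                               // tiny 2x2 GEMM: C = A * B
    std::vector<float> hA = {1, 2, 3, 4};
    std::vector<float> hB = {5, 6, 7, 8};
    std::vector<float> hC(n * n, 0.0f);

    float *A, *B, *C;
    cudaMalloc(&A, hA.size() * sizeof(float));
    cudaMalloc(&B, hB.size() * sizeof(float));
    cudaMalloc(&C, hC.size() * sizeof(float));
    cudaMemcpy(A, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(B, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    // this one call is the NVidia-owned part: closed source, tuned per
    // architecture, and not something an IR-level target replaces by itself
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);

    cudaMemcpy(hC.data(), C, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hC[0]);

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```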
Yes it is - but Nvidia has larger contracts _right now_. Nvidia has been investing more money in producing more GPUs for longer, so it’s only natural that they have an advantage now.
But now that there’s a larger incentive to produce GPUs, their moat will eventually fall.
TSMC runs at 100% capacity for its top-tier processes - the bottleneck is building more fabs, and those take years. So the question becomes: how long can Nvidia remain dominant? It could be quarters or it could be years before any real competitor convinces large customers to switch over.
Microsoft and Google are producing their own AI hardware too - nobody wants to depend solely on Nvidia, but they’re currently forced to if they want to keep up.