AMD seemingly has better hardware - but not the production capacity to compete with Nvidia yet. Will be interesting to see margins compress when real competition catches up.
Everybody thinks it’s CUDA that makes Nvidia the dominant player. It’s not - almost 40% of their revenue this year comes from mega corporations that use their own custom stack to interact with GPUs. It’s only a matter of time before competition catches up and gives us cheaper GPUs.
are you conflating CUDA the platform with the C/C++ like language that people write into files that end with .cu? because while some people are indeed not writing .cu files, absolutely no one is skipping the rest of the "stack" (nvcc/ptx/sass/runtime/driver/etc).
some people emit llvm ir (maaaaybe ptx) directly instead of using the C/C++ frontend to CUDA. that's absolutely the only optional part of the stack and also basically the most trivial (i.e., it's not the frontend that's hard but the target codegen).
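to make that concrete, here's a rough sketch (hypothetical, not from any real codebase, error handling stripped) of what "skipping the frontend" looks like: you hand the driver a PTX string you generated yourself, and you still need ptxas to JIT it to SASS, plus the driver/runtime to load and launch it - i.e. the rest of the stack is very much not optional:

```cuda
// build as a plain C++ file linked against the driver API: g++ file.cpp -lcuda
#include <cuda.h>
#include <cstdio>

// a do-nothing kernel, written as raw PTX - the kind of thing you'd get by
// emitting llvm ir and lowering it yourself instead of writing a .cu file
const char* ptx = R"(
.version 6.0
.target sm_50
.address_size 64

.visible .entry noop()
{
    ret;
}
)";

int main() {
    cuInit(0);
    CUdevice dev;  cuDeviceGet(&dev, 0);
    CUcontext ctx; cuCtxCreate(&ctx, 0, dev);

    CUmodule mod;
    cuModuleLoadData(&mod, ptx);        // ptxas inside the driver JITs PTX -> SASS here
    CUfunction fn;
    cuModuleGetFunction(&fn, mod, "noop");

    // the driver still owns the launch, scheduling, context, etc.
    cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, 0, nullptr, nullptr);
    cuCtxSynchronize();
    printf("launched a kernel without ever writing a .cu file\n");

    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```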
LLVM IR to machine code is not the part that AMD has traditionally struggled with - what you call "trivial" is. If everyone started emitting IR and didn't rely on NVidia-owned libs, the space would become unrecognizable. The codegen is something AMD has always been decent at, hence their beating NVidia in compute benchmarks for most of the past 20 years.
> LLVM IR to machine code is not the part that AMD has traditionally struggled with.
alright fine it's the codegen and the runtime and the driver and the library ecosystem...
> If everyone started emitting IR and didn't rely on NVidia-owned libs then the space would become unrecognizable.
I have no clue what this means - which libs are you talking about here? the libs that contain the implementations of their runtime? or the libs that contain the user space components of their driver? or the libs that contain their driver and firmware code? And exactly which of these will "everyone emitting IR" save us from?
I am talking about user-level libraries, so everything from PyTorch down to cuBLAS. The rest is currently serviceable and at times was even slightly better than NVidia's. If people start shipping code that targets, say, LLVM IR (that then gets lowered to PTX or whatever), like one would do using SYCL, we only have to rely on the bare minimum.
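To illustrate which libraries I mean, here is a rough sketch (hypothetical, error handling stripped): the GEMM below is mathematically trivial and easy to express portably, but the single cuBLAS call ties the binary to NVidia's closed, per-architecture-tuned library - and that is the dependency that emitting IR alone doesn't remove.

```cuda
// build with: nvcc gemm.cu -lcublas
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    const int n = 2;                               // tiny 2x2 GEMM: C = A * B
    std::vector<float> hA = {1, 2, 3, 4};
    std::vector<float> hB = {5, 6, 7, 8};
    std::vector<float> hC(n * n, 0.0f);

    float *A, *B, *C;
    cudaMalloc(&A, hA.size() * sizeof(float));
    cudaMalloc(&B, hB.size() * sizeof(float));
    cudaMalloc(&C, hC.size() * sizeof(float));
    cudaMemcpy(A, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(B, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    // this one call is the NVidia-owned part: closed source, tuned per
    // architecture, and not something an IR-level target replaces by itself
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);

    cudaMemcpy(hC.data(), C, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hC[0]);

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```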
Yes it is - but Nvidia has larger contracts _right now_. Nvidia has been investing more money in producing more GPUs for longer, so it’s only natural that they have an advantage now.
But now that there’s a larger incentive to produce GPUs, their moat will eventually fall.
TSMC runs at 100% capacity for its top-tier processes - the bottleneck is building more fabs, and those take years. So the question becomes: how long can Nvidia remain dominant? It could be quarters or it could be years before any real competitor convinces large customers to switch over.
Microsoft and Google are producing their own AI hardware too - nobody wants to depend solely on Nvidia, but they’re currently forced to if they want to keep up.