> That's what the cache hierarchies are for

That's the core point though. If you...

menaerus · 2025-06-25T11:46:34

Not really. Registers are irrelevant. They are not the bottleneck.

pests · 2025-06-25T19:02:52

Computation happens in the registers. If you’re not moving data to registers you aren’t doing any compute.

menaerus · 2025-07-02T07:24:14

Obviously yes, but NVIDIA's Ampere and Hopper architectures have 64K 32-bit registers per SM. The A100 has 108 SMs and the H100 has 132 SMs, so go figure: registers aren't a bottleneck.
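To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It only multiplies out the figures quoted above (64K 32-bit registers per SM, 108 and 132 SMs); the helper name `register_file_bytes` is made up for illustration, and nothing here queries real hardware.

```python
# Figures taken from the comment above: 64K 32-bit registers per SM.
REGS_PER_SM = 64 * 1024   # 65,536 registers
BYTES_PER_REG = 4         # 32 bits each

# SM counts as quoted in the comment (assumed, not queried from a device).
SM_COUNT = {"A100": 108, "H100": 132}


def register_file_bytes(sm_count: int) -> int:
    """Total register file capacity across all SMs, in bytes."""
    return sm_count * REGS_PER_SM * BYTES_PER_REG


for gpu, sms in SM_COUNT.items():
    total = register_file_bytes(sms)
    # 64K registers * 4 bytes = 256 KiB per SM
    print(f"{gpu}: {sms} SMs x 256 KiB = {total / 2**20:.0f} MiB of registers")
```

Under those assumptions the aggregate register file works out to about 27 MiB on the A100 and 33 MiB on the H100, which is the scale the argument leans on.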