I can't fully agree with the headline statement. The article focuses on the pros of SRAM, which are real -- peak bandwidth (e.g. ~5 TB/s out of the H100’s L2) and lower energy per bit transferred (the rule of thumb I remember is ~10x lower than HBM).
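To make that energy difference concrete, here's a back-of-envelope sketch: power needed to sustain a given bandwidth scales linearly with energy per bit. The pJ/bit figures below are illustrative assumptions consistent with the ~10x rule of thumb, not vendor specs.

```python
# Back-of-envelope: power cost of sustaining a data rate at different
# energy-per-bit figures. The pJ/bit values are assumed, not measured.

def transfer_power_watts(bandwidth_bytes_per_s: float, pj_per_bit: float) -> float:
    """Power (W) needed to move data at the given rate and energy cost."""
    bits_per_s = bandwidth_bytes_per_s * 8
    return bits_per_s * pj_per_bit * 1e-12  # pJ -> J

bw = 5e12  # ~5 TB/s, roughly H100 L2 peak bandwidth
print(transfer_power_watts(bw, 1.0))   # SRAM-ish assumption: ~40 W
print(transfer_power_watts(bw, 10.0))  # HBM-ish assumption: ~400 W
```

At a 10x energy gap, feeding the same bandwidth from off-chip memory costs hundreds of watts instead of tens, which is a big chunk of an accelerator's power budget.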
But the companies that already bet big on SRAM in AI, Cerebras in particular and Graphcore to a lesser extent, aren’t obviously running away with the AI performance crown. It seems LLMs need more memory capacity than anyone expected, to the point where even HBM stacks somewhat limit the scale of current models. Maybe the next version of the Cerebras WSE can get closer to 100 GB of on-chip memory and serve some useful LLMs very efficiently - excited to see what they can do with more modern processes!
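The capacity crunch is easy to see with simple arithmetic on weight storage alone (the parameter counts below are just examples; KV cache and activations add more on top):

```python
# Rough weight-memory requirements for a few example model sizes,
# showing why even ~100 GB of on-chip SRAM is tight for large LLMs.

def weight_gb(params: float, bytes_per_param: int) -> float:
    """Gigabytes needed just to hold the weights."""
    return params * bytes_per_param / 1e9

for params in (7e9, 70e9, 175e9):
    print(f"{params/1e9:.0f}B params: "
          f"{weight_gb(params, 2):.0f} GB (FP16), "
          f"{weight_gb(params, 1):.0f} GB (INT8)")
```

Even a 70B-parameter model at FP16 needs ~140 GB for weights alone, so an all-SRAM design has to either quantize aggressively or shard across many chips.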
I think innovation in SRAM packaging, like AMD’s stacking in “3D V-Cache”, is also promising and might play a larger role in AI accelerators going forward. But it’s important to note that while both are SRAM, the performance of AMD’s stacked L3 is not yet comparable to a GPU’s centralized L2 - in latency and bandwidth it’s closer to HBM today.