
visarga 9 months ago | on: Llama 3.1 405B now runs at 969 tokens/s on Cerebra...

You still have to pay for the memory. The Cerebras chip is fast because they use 700x more SRAM than, say, A100 GPUs. Loading the whole model in SRAM every time you compute one token is the expensive bit.
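
A rough back-of-the-envelope sketch of why the memory is the expensive bit. The 405B parameter count and the 969 tokens/s figure come from the thread title; everything else is an assumption for illustration: 16-bit weights (2 bytes per parameter), batch size 1, and every weight read exactly once per generated token. The function name is made up.

```python
# Back-of-the-envelope: memory traffic needed to decode one token at a time.
# Assumptions (not from the thread): 2 bytes per parameter, batch size 1,
# and each weight is read exactly once per generated token.

def required_bandwidth_bytes_per_s(params: int, bytes_per_param: int, tokens_per_s: int) -> int:
    """Bytes of weights that must be streamed per second at the given token rate."""
    return params * bytes_per_param * tokens_per_s

PARAMS = 405 * 10**9      # Llama 3.1 405B (from the thread title)
BYTES_PER_PARAM = 2       # assumed 16-bit weights
TOKENS_PER_S = 969        # from the thread title

model_bytes = PARAMS * BYTES_PER_PARAM
bandwidth = required_bandwidth_bytes_per_s(PARAMS, BYTES_PER_PARAM, TOKENS_PER_S)

print(f"weights: {model_bytes / 1e9:.0f} GB, required bandwidth: {bandwidth / 1e12:.0f} TB/s")
# prints "weights: 810 GB, required bandwidth: 785 TB/s"
```

Under those assumptions the weights alone are about 810 GB, and touching all of them 969 times a second needs on the order of hundreds of TB/s. That kind of bandwidth is what on-chip SRAM buys, and it is also why you need so much of it.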