boroboro4, 8 months ago, on: Llama 3.1 405B now runs at 969 tokens/s on Cerebra...

There are two big tricks: their chips are enormous, and they use SRAM as their memory, which is vastly faster than the HBM (high-bandwidth memory) used by GPUs. In fact, this is the main reason it's so fast. Groq gets its speed for the same reason.
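A rough back-of-envelope for why memory speed dominates: when generating one token at a time (batch size 1), every weight has to be read from memory once per token. That caps throughput at roughly memory bandwidth divided by model size in bytes. The sketch below is illustrative only. It assumes 16-bit weights, and the two bandwidth figures are hypothetical placeholders, not vendor specs. It also ignores compute limits, interconnect between chips, batching, and KV-cache reads.

```python
def decode_tokens_per_second(params: float, bytes_per_param: float,
                             bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode speed, assuming each generated
    token streams every weight from memory exactly once (memory-bound)."""
    bytes_per_token = params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


if __name__ == "__main__":
    PARAMS = 405e9        # Llama 3.1 405B parameter count
    BYTES_PER_PARAM = 2   # assumption: 16-bit weights

    # Hypothetical bandwidth figures, chosen only to show the scaling.
    HBM_LIKE = 3e12       # ~TB/s class, illustrative
    SRAM_LIKE = 1e15      # ~PB/s class aggregate, illustrative

    for name, bw in [("HBM-like", HBM_LIKE), ("SRAM-like", SRAM_LIKE)]:
        tps = decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, bw)
        print(f"{name}: ~{tps:.1f} tokens/s upper bound")
```

The point is the linear relationship: with the same model, a memory system with a few hundred times more bandwidth raises the ceiling by the same factor, which is why keeping weights in on-chip SRAM matters more than raw compute for single-stream decoding.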