Hacker News
new
|
past
|
comments
|
ask
|
show
|
jobs
|
submit
login
visarga
9 months ago
|
parent
|
context
|
favorite
| on:
Llama 3.1 405B now runs at 969 tokens/s on Cerebra...
You still have to pay for the memory. The Cerebras chip is fast because they use 700x more SRAM than, say, A100 GPUs. Loading the whole model in SRAM every time you compute one token is the expensive bit.
Guidelines
|
FAQ
|
Lists
|
API
|
Security
|
Legal
|
Apply to YC
|
Contact
Search: