Hacker News
boroboro4 | 8 months ago | on: Llama 3.1 405B now runs at 969 tokens/s on Cerebra...
There are two big tricks: their chips are enormous, and they use SRAM as their memory, which is vastly faster than the HBM used by GPUs. In fact, this is the main reason it's so fast. Groq is fast for the same reason.
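To put rough numbers on why memory speed matters so much, here is a back-of-envelope sketch. It assumes single-stream decoding is memory-bound, meaning every weight has to be read once per generated token, so the ceiling on tokens/s is roughly bandwidth divided by model size in bytes. The function name, the 16-bit weight size, and both bandwidth figures are illustrative assumptions, not vendor specs, and the estimate ignores the KV cache, batching, sharding across chips, and compute limits.

```python
def decode_tokens_per_second_upper_bound(param_count, bytes_per_param, bandwidth_bytes_per_s):
    """Rough ceiling on single-stream decode speed.

    Assumes decoding is memory-bound: all weights are streamed from
    memory once for every generated token.
    """
    if param_count <= 0 or bytes_per_param <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("all arguments must be positive")
    weight_bytes = param_count * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes


PARAMS_405B = 405e9   # parameter count of a 405B model
BYTES_PER_PARAM = 2   # assumption: 16-bit weights

# Hypothetical bandwidths, chosen only to show the scaling (not measured values).
OFF_CHIP_BANDWIDTH = 3e12   # a few TB/s, the order of magnitude of off-chip HBM
ON_CHIP_BANDWIDTH = 3e15    # a few PB/s, a placeholder for aggregate on-chip SRAM

off_chip = decode_tokens_per_second_upper_bound(PARAMS_405B, BYTES_PER_PARAM, OFF_CHIP_BANDWIDTH)
on_chip = decode_tokens_per_second_upper_bound(PARAMS_405B, BYTES_PER_PARAM, ON_CHIP_BANDWIDTH)

print(f"off-chip ceiling: {off_chip:.1f} tokens/s")
print(f"on-chip ceiling:  {on_chip:.1f} tokens/s")
print(f"speedup from bandwidth alone: {on_chip / off_chip:.0f}x")
```

Under this model the ceiling scales linearly with bandwidth, so whatever the real figures are, the gap between SRAM and HBM bandwidth carries straight through to tokens/s. Real systems spread the weights across many chips and batch requests, so treat these as ceilings for intuition only.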