Hacker News
Animats | 1 day ago | on: Lossless LLM compression for efficient GPU inferen...
Once this weight format war settles down, hardware can be built to support it. Presumably you want matrix multiply hardware optimized for whatever weight format turns out to be reasonably optimal.
eoerl | 1 day ago
Optimization is post hoc here: you have to train first to be able to Huffman-encode, so it's not a pure format question.
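A minimal sketch of the reply's point, assuming we Huffman-code the exponent bits of trained float32 weights (the symbol choice is illustrative, not necessarily the linked paper's exact scheme): the code table is derived from symbol frequencies, and those frequencies only exist once training has produced the weights — hence "post hoc".

```python
# Hypothetical sketch: Huffman code lengths depend on the *trained*
# weight distribution, so the code table can only be built after training.
import heapq
import random
import struct
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a frequency table."""
    # Heap entries: (total weight, unique tiebreak, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Simulate "trained" weights: roughly Gaussian values whose float32
# exponent bits are highly non-uniform, hence compressible.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]
# Little-endian float32: byte 3 holds the sign bit plus the top 7
# exponent bits; mask off the sign and treat the rest as the symbol.
exponents = [struct.pack("<f", w)[3] & 0x7F for w in weights]
freqs = Counter(exponents)

lengths = huffman_code_lengths(freqs)
avg_bits = sum(freqs[s] * lengths[s] for s in freqs) / len(exponents)
print(f"distinct symbols: {len(freqs)}, avg code length: {avg_bits:.2f} bits (vs 7 raw)")
```

Because the weights cluster tightly around zero, only a handful of exponent values occur, and the average Huffman code length comes out well under the 7 raw bits — but a different training run (or a different layer) would yield a different frequency table and thus a different code.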