
This is just a VBR mode for neural networks. It's not very useful when inference is already quite slow.
Even granting that summary, the conclusion doesn't follow: most local LLM inference users are constantly trading quality for speed, because speed drops dramatically once the weights no longer fit in RAM. Measured as speed at a desired quality, this could be very useful.
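
A back-of-the-envelope sketch of that tradeoff (the 70B parameter count and 48 GB RAM budget below are made-up assumptions, not figures from the article): treating bits-per-weight as the "bitrate", what matters for speed is whether the weights stay resident in RAM.

    # Illustrative sketch of the quality/speed tradeoff described above.
    # All numbers are assumptions, not measurements from the linked work.

    GB = 1024**3

    def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
        """Approximate memory needed for the weights alone."""
        return n_params * bits_per_weight / 8 / GB

    # Hypothetical 70B-parameter model against a 48 GB RAM budget.
    n_params = 70e9
    ram_budget_gb = 48

    for bits in (16, 8, 6, 4, 3):
        needed = weight_footprint_gb(n_params, bits)
        fits = needed <= ram_budget_gb
        # Once the weights spill out of RAM, every token pays for paging,
        # so tokens/sec collapses even though per-weight quality went up.
        print(f"{bits:2d} bits/weight: {needed:6.1f} GB -> "
              f"{'fits in RAM' if fits else 'spills, slow'}")

At 16 bits the hypothetical model needs roughly 140 GB and pages constantly; at 4 bits it needs about 35 GB and fits, so a quantization scheme that spends bits unevenly can hit the highest quality that still stays under the memory ceiling.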