
At what parameter count? It's been established that quantization has less of an effect on larger models. By the time you're at 70B, quantizing to 4 bits is basically negligible.
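For a sense of scale: 70B parameters at 4 bits is roughly 35 GB of weights, versus about 140 GB at fp16. A minimal sketch of what 4-bit loading typically looks like with the Hugging Face transformers + bitsandbytes stack (an assumed toolchain; the comment doesn't name one, and the model id and config values are illustrative):

    # Sketch: loading Llama-2 70B with 4-bit (NF4) quantization via
    # transformers + bitsandbytes. Model id and settings are assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # store weights in 4-bit
        bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch.float16,  # dequantize to fp16 for matmuls
    )

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-70b-hf",
        quantization_config=quant_config,
        device_map="auto",                     # spread layers across available GPUs
    )
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")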


Source? I've seen this claimed anecdotally, but is there a paper you're referencing?


I work mostly with Mixtral and Mistral 7B these days, but I did work with some 70B models before Mistral came out, and I was not impressed with 4-bit Llama-2 70B.




