
You can choose the quantization by appending the right tag to the model name, but they don't support other, more advanced features: for example, you need a special flag to enable flash attention, and you cannot use KV cache quantization for large contexts.


