Dynamic 4bit Quantization (unsloth.ai)
3 points by danielhanchen 52 days ago | 5 comments



Hey HN family! Sometimes quantizing all of a model's parameters to 4-bit breaks it. I uploaded some mixed quants (roughly 90% of parameters in 4-bit, with the remaining 10% kept in 16-bit) for vision models (Llama, Qwen, Pixtral) and QwQ, which retain accuracy while still being small, to https://huggingface.co/unsloth
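
For anyone curious what that mix looks like in practice, here's a minimal sketch using Transformers + bitsandbytes (not Unsloth's exact recipe); the model id and the skipped module names are just illustrative:

    # Minimal sketch: load in 4-bit while keeping a few sensitive modules in 16-bit.
    # The model id and skipped module names are placeholders, not Unsloth's actual list.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        # Modules listed here are left un-quantized (kept in their 16-bit dtype).
        llm_int8_skip_modules=["lm_head", "vision_tower"],
    )

    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/QwQ-32B-Preview",          # placeholder model id
        quantization_config=bnb_config,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

The point is just that the handful of layers most sensitive to quantization error stay in 16-bit, so the size cost is small but the accuracy recovery is large.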


Nice. Unsloth requires a platform supported by Triton, correct?


Yep! Linux works; Windows works if you compile Triton manually or use an unofficial Python wheel. I'm actually unsure whether Macs support Triton.


I think basically only NVIDIA is reliably supported right now. It would be nice to have broader hardware support so models could be split across devices (the way HF Accelerate or llama.cpp can).
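
For reference, the splitting I mean is what Transformers already does via Accelerate's device_map; a rough sketch (placeholder model id and memory caps):

    # Rough sketch: shard a model across available GPU(s) and CPU with Accelerate's
    # device_map. The model id and per-device memory limits are illustrative.
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.1-8B-Instruct",        # placeholder model id
        device_map="auto",                          # let Accelerate place layers
        max_memory={0: "10GiB", "cpu": "30GiB"},    # per-device caps (illustrative)
        torch_dtype=torch.bfloat16,
    )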


Yep, for now it's NVIDIA. AMD might work, but I'll have to edit the dependencies. More hardware support is coming! I'm trying to add Apple and CPU support!



