
Oh, I also upload Q8_K_XL for example, which upcasts important layers to BF16 / F16 as well!

Oh, the blog at https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs does talk about 1-, 2-, 3-, 4-, 5-, 6- and 8-bit dynamic GGUFs as well!

There definitely is a benefit to dynamically selecting layers to be at different bit rates - I wrote about the difference between naively quantizing and selectively quantizing: https://unsloth.ai/blog/deepseekr1-dynamic
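The idea behind selectively quantizing can be sketched roughly like this. Note this is a toy illustration: the layer names, bit widths, and "importance" heuristic below are made up for the sketch, and are not Unsloth's actual selection rules or the GGUF quantization formats.

```python
# Toy sketch: naive (uniform) vs dynamic (per-layer) quantization plans.
# Layer names, bit widths, and the sensitivity heuristic are illustrative only.
import numpy as np

def quantize(weights, bits):
    """Symmetric round-to-nearest quantization, returned dequantized
    so we can measure the reconstruction error."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(weights).max() / qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale

def naive_plan(layers, bits=4):
    # Naive: every layer gets the same bit width.
    return {name: bits for name in layers}

def dynamic_plan(layers, default_bits=4, keep_bits=8):
    # Dynamic: keep precision-sensitive layers (here: embeddings and the
    # first/last blocks, a made-up heuristic) at a higher bit width.
    plan = {}
    for name in layers:
        sensitive = "embed" in name or name.startswith(("blk.0.", "blk.31."))
        plan[name] = keep_bits if sensitive else default_bits
    return plan

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = {f"blk.{i}.ffn": rng.normal(size=256) for i in range(32)}
    layers["token_embed"] = rng.normal(size=256)

    for plan_fn in (naive_plan, dynamic_plan):
        plan = plan_fn(layers)
        err = sum(np.abs(w - quantize(w, plan[n])).mean()
                  for n, w in layers.items()) / len(layers)
        print(plan_fn.__name__, "mean abs error:", round(err, 5))
```

Running it shows the dynamic plan has a lower average reconstruction error than the uniform plan, because the few sensitive layers dominate the error at low bit widths while costing little extra storage.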





Thanks Daniel. I know you upload them, but I was hoping for some solid numbers on your dynamic q8 vs a naive quant. Neither of those links seems to show any improvement at those quant levels.

My gut feeling is that there's not enough benefit to outweigh the risk of putting a middleman in the chain of custody from the original model to my NVMe.

However, I can't know for sure without more testing than I have the time or inclination for, which is why I was hoping there had been some analysis you could point me to.



