There is definitely a benefit to dynamically selecting layers to run at different bit rates. I wrote about the difference between naively quantizing and selectively quantizing here: https://unsloth.ai/blog/deepseekr1-dynamic
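For anyone skimming, here is a toy sketch of *why* mixed bit rates can help at the same average size. It is an illustration of the mechanism only, not a benchmark and not how Unsloth or GGUF actually quantize. It assumes plain symmetric absmax round-to-nearest quantization, hypothetical layer names (`ffn_1`, `ffn_2`, `attn`), synthetic weights, and made-up per-layer "sensitivity" factors standing in for how much each layer's error affects the model's output.

```python
# Toy illustration only: synthetic weights, hypothetical layer names and
# made-up sensitivities. Not a benchmark of any real model or GGUF format.

def quantize(weights, bits):
    """Symmetric absmax round-to-nearest quantize, then dequantize to floats."""
    assert bits >= 2, "need at least one positive level"
    levels = 2 ** (bits - 1) - 1  # e.g. 4 bits -> 7 positive levels
    scale = max(abs(w) for w in weights) / levels
    if scale == 0:
        return list(weights)
    return [round(w / scale) * scale for w in weights]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def weighted_error(layers, sensitivity, plan):
    """Sum of per-layer quantization MSE, scaled by each layer's sensitivity."""
    return sum(
        sensitivity[name] * mse(w, quantize(w, plan[name]))
        for name, w in layers.items()
    )


def average_bits(layers, plan):
    """Bits per weight averaged over all layers (the size budget)."""
    total = sum(len(w) for w in layers.values())
    return sum(len(w) * plan[name] for name, w in layers.items()) / total


# Hypothetical model: three equally sized layers with identical weights.
weights = [i / 100 for i in range(-100, 101)]
layers = {"ffn_1": weights, "ffn_2": weights, "attn": weights}
# Hypothetical sensitivities: errors in "attn" hurt the output 20x more.
sensitivity = {"ffn_1": 1.0, "ffn_2": 1.0, "attn": 20.0}

naive = {"ffn_1": 4, "ffn_2": 4, "attn": 4}    # every layer at 4 bits
dynamic = {"ffn_1": 3, "ffn_2": 3, "attn": 6}  # same 4-bit average

for label, plan in (("naive", naive), ("dynamic", dynamic)):
    err = weighted_error(layers, sensitivity, plan)
    print(label, average_bits(layers, plan), round(err, 5))
```

Both plans cost the same average bits per weight; the dynamic one just spends them where the (assumed) sensitivity is highest. Whether that pays off on a real model, and by how much at 8 bits, still needs actual perplexity or KL-divergence measurements.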
Thanks, Daniel. I know you upload them, but I was hoping for some solid numbers comparing your dynamic Q8 against a naive quant. Neither of those links seems to show any improvement at those quant levels.
My gut feeling is that there isn't enough benefit to outweigh the risk of putting a middleman in the chain of custody between the original model and my NVMe.
However, I can't know for sure without more testing than I have the time or inclination for, which is why I was hoping there had been some analysis you could point me to.
Oh, the blog at https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs also covers 1-, 2-, 3-, 4-, 5-, 6- and 8-bit dynamic GGUFs!