
Oh, the repetition issue is only on the non-dynamic quants :) If you do dynamic quantization and use the 1.58-bit dynamically quantized model, the repetition issue fully disappears!

min_p = 0.05 was a way I found to counteract the 1.58-bit model generating occasional incorrect tokens, which happen at a rate of around 1 token per 8000!
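
For anyone curious, the min_p rule itself is tiny: keep only the tokens whose probability is at least min_p times the top token's probability, then renormalize. A minimal numpy sketch of the idea (my own illustration, not Unsloth's actual code):

    import numpy as np

    def min_p_filter(logits: np.ndarray, min_p: float = 0.05) -> np.ndarray:
        """Zero out tokens whose probability falls below min_p * top probability."""
        probs = np.exp(logits - np.max(logits))   # numerically stable softmax
        probs /= probs.sum()
        probs[probs < min_p * probs.max()] = 0.0  # threshold scales with the top token
        return probs / probs.sum()                # renormalize the survivors

    logits = np.array([4.0, 3.2, 1.0, -2.0])
    token = np.random.choice(len(logits), p=min_p_filter(logits))

Because the threshold scales with the top token's probability, it prunes aggressively when the model is confident and gently when it isn't.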

min_p is great, do you apply a small amount of temperature as well?

Btw, min_p (the paper about the sampler) got accepted to ICLR! As the 4th author, it warms my heart to see it used so much in the wild.

Oh hi!! Congratulations on ICLR!!! min_p = 0.1 and temp = 1.5 are my default go-to settings!!
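
If anyone wants to try those settings with Hugging Face transformers (recent versions expose min_p directly in generate; the model name here is just a placeholder):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # placeholder; swap in your own model
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    inputs = tok("The capital of France is", return_tensors="pt")
    out = model.generate(**inputs, do_sample=True,
                         temperature=1.5, min_p=0.1,
                         max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))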

The recommended temperature from DeepSeek is 0.6, so I leave it at that!

I think most model creators share usage examples with the temperature set as high as 0.6-0.7 simply because that's what a lot of the client apps use. IMO this is WAY too high unless you're doing creative writing.

Generally I set temp to 0-0.4 at the absolute most.

min_p actually needs a little temperature to work effectively, so with min_p I almost always use a temp of 0.2.
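
That matches how the samplers compose: temperature rescales the logits first (widening the tail), and min_p then prunes that tail relative to the new top token; at temp=0 there is no tail left for min_p to act on. A self-contained sketch of that ordering (it follows the warper order Hugging Face uses, as far as I know):

    import numpy as np

    def sample_with_min_p(logits, temp=0.2, min_p=0.05):
        if temp == 0:
            return int(np.argmax(logits))         # greedy: min_p never fires
        scaled = logits / temp                    # temperature first...
        probs = np.exp(scaled - np.max(scaled))
        probs /= probs.sum()
        probs[probs < min_p * probs.max()] = 0.0  # ...then min_p prunes the tail
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))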


Yeah, lower temp is also good :) Tbh it's all trial and error - I found temp=1.5, min_p=0.1 to be very useful for pass@k-type workloads, i.e. calling the LLM multiple times and aggregating the results.
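
A rough sketch of that pass@k-style loop, aggregating with a simple majority vote (self-consistency style); generate() and extract_answer() are hypothetical stand-ins for whatever stack you use:

    from collections import Counter

    def best_of_k(prompt, generate, extract_answer, k=8):
        # Sample k completions at high temp, with min_p as the safety net,
        # then majority-vote the extracted answers.
        answers = [extract_answer(generate(prompt, temperature=1.5, min_p=0.1))
                   for _ in range(k)]
        return Counter(answers).most_common(1)[0][0]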

temp=0 is also good for single, deterministic outputs. For classification tasks, it's better to actually inspect the logits.
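
For the classification case, that means skipping sampling entirely and comparing the logits of the candidate label tokens. A sketch with Hugging Face transformers (model and labels are illustrative):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # illustrative; use your own model
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    prompt = "Review: 'Great product!' Sentiment:"
    labels = [" positive", " negative"]

    with torch.no_grad():
        logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]

    # Compare each label's first-token logit directly -- no sampling involved
    label_ids = [tok.encode(l)[0] for l in labels]
    print(labels[int(torch.argmax(logits[label_ids]))])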

But my go-to setting is always a min_p of at least 0.01 or 0.05! It vastly suppresses rare incorrect random tokens from being generated, and it helps massively!
