
From my experiments with the Deepseek Qwen-32b distill model, the model did not follow the edit instructions: the format was wrong. I know the distill models are not at all the same as the full model, but that could provide a clue. Combine that information with the scores and you have a reasonable hypothesis.



> I know the distill models are not at all the same as the full model

It's far worse than that. It's not the Deepseek model at all. It's Qwen enhanced with Deepseek, so it's still Qwen.





