Ballas 61 days ago | on: Claude 3.7 Sonnet and Claude Code

In my experiments with the DeepSeek Qwen-32B distill model, the model did not follow the edit instructions: the output format was wrong. I know the distill models are not at all the same as the full model, but that could provide a clue. Combine that information with the scores and you have a reasonable hypothesis.

re-thc 61 days ago

> I know the distill models are not at all the same as the full model

It's far worse than that. It's not the DeepSeek model at all. It's Qwen enhanced with DeepSeek, so it's still Qwen.
