
How does such a distillation work in theory? They don’t have weights from OpenAI’s models, and can only call their APIs, right? So how can they actually build off of it?


Like RLHF, but the "HF" (human feedback) part is GPT-4 instead of humans: you sample the teacher's outputs through the API and train the student to reproduce them.
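A minimal sketch of the data-collection side, assuming a hypothetical `teacher_generate` stub standing in for the teacher model's API: sample prompts, record the teacher's completions, and serialize them as a JSONL fine-tuning set for the student.

```python
import json

def teacher_generate(prompt: str) -> str:
    # Stub standing in for an API call to the teacher model (hypothetical);
    # in practice this would be a chat-completion request.
    return f"Teacher answer to: {prompt}"

# Prompts you control; the teacher's weights stay hidden,
# only its input/output behaviour is sampled.
prompts = ["What is 2+2?", "Explain distillation in one sentence."]

dataset = [{"prompt": p, "completion": teacher_generate(p)} for p in prompts]

# JSONL is a common format for supervised fine-tuning data.
jsonl = "\n".join(json.dumps(ex) for ex in dataset)
print(len(dataset))
```

The student is then fine-tuned on these (prompt, completion) pairs with an ordinary supervised objective, so no access to the teacher's weights is needed.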


How do you ensure the student model learns robust generalizations rather than just surface-level mimicry?


No idea as I don't work on that, but my guess would be that the higher the number of teacher samples 'n', the more closely student model A approaches teacher model B.



