viraptor 67 days ago | on: Kimi K2 is a state-of-the-art mixture-of-experts (...

Sure, it's done per token, but the question is: how much do the knowledge domains match up with experts? I could not find hard data on this.

boroboro4 67 days ago

Check out the DeepSeek-V3 model paper. They changed the way they train experts (went from an aux loss to a different kind of expert-separation training). It did improve the experts' domain specialization, and they have neat graphics on it in the paper.
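
For anyone who wants to check the domain/expert match on their own model, here is a minimal sketch of the measurement, not anything taken from the paper. It assumes you can dump per-token router scores and tag each token with a domain; the names `top_k_experts`, `expert_load_by_domain` and the toy `records` are hypothetical. It routes each token to its top-k experts and reports, per domain, what fraction of routed slots each expert received.

```python
from collections import Counter, defaultdict


def top_k_experts(router_scores, k):
    """Return the indices of the k highest-scoring experts for one token."""
    order = sorted(range(len(router_scores)),
                   key=lambda i: router_scores[i], reverse=True)
    return order[:k]


def expert_load_by_domain(records, num_experts, k):
    """Tally routing decisions per domain.

    records: iterable of (domain, router_scores) pairs, one per token.
    Returns {domain: [share of routed slots that went to each expert]}.
    """
    counts = defaultdict(Counter)
    for domain, scores in records:
        for expert in top_k_experts(scores, k):
            counts[domain][expert] += 1

    loads = {}
    for domain, counter in counts.items():
        total = sum(counter.values())
        loads[domain] = [counter[e] / total for e in range(num_experts)]
    return loads


# Toy data (made up): 4 experts, 2 tokens per domain, top-2 routing.
records = [
    ("code", [0.9, 0.1, 0.6, 0.2]),
    ("code", [0.8, 0.3, 0.7, 0.1]),
    ("math", [0.1, 0.9, 0.2, 0.7]),
    ("math", [0.2, 0.8, 0.1, 0.6]),
]
loads = expert_load_by_domain(records, num_experts=4, k=2)
for domain, shares in loads.items():
    print(domain, shares)
```

If the experts were domain-agnostic, every row would sit near a uniform 1/num_experts; rows that concentrate on a few experts suggest specialization. On a real model you would do this per layer, since routing patterns can differ between layers.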
