"GPT-4 uses a simple top-2 Token Choice router for MLP MoE layers. It does not use MoE for attention."
OpenAI likely won't fix this, since "tokens being dropped are generally good for the performance of MoE models."
https://152334h.github.io/blog/knowing-enough-about-moe/#con...
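For anyone unfamiliar with what "top-2 Token Choice router" and token dropping mean in practice, here's a minimal PyTorch sketch. It is not GPT-4's actual implementation; the names and the capacity_factor value are illustrative assumptions. The key point: each token picks its 2 highest-scoring experts, each expert has a fixed capacity, and tokens routed to a full expert are simply dropped (they flow through the residual connection only).

```python
# Minimal sketch of top-2 token-choice routing with capacity-based token
# dropping. All names (n_experts, capacity_factor, etc.) are illustrative
# assumptions, not GPT-4's real code.
import torch
import torch.nn.functional as F

def top2_token_choice_route(x, router_weight, n_experts, capacity_factor=1.25):
    """Route each token to its top-2 experts; overflow tokens are dropped.

    x:             (n_tokens, d_model) token activations
    router_weight: (d_model, n_experts) router projection
    Returns a 0/1 dispatch mask (n_tokens, n_experts) and the gate values.
    """
    n_tokens = x.shape[0]
    # Each expert holds capacity_factor times its fair share of the
    # 2 * n_tokens routing slots.
    capacity = int(capacity_factor * n_tokens * 2 / n_experts)

    logits = x @ router_weight                   # (n_tokens, n_experts)
    probs = F.softmax(logits, dim=-1)
    top2_vals, top2_idx = probs.topk(2, dim=-1)  # tokens choose their experts

    dispatch = torch.zeros(n_tokens, n_experts)
    gates = torch.zeros(n_tokens, n_experts)
    slots_used = torch.zeros(n_experts, dtype=torch.long)

    # Greedy fill in token order: once an expert is full, later tokens
    # routed to it are dropped (they pass through the residual path only).
    for k in range(2):
        for t in range(n_tokens):
            e = top2_idx[t, k].item()
            if slots_used[e] < capacity:
                dispatch[t, e] = 1.0
                gates[t, e] = top2_vals[t, k]
                slots_used[e] += 1
    return dispatch, gates

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(16, 32)            # 16 tokens, d_model = 32
    w = torch.randn(32, 4)             # router for 4 experts
    dispatch, gates = top2_token_choice_route(x, w, n_experts=4)
    fully_dropped = (dispatch.sum(-1) == 0).sum().item()
    print(f"tokens dropped by every chosen expert: {fully_dropped}")
```

Note how which tokens get dropped depends on what else is in the batch (token order and expert contention), which is exactly why capacity-limited MoE routing can make outputs batch-dependent.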
"GPT-4 uses a simple top-2 Token Choice router for MLP MoE layers. It does not use MoE for attention."
GPT won't fix, since "tokens being dropped are generally good for the performance of MoE models."
https://152334h.github.io/blog/knowing-enough-about-moe/#con...