samus on April 5, 2024 | on: JetMoE: Reaching LLaMA2 performance with 0.1M doll...
You're correct only about Qwen's MoE. I presume Chinese model builders feel more pressure to use their GPU time efficiently because of sanctions.