I’m most excited about Qwen-30B-A3B. Seems like a good choice for offline/local-only coding assistants.
Until now I found that open weight models were either not as good as their proprietary counterparts or too slow to run locally. This looks like a good balance.
It would be interesting to try, but on the Aider benchmark the dense 32B model scores 50.2, while no Aider score has been published for 30B-A3B, so it may be weaker.
The Aider score mentioned in the GP was published by Alibaba themselves and is not yet on Aider's leaderboard. The Aider team will probably run their own tests and may come up with a different score.
The MoE version with 3B active parameters will run significantly faster (tokens/second) on the same hardware, by about an order of magnitude (i.e. ~4 t/s vs ~40 t/s).
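That order-of-magnitude gap follows from decode being roughly memory-bandwidth-bound: each generated token has to read all active weights, so tokens/s scales with bandwidth divided by active parameter bytes. A back-of-envelope sketch, with assumed (hypothetical) numbers of ~64 GB/s RAM bandwidth and 4-bit quantized weights:

```python
# Rough model: decode tokens/s ~ memory bandwidth / bytes of active weights.
# Both constants below are assumptions, not measurements.
BANDWIDTH_GBPS = 64      # assumed consumer DDR5 read bandwidth, GB/s
BYTES_PER_PARAM = 0.5    # 4-bit quantized weights

def est_tokens_per_s(active_params_billions: float) -> float:
    gb_read_per_token = active_params_billions * BYTES_PER_PARAM
    return BANDWIDTH_GBPS / gb_read_per_token

print(est_tokens_per_s(32))  # dense 32B: 4.0 t/s
print(est_tokens_per_s(3))   # MoE, 3B active: ~42.7 t/s
```

With those assumptions the estimate lands right on the ~4 vs ~40 t/s figures above; the MoE still needs all 30B parameters in memory, but only the 3B active ones are read per token.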