> The progress in ≈7B models has been nothing short of astonishing.
I'd still rank Mistral 7B above Mixtral personally, because inference support for the latter is such a buggy mess that I have yet to get it working consistently, and none of the claims I've seen people make about it have ever materialized on my local setup. MoE is a real fiddly trainwreck of an architecture. Plus, 7B models can run on 8GB LPDDR4X ARM devices at about 2.5 tok/s, which might be usable for some integrated applications.
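For concreteness, here is a minimal sketch of what that kind of local setup can look like, assuming llama-cpp-python and a 4-bit-quantized 7B model in GGUF format; the model path and settings below are illustrative, not from the original comment:

```python
# Hypothetical local-inference sketch: a Q4-quantized 7B model needs roughly
# 4-5 GB of RAM, which is why it can fit on an 8 GB ARM board at a few tok/s.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct.Q4_K_M.gguf",  # placeholder filename
    n_ctx=2048,    # keep the context window small to save memory
    n_threads=4,   # match the number of big cores on the SoC
)

out = llm("Explain why small local models are useful:", max_tokens=128)
print(out["choices"][0]["text"])
```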
It is rather awesome how far small models have come, though I still remember trying out Vicuna on WASM back in January or February and being impressed enough to get completely pulled into this whole LLM thing. The current 7B models are about as good as the 30B models were back then, if not slightly better.