I think you're trying to re-contextualize the old Standards joke, but I actually think you're right -- if a front end model could dispatch as appropriate to the best backend model for a given prompt, and turn everything into a high level sort of mixture of models, I think that would be great, and a great simplifying step. Then they can specialize and optimize all they want, CPU goes down, responses get better and we only see one interface.
I don't believe so. I thought agents were go-do-that-complicated-interactive-thing autonomously on my behalf. But, more similar to tool use, except, with mixture of experts, each expert assumes the continuation of "participant identity" in the conversation, in that they're fed the whole context.